pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Edward Z. Yang	cca7b38564	Don't allow skipping deepcopy (#102973 ) We might mutate it afterwards! This could lead to hard to understand bugs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102973 Approved by: https://github.com/albanD	2023-06-05 20:01:16 +00:00
Vinay Kumar Burugu	8215468870	Feature:To add --tolerance option to benchmark scripts (#102218 ) The "tolerance" option runs the model in eager mode on the baseline device (default: CPU) and on the test device (e.g., CUDA, XLA, etc.), then compares the output tensors to determine the absolute tolerance value based on the [formula](https://pytorch.org/docs/stable/generated/torch.allclose.html). It then saves the results in a CSV file. This comparison highlights the tolerance/accuracy difference between XLA and GPU/CPU devices and can also be used to evaluate newer accelerators. This feature aims to identify accuracy failures on the test device (e.g., XLA) and facilitate quick bug triaging. This feature enables the following capabilities: 1. Ability to monitor accuracy issues of backends 2. Provide a more informative picture of accuracy beyond pass/fail status 3. Having a dump of accuracy information will help triage models accordingly The data generated using this feature is in the [spreadsheet](https://docs.google.com/spreadsheets/d/1A8BAzSqfAw0Q5rgzK5Gk__Uy7qhuynh8tedxKnH-t94/edit#gid=0). The spreadsheet data can be used to compile the summary table below:

| Suite | Max tolerance (xla) | Max tolerance (inductor) | Models with high inaccuracy, >=0.005 (xla) | Models with high inaccuracy, >=0.005 (inductor) | Mean tolerance (xla) | Mean tolerance (inductor) |
|-------------|:------:|:------:|:--:|:--:|:------:|:------:|
| huggingface | 0.1169 | 0.0032 | 1 | 0 | 0.0022 | 0.0005 |
| timm_models | 0.0373 | 2.8892 | 10 | 8 | 0.0028 | 0.7044 |
| torchbench | 3.013 | 3.0381 | 6 | 2 | 0.0016 | 0.0016 |
| All models | 3.013 | 3.0381 | 17 | 10 | 0.0028 | 0.7044 |

I used PyTorch release/2.0 branch and corresponding [commit_pin](https://github.com/pytorch/pytorch/blob/release/2.0/.github/ci_commit_pins/xla.txt) for XLA to generate the above data. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102218 Approved by: https://github.com/jansel	2023-06-03 06:40:26 +00:00
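As a rough illustration of the tolerance idea in #102218: `torch.allclose` accepts a pair of values when |input - other| <= atol + rtol * |other|. The sketch below is plain Python over flat lists, not the benchmark script's code; `required_atol`, the choice of the baseline output as `other`, and the sample values are assumptions made for illustration. It computes the smallest `atol` that would make every element pass for a given `rtol`.

```python
def required_atol(test, baseline, rtol=1e-5):
    """Smallest atol such that |t - b| <= atol + rtol * |b| holds for every pair.

    Hypothetical helper: `baseline` plays the role of `other` in the
    torch.allclose formula, and inputs are flat lists of floats.
    """
    if len(test) != len(baseline):
        raise ValueError("outputs must have the same number of elements")
    # For each element, the part of the error that rtol does not already cover.
    gaps = [abs(t - b) - rtol * abs(b) for t, b in zip(test, baseline)]
    # A negative gap means rtol alone suffices, so clamp at zero.
    return max([0.0] + gaps)


baseline_out = [1.0, 2.0, 3.0]   # e.g. eager output on the baseline device
test_out = [1.0, 2.004, 3.0]     # e.g. output on the test device
tolerance = required_atol(test_out, baseline_out, rtol=0.0)
is_high_inaccuracy = tolerance >= 0.005  # threshold used in the summary table
```

With `rtol=0.0` the result is simply the largest absolute difference between the two outputs.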
Edward Z. Yang	624257890e	Reenable hf_T5_generate (#102818 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102818 Approved by: https://github.com/albanD	2023-06-02 17:59:53 +00:00
Edward Z. Yang	7c00d45312	Reenable cm3leon_generate (#102793 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102793 Approved by: https://github.com/albanD, https://github.com/awgu	2023-06-02 15:15:26 +00:00
Animesh Jain	65631d4515	[benchmarks] Use train mode for accuracy checks for HF models (#102578 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102578 Approved by: https://github.com/desertfire	2023-05-31 19:47:18 +00:00
Bin Bao	47b884a74c	[inductor] Revert a CI remedy for Triton compilation error (#102541 ) Summary: revert https://github.com/pytorch/pytorch/pull/91634 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102541 Approved by: https://github.com/ngimel	2023-05-31 13:13:51 +00:00
Animesh Jain	33a49eeae7	[benchmark] Flag to switch on activation checkpointing for HF models (#102557 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102557 Approved by: https://github.com/ngimel, https://github.com/Chillee	2023-05-30 23:46:14 +00:00
Horace He	e71ab21422	update triton pin (#101919 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101919 Approved by: https://github.com/ngimel	2023-05-30 17:16:05 +00:00
Animesh Jain	040d2cc969	[dynamo] Some torchrec_dlrm related fixes (#101953 ) Issue 1 of https://github.com/pytorch/pytorch/issues/101918 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101953 Approved by: https://github.com/jansel	2023-05-28 17:56:08 +00:00
Bin Bao	ee33bae5c7	Fix an issue where checking sameness throw an exception (#102279 ) Summary: currently the exception is caught by the outer handler and reported as infra_error Pull Request resolved: https://github.com/pytorch/pytorch/pull/102279 Approved by: https://github.com/anijain2305	2023-05-25 19:49:23 +00:00
Jason Ansel	5ba16011d7	Suppress profiler spam in dynamo benchmarks (#101942 ) Makes this stuff go away:
```
STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101942 Approved by: https://github.com/shunting314, https://github.com/desertfire	2023-05-22 18:32:31 +00:00
Edward Z. Yang	22ca1a1124	Partially fix shape mismatch in vision_maskrcnn (#101477 ) The bulk of the heavy lifting is happening in https://github.com/pytorch/vision/pull/7592 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101477 Approved by: https://github.com/voznesenskym	2023-05-21 05:20:08 +00:00
drisspg	6f13d6892a	Add meta support for multinomial (#101324 ) # Summary Found this when trying to compile the text gen loop of nanogpt here: `b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324 Approved by: https://github.com/ngimel	2023-05-19 00:04:26 +00:00
Animesh Jain	794cc3952e	adding moco to CI (#101098 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101098 Approved by: https://github.com/desertfire	2023-05-18 10:01:49 +00:00
chuanqiw	b315c9b5ab	[CI] Enlarge memory for OOM models in inductor cpu HF accuracy test (#101395 ) Change the Inductor CPU HF accuracy test node from `linux.4xlarge` (32GB) to `linux.24xlarge` (192GB) to enlarge the node memory. Also add 3 HF models back to CI test. Fixes #101390 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101395 Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/huydhn	2023-05-18 09:23:30 +00:00
Jason Ansel	403ce1a1c9	Fix benchmark model names printouts with tqdm (#101627 ) With the TQDM changes in #100969 -- the models names ended up getting hidden from the benchmark printouts. We would print the model name with no newline, then tqdm would print a `\r` and overwrite the name of the running model. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101627 Approved by: https://github.com/ezyang	2023-05-17 15:31:11 +00:00
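To make the overwrite described in #101627 concrete, here is a small sketch that simulates how a terminal treats `\r` (return to column 0) and `\n` (start a new line). The `render` helper, the model name, and the progress-bar string are made up for illustration; this is not tqdm's or the benchmark runner's code.

```python
def render(stream):
    """Return the lines a simple terminal would show after writing `stream`."""
    lines = [[]]
    col = 0
    for ch in stream:
        if ch == "\r":
            col = 0                # carriage return: back to the start of the line
        elif ch == "\n":
            lines.append([])       # newline: move on to a fresh line
            col = 0
        else:
            row = lines[-1]
            if col < len(row):
                row[col] = ch      # overwrite whatever was printed there before
            else:
                row.append(ch)
            col += 1
    return ["".join(row) for row in lines]


bar = "\r 10%|#         |"                   # a progress bar redraw starts with \r
hidden = render("resnet50 " + bar)           # name printed without a newline
visible = render("resnet50\n" + bar)         # name printed with a newline
```

In the first case the bar is drawn over the model name; in the second the name stays on its own line, which is one way to keep it visible.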
PaliC	e0fc24cdc5	add retries to inductor benchmark suite (#101019 ) This PR: 1) enables retries for downloading torchbenchmark and huggingface models, similar to how we already do it for timm models; 2) creates a `_download_model` function for the Hugging Face and TIMM runners, whose output I plan to use to preload the models somewhere if possible (please double check I'll be saving the right thing). Instead of retries, we plan to just add torchbench to a docker image as it is relatively small. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101019 Approved by: https://github.com/huydhn, https://github.com/desertfire	2023-05-16 21:41:50 +00:00
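A minimal sketch of the retry pattern that #101019 describes for model downloads. The names `download_with_retries` and `flaky_download`, the attempt count, and the delay are all hypothetical; the runners' actual `_download_model` function is not reproduced here.

```python
import time


def download_with_retries(download, attempts=3, delay_s=0.0):
    """Call `download()` until it succeeds or `attempts` tries are used up."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except Exception as exc:       # a real runner might catch narrower errors
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s)    # brief pause before the next try
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error


calls = {"n": 0}


def flaky_download():
    """Simulated download that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "model-weights"


result = download_with_retries(flaky_download)
```

Passing the download step as a callable keeps the retry loop independent of which suite (torchbench, Hugging Face, or timm) supplies the model.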
Edward Z. Yang	41468833fb	vision_maskrcnn is now deterministic (#101116 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101116 Approved by: https://github.com/ngimel	2023-05-16 21:32:17 +00:00
Yanbo Liang	e4eaf33346	Re-enable detectron2_maskrcnn on CI (#100791 ) #99665 has been fixed, so we can re-enable these models on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100791 Approved by: https://github.com/huydhn	2023-05-16 04:25:58 +00:00
Edward Z. Yang	f48718f749	Update torchbench pin (#101365 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101365 Approved by: https://github.com/albanD, https://github.com/awgu	2023-05-15 16:52:31 +00:00
Natalia Gimelshein	49578913fb	update timm commit (#100931 ) Fixes #100903 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100931 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-05-12 04:22:08 +00:00
Edward Z. Yang	41a4e22015	Update torchbench pin (#101071 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101071 Approved by: https://github.com/malfet	2023-05-11 18:09:40 +00:00
Jason Ansel	036a8d6b4a	Remove NullContext() from benchmark runners (#100309 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100309 Approved by: https://github.com/Skylion007, https://github.com/anijain2305	2023-05-11 06:42:27 +00:00
XiaobingSuper	c84627c2ee	benchmarks: make --amp works for cpu path (#101057 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101057 Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/jansel	2023-05-11 02:51:38 +00:00
Edward Z. Yang	c658732950	[RFC] Add tqdm to benchmarking script (#100969 ) Here's what it looks like, on a slower running benchmark: https://github.com/pytorch/pytorch/assets/13564/47c4a5bd-e963-45de-a15c-2fd943de0fa4 There's actually quite a bit of dead time, it's possible there are more spots we should add tqdm to. Looking for opinions on utility of this. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100969 Approved by: https://github.com/Skylion007	2023-05-10 15:39:24 +00:00
Bin Bao	76cc3ab4f3	[CI] Delete skips from https://github.com/pytorch/pytorch/issues/93847 (#96049 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96049 Approved by: https://github.com/jansel	2023-05-10 01:27:27 +00:00
Edward Z. Yang	9eab13fc90	Reenable llama benchmark (#100877 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100877 Approved by: https://github.com/albanD	2023-05-09 01:12:54 +00:00
Natalia Gimelshein	9790f9174a	skip lcnet (#100726 ) Per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/100726 Approved by: https://github.com/voznesenskym	2023-05-05 23:19:42 +00:00
Animesh Jain	3f025c607c	summarize graph breaks (#100696 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100696 Approved by: https://github.com/yanboliang	2023-05-05 22:27:47 +00:00
Animesh Jain	8994d9e610	[dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590 ) For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590 Approved by: https://github.com/voznesenskym, https://github.com/wconstab	2023-05-04 18:52:21 +00:00
Yanbo Liang	896eb1db26	[Dynamo] Skip TB Background_Matting model eager accuracy check because of non deterministic (#100513 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100513 Approved by: https://github.com/anijain2305	2023-05-03 07:06:50 +00:00
Jason Ansel	fdc853b14c	Add --baseline option to benchmark runners (#100266 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100266 Approved by: https://github.com/ngimel	2023-05-02 02:35:11 +00:00
Edward Z. Yang	e918fd18e7	Disable densenet121 as it is flaky (#100371 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100371 Approved by: https://github.com/voznesenskym	2023-05-02 01:49:11 +00:00
Edward Z. Yang	5d93265cce	Report timeout/infra_error instead of 0.0000 on infra error (#100372 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100372 Approved by: https://github.com/Skylion007, https://github.com/albanD	2023-05-01 14:56:01 +00:00
Huy Do	9a69634b28	Skip some failing dynamic shape models on periodic (#99895 ) After some recent changes, these tests are failing in periodic trunk. So let's move them to unstable while waiting for the team to root cause the issue https://github.com/pytorch/pytorch/issues/99893. Note that a forward fix can use `ciflow/unstable` to run those unstable jobs to confirm that they are fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99895 Approved by: https://github.com/malfet	2023-04-25 07:05:08 +00:00
Edward Z. Yang	04e8df4dd7	Return full accuracy status for printing, not abbreviated version (#99894 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99894 Approved by: https://github.com/jansel	2023-04-25 05:17:10 +00:00
Edward Z. Yang	cd61707167	yolov3 dynamic training accuracy is fixed (#99896 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99896 Approved by: https://github.com/albanD	2023-04-25 01:15:24 +00:00
chuanqiw	e9e5ffe83e	Re-enable dynamic shapes test in dynamo benchmark (#99816 ) Set `torch._dynamo.config.assume_static_by_default = False` for dynamic shapes flag enabled Fixes #99815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99816 Approved by: https://github.com/jgong5, https://github.com/ezyang	2023-04-24 20:34:52 +00:00
Edward Z. Yang	f602b3a6ae	Preserve mark_dynamic when cloning inputs (#99617 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99617 Approved by: https://github.com/ngimel, https://github.com/voznesenskym, https://github.com/anijain2305	2023-04-22 19:46:31 +00:00
Bin Bao	e09f785a72	[CI] Remove inductor skip list for Huggingface (#99375 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99375 Approved by: https://github.com/anijain2305	2023-04-21 18:13:22 +00:00
Edward Z. Yang	fc8fa6c356	Require at least one tensor to be marked dynamic with --dynamic-batch-only (#99620 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99620 Approved by: https://github.com/voznesenskym	2023-04-21 00:17:08 +00:00
Huy Do	5315317b7b	Skip some detectron2_maskrcnn models with KeyError _ignore_torch_cuda_oom (#99599 ) These tests are failing in trunk `233cc34d3b` with `KeyError: '_ignore_torch_cuda_oom'` Pull Request resolved: https://github.com/pytorch/pytorch/pull/99599 Approved by: https://github.com/malfet	2023-04-20 18:11:35 +00:00
Jason Ansel	3233450d07	Add TorchXLA option to benchmark runner (#99505 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99505 Approved by: https://github.com/voznesenskym	2023-04-19 22:44:52 +00:00
Will Constable	9ac2b041c9	Make opacus xfail instead of skip (#99380 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99380 Approved by: https://github.com/desertfire, https://github.com/anijain2305	2023-04-19 21:09:06 +00:00
Michael Voznesensky	113bd11cf4	Skip levit (#99491 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99491 Approved by: https://github.com/ezyang	2023-04-19 07:41:42 +00:00
Edward Z. Yang	039faf0dbf	Add invariant that all symbolic shapes must be bound in graph (#99089 ) Previously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards and (2) we end up allocating a LOT of fresh new symbols in backwards. With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors, ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to preserve these SymInt as saved for backwards, if they are needed in the backwards graph to preserve the invariant as well. This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089 Approved by: https://github.com/voznesenskym	2023-04-16 01:48:19 +00:00
Yanbo Liang	15fe5a0798	[Dynamo] Fix benchmark --verbose error (#99224 ) Dynamo benchmark --verbose is broken:
```
Traceback (most recent call last):
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/torchbench.py", line 400, in <module>
    torchbench_main()
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/torchbench.py", line 396, in torchbench_main
    main(TorchBenchmarkRunner(), original_dir)
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 1967, in main
    return maybe_fresh_cache(
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 993, in inner
    return fn(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 2135, in run
    torch._dynamo.config.log_level = logging.DEBUG
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/config_utils.py", line 67, in __setattr__
    raise AttributeError(f"{self.__name__}.{name} does not exist")
AttributeError: torch._dynamo.config.log_level does not exist
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99224 Approved by: https://github.com/voznesenskym	2023-04-15 20:18:50 +00:00
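The traceback in #99224 comes from a config object whose `__setattr__` rejects names it does not know. The sketch below shows that general pattern in plain Python; `StrictConfig` and the `demo.config` name are hypothetical and this is not the code in `torch/_dynamo/config_utils.py`.

```python
class StrictConfig:
    """Config holder that only allows assignment to options it was created with."""

    def __init__(self, name, **defaults):
        # Bypass our own __setattr__ while setting up the internal state.
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_values", dict(defaults))

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails.
        values = object.__getattribute__(self, "_values")
        if key in values:
            return values[key]
        raise AttributeError(f"{self._name}.{key} does not exist")

    def __setattr__(self, key, value):
        if key not in self._values:
            # Typos and removed options fail loudly instead of being silently ignored.
            raise AttributeError(f"{self._name}.{key} does not exist")
        self._values[key] = value


config = StrictConfig("demo.config", verbose=False)
config.verbose = True            # known option: accepted
try:
    config.log_level = 10        # unknown option: rejected
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

The strictness is what turned a removed option into an immediate, traceable failure in the benchmark script rather than a setting that quietly did nothing.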
Bin Bao	34f681c13b	[CI] Remove inductor skip list for timm_models (#98840 ) Summary: check against the expected csv file instead of skipping tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/98840 Approved by: https://github.com/ezyang	2023-04-15 13:54:41 +00:00
Bin Bao	e5501a967e	[inductor] Support IndexPutFallback in cpp_wrapper (#98972 ) Summary: 1) Make the fallback index_put generate the right cpp code in cpp_wrapper 2) Add a --cpp-wrapper option to common.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/98972 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-04-13 15:41:03 +00:00
Edward Z. Yang	b8b840be3d	Convert logging f-strings to use % format, part five (#98765 ) This does some annoying but simple cases by hand. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765 Approved by: https://github.com/wanchaol	2023-04-11 13:17:59 +00:00
Edward Z. Yang	b09722f540	Convert logging f-strings to use % format, part two (#98700 ) This hits multi-line logging strings Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700 Approved by: https://github.com/voznesenskym	2023-04-10 12:19:31 +00:00
Edward Z. Yang	9a8f71f23e	Convert logging f-strings to use % format (#98697 ) Codemod done with https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with assistance from ChatGPT. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697 Approved by: https://github.com/voznesenskym	2023-04-10 12:19:31 +00:00
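The three "Convert logging f-strings to use % format" commits above do not spell out the motivation, so the following is only a general illustration of how the two styles differ in the standard `logging` module: an f-string is formatted before the logging call runs, while `%`-style arguments are formatted only if the record is actually emitted. The `Expensive` class and logger name are made up for the demonstration.

```python
import logging


class Expensive:
    """Counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive"


log = logging.getLogger("lazy_format_demo")
log.setLevel(logging.WARNING)     # DEBUG records are dropped by this logger

eager, lazy = Expensive(), Expensive()
log.debug(f"value: {eager}")      # string is built first, then discarded
log.debug("value: %s", lazy)      # formatting is deferred, and skipped here
```

After these two calls `eager.str_calls` is 1 and `lazy.str_calls` is 0, since the `%`-style call never formats its argument when the level is disabled.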
Edward Z. Yang	bdb79a8f52	Turn off divisible_by_16 for dynamic shapes; support ablation (#98471 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98471 Approved by: https://github.com/ngimel, https://github.com/voznesenskym	2023-04-06 12:57:07 +00:00
Edward Z. Yang	cf1bfca2ba	Require batch dimensions to be compiled dynamically (#98334 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98334 Approved by: https://github.com/voznesenskym	2023-04-05 19:40:22 +00:00
Edward Z. Yang	b923f84805	Switch accuracy CI to dynamic batch only (#98307 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98307 Approved by: https://github.com/wconstab	2023-04-05 01:20:12 +00:00
Elias Ellison	a3365e1d0d	Increment pending forwards after invocation (#98101 ) Forwards are only pending following invocation, not before. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98101 Approved by: https://github.com/ngimel	2023-04-05 00:04:39 +00:00
Bin Bao	69ff39d2e7	Skip gat, gcn and sage for TorchBench CUDA test (#98244 ) Summary: The three models only support CPU for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98244 Approved by: https://github.com/ezyang	2023-04-04 01:06:18 +00:00
Jason Ansel	55afaa46a4	Support functools.partial and itertools.product (#98120 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98120 Approved by: https://github.com/anijain2305	2023-04-03 18:23:25 +00:00
Bin Bao	ba7ee00f00	Add a --inference flag to dynamo benchmark script (#98173 ) Summary: When calling benchmark scripts, make it a requirement to pass --inference or --training Pull Request resolved: https://github.com/pytorch/pytorch/pull/98173 Approved by: https://github.com/huydhn	2023-04-03 17:12:28 +00:00
Jason Ansel	92b46202ef	Add --stats option to benchmark scripts (#98109 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98109 Approved by: https://github.com/anijain2305	2023-04-02 02:23:13 +00:00
Edward Z. Yang	5df59f957f	Fix G001,G002,G003 in logs to % syntax (#97812 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97812 Approved by: https://github.com/Skylion007, https://github.com/kiukchung, https://github.com/malfet, https://github.com/mlazos	2023-04-01 01:43:33 +00:00
Bin Bao	c699ac17df	[CI] Bump up torchbench version to fix dynamo graph breaks in transformers (#98003 ) Summary: When we last bumped the torchbench version pin, we found that new graph breaks had been introduced with the transformers version upgrade, see https://github.com/pytorch/pytorch/pull/96782. It turns out they are already fixed upstream, see https://github.com/huggingface/transformers/pull/21648 and https://github.com/pytorch/benchmark/pull/1511 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98003 Approved by: https://github.com/ngimel	2023-03-31 16:52:09 +00:00
Edward Z. Yang	97fc8ea5f4	Run the benchmark suite with dynamic batch only (#97912 ) Symbolic shapes compile time on full CI with inductor is horribly long (even though our aot_eager local runs seemed to suggest that the added latency was only 10s per model.) To patch over the problem for now, run the benchmark suite with dynamic batch only. This should absolve a lot of sins. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97912 Approved by: https://github.com/janeyx99, https://github.com/desertfire	2023-03-30 18:04:48 +00:00
Aaron Gokaslan	47dca20d80	[BE] Enable flake8-comprehension rule C417 (#97880 ) Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880 Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD	2023-03-30 14:34:24 +00:00
William Wen	b93e1f377e	[dynamo, benchmarks] Add inductor-mode (for max-autotune) and warm start options to dynamo benchmarks (#97719 ) Title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97719 Approved by: https://github.com/shunting314	2023-03-29 21:09:00 +00:00
Edward Z. Yang	f754be897a	Disable speedup_experiment_ds (#97806 ) It seems to be broken. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97806 Approved by: https://github.com/jansel	2023-03-29 01:27:31 +00:00
Bin Bao	a9a81ab7e3	[CI] Run benchmark test with dynamo_eager in periodic (#97543 ) Summary: The idea is to catch any dynamo_eager regression earlier, and also we can take that off the dashboard run. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97543 Approved by: https://github.com/huydhn	2023-03-28 01:02:49 +00:00
Shunting Zhang	652592efa9	[inductor] use torch.profiler in the triton wrapper (#97405 ) I think it's helpful to use torch.profiler to profile the triton wrapper. E.g., I tried it for nvidia_deeprecommender's inference graph. Even with max-autotune, we see that for the majority of the time the GPU is running 2 mm/addmm ops. That's why max-autotune does not help for this model: tuning does not affect the external mm ops. [Screenshot 2023-03-22 at 5 49 28 PM: https://user-images.githubusercontent.com/52589240/227072474-2f0d7205-4a10-4929-b1b7-551214788c61.png] As a next step I'll check why the triton mm kernels are not picked. EDIT: the above screenshot was captured without max-autotune due to a typo. Below is the trace with max-autotune enabled: [Screenshot 2023-03-22 at 6 43 26 PM: https://user-images.githubusercontent.com/52589240/227077624-fdccf928-be08-4211-871b-a9e3d7b76fbe.png] Pull Request resolved: https://github.com/pytorch/pytorch/pull/97405 Approved by: https://github.com/ngimel	2023-03-27 21:54:25 +00:00
Edward Z. Yang	cff4826f28	pytorch_unet is now passing (#97309 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97309 Approved by: https://github.com/janeyx99, https://github.com/zou3519	2023-03-22 13:55:05 +00:00
Bin Bao	be49d3b170	[CI] Turn on debug logging for dla102 and gernet_l (#97307 ) Summary: Log the generated code for those two flaky tests to see if there is any codegen difference when they fail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97307 Approved by: https://github.com/ezyang	2023-03-22 13:42:13 +00:00
Natalia Gimelshein	e7d9331688	[inductor] hoist symbolic padding expressions (#97099 ) Towards fixing pnasnet5large, see #96709. The generated kernel looks much better:
```
@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: 'fp32', 1: 'fp32', 2: 'i32', 3: 'i32', 4: 'i32', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': [], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 6), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, ks0, ks1, ks2, ks3, xnumel, XBLOCK : tl.constexpr):
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x1 = (xindex // ks0) % ks0
    x0 = xindex % ks0
    x2 = (xindex // ks3)
    x4 = xindex
    tmp0 = x1 + ((-1)*ks1)
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = ks2
    tmp4 = tmp0 < tmp3
    tmp5 = x0 + ((-1)*ks1)
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + (x0 + ((-1)*ks1) + (ks2*x1) + (x2*(ks2*ks2)) + ((-1)*ks1*ks2) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0)
    tmp12 = tl.where(tmp10, tmp11, 0.0)
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
```
Interestingly, removing `expand` in the index `simplify` function makes the `load` expression a little bit better, but `store` then fails to simplify to a flat store in this case, so I'm leaving `expand` in. Full pnasnet still chokes on `ceiling` in batch_norm kernels. Additionally, it looks like shape propagation goofs in inductor and generates overly complicated expressions; we should switch to metadata from the fx graph. I'm still not adding `ceil` printing to triton, because we should be able to hoist all indexing expressions (and just printing ceil without converting to int64 doesn't work). Pull Request resolved: https://github.com/pytorch/pytorch/pull/97099 Approved by: https://github.com/jansel	2023-03-21 21:43:32 +00:00
Edward Z. Yang	e74c5e5637	rexnet_100 is disabled for static, does not need dynamic listing (#97100 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97100 Approved by: https://github.com/Skylion007	2023-03-19 20:57:49 +00:00
Bin Bao	577d930c39	[CI] Revert https://github.com/pytorch/pytorch/pull/96195 (#96897 ) Summary: https://github.com/pytorch/pytorch/pull/96195 was an experiment for debugging flaky failures on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96897 Approved by: https://github.com/ngimel	2023-03-16 06:28:18 +00:00
Edward Z. Yang	3606f59366	Default specialize_int to False (#96624 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624 Approved by: https://github.com/janeyx99	2023-03-16 02:54:18 +00:00
Will Constable	54cd4a67d0	Output peak memory stats from dynamo torchbench perf CI (#95666 ) Adds absolute memory usage numbers (in addition to compression ratio) to performance jobs. Example output: [image: https://user-images.githubusercontent.com/4984825/225419950-500908c5-00ce-4711-afa2-c995bf90d35d.png] Pull Request resolved: https://github.com/pytorch/pytorch/pull/95666 Approved by: https://github.com/ezyang, https://github.com/williamwen42	2023-03-15 19:24:47 +00:00
Bin Bao	33c7be360f	[reland][CI] switch torchbench to a pinned version (#96782 ) Summary: This is reland of https://github.com/pytorch/pytorch/pull/96553 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96782 Approved by: https://github.com/huydhn	2023-03-15 12:46:36 +00:00
Edward Z. Yang	037acd5a22	Update CI skips (#96745 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96745 Approved by: https://github.com/wconstab	2023-03-14 22:19:10 +00:00
PyTorch MergeBot	be4eaa69c2	Revert "[CI] switch torchbench to a pinned version (#96553 )" This reverts commit `61d6ccd29a`. Reverted https://github.com/pytorch/pytorch/pull/96553 on behalf of https://github.com/desertfire due to land race	2023-03-14 21:39:45 +00:00
PyTorch MergeBot	ba4fb9b6ad	Revert "Default specialize_int to False (#96624 )" This reverts commit `1ac8782db2`. Reverted https://github.com/pytorch/pytorch/pull/96624 on behalf of https://github.com/kit1980 due to Broke inductor/test_torchinductor_dynamic_shapes.py	2023-03-14 19:43:47 +00:00
Bin Bao	61d6ccd29a	[CI] switch torchbench to a pinned version (#96553 ) Summary: Previously we were using a branch on torchbench which skips torchaudio. We should switch to make sure a good test coverage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96553 Approved by: https://github.com/huydhn, https://github.com/ezyang	2023-03-14 18:42:22 +00:00
Edward Z. Yang	1ac8782db2	Default specialize_int to False (#96624 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624 Approved by: https://github.com/janeyx99	2023-03-14 18:37:47 +00:00
David Berard	6e3d51b08a	[inductor][CI] also skip rexnet_100 on non-dynamic shapes (#96691 ) Recent failures show rexnet_100 accuracy is flaky also on non-dynamic shapes (was already disabled for dynamic shapes in #96474). The failure occurs for the same reason (stem.bn.weight.grad). e.g. https://github.com/pytorch/pytorch/actions/runs/4402868441/jobs/7710977874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96691 Approved by: https://github.com/desertfire	2023-03-14 18:11:59 +00:00
Edward Z. Yang	ff7e510d1e	Correctly use PythonPrinter for generating wrapper code referencing sympy (#96710 ) Otherwise you get stuff like ceiling(s0) which is not valid Python code. Fixes volo_d1_224 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96710 Approved by: https://github.com/ngimel, https://github.com/jansel	2023-03-14 14:35:52 +00:00
Wang, Eikan	3cad8d23d0	[Inductor] Skip the hf_T5_base due to intermittent failure on CI (#96649 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96649 Approved by: https://github.com/desertfire	2023-03-14 07:40:20 +00:00
Edward Z. Yang	507feb805f	Don't specialize torch.Size with specialize_int = False (#96419 ) Fixes https://github.com/pytorch/pytorch/issues/95868 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96419 Approved by: https://github.com/jansel, https://github.com/ngimel	2023-03-14 01:32:58 +00:00
Edward Z. Yang	c7f39c0820	Update CI skips (#96554 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96554 Approved by: https://github.com/janeyx99	2023-03-13 13:40:45 +00:00
David Berard	29cd60dfb7	[CI] handle more dynamo benchmark models that are not expected to be deterministic (#96324 ) Follow-up to #96245. alexnet, Background_Matting, vision_maskrcnn, and vgg16 all have the same problem; but on float32 they were also failing on the previous day so I missed this. Once the amp jobs became available I could see that these have the same issue (on both float32 and amp). Pull Request resolved: https://github.com/pytorch/pytorch/pull/96324 Approved by: https://github.com/desertfire	2023-03-10 18:15:34 +00:00
Bin Bao	a651e6253a	[CI] Change compile_threads to 1 when running benchmark accuracy test on CI (#96195 ) Summary: This is not a pretty solution, but it is a way to verify whether the flakiness is coming from parallel compilation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96195 Approved by: https://github.com/ngimel	2023-03-10 17:39:38 +00:00
Edward Z. Yang	ff2e14f200	Skip rexnet_100 in dynamic CI (#96474 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96474 Approved by: https://github.com/yanboliang, https://github.com/msaroufim	2023-03-10 01:23:19 +00:00
Edward Z. Yang	c988de1040	[EASY] Update inductor training dynamic skips (#96298 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96298 Approved by: https://github.com/Chillee, https://github.com/janeyx99	2023-03-08 19:31:46 +00:00
Bin Bao	b3a079810e	[CI] Add a workflow for quick perf comparison (#96166 ) Summary: ciflow/inductor-perf-test-nightly now contains the full dashboard run, which takes a very long time. Ed proposed a simplification of the perf run there, but it is still worth having a set of fast perf tests that include only one configuration (--training --amp). Pull Request resolved: https://github.com/pytorch/pytorch/pull/96166 Approved by: https://github.com/huydhn, https://github.com/weiwangmeta	2023-03-08 19:09:04 +00:00
Bin Bao	664381b293	[CI] Avoid calling torch.use_deterministic_algorithms for some models (#96245 ) tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/96245 Approved by: https://github.com/davidberard98	2023-03-08 03:35:32 +00:00
Edward Z. Yang	d0641ed247	[TEST] Turn on unspecialize int dynamic training inductor CI (#96058 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96058 Approved by: https://github.com/janeyx99, https://github.com/voznesenskym	2023-03-07 16:08:45 +00:00
Edward Z. Yang	a6e3e7905e	Turn on unspecialize int dynamic inductor CI (#96034 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96034 Approved by: https://github.com/voznesenskym	2023-03-07 12:39:55 +00:00
Jason Ansel	95d17dc93d	[inductor] Reland #95567 part 1 (#96023 ) This is the non-problematic part of #95567. The errors were coming from IR printing changes which will be next in the stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96023 Approved by: https://github.com/ngimel, https://github.com/mlazos	2023-03-06 22:57:22 +00:00
Edward Z. Yang	1fd7ea1ba8	Update skips for RecursionError (#96109 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96109 Approved by: https://github.com/huydhn	2023-03-06 17:55:38 +00:00
Bin Bao	60cf95610d	[CI] Skip xcit_large_24_p8_224 in TIMM (#96048 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96048 Approved by: https://github.com/jansel	2023-03-05 14:54:46 +00:00
Bin Bao	1359d16fe8	[CI] Further tighten the checking of two eager runs (#95902 ) Summary: To catch nondeterminism in eager if there is any. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95902 Approved by: https://github.com/jansel	2023-03-05 14:53:02 +00:00
Edward Z. Yang	c7c4a20321	Update dynamic skips (#95966 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/95966 Approved by: https://github.com/janeyx99, https://github.com/voznesenskym	2023-03-04 23:01:58 +00:00
Jason Ansel	43dd043ea7	Revert "[inductor] Improve error messages (#95567 )" (#96014 ) This reverts commit `62b775583f`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96014 Approved by: https://github.com/Chillee	2023-03-04 04:03:31 +00:00
Edward Z. Yang	d303665d33	Make int unspecialization actually work (#95621 ) OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor. The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors. * I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.) * Only 0/1 integers get specialized by default now * A hodgepodge of fixes. I'll comment on the PR about them. 
Fixes https://github.com/pytorch/pytorch/issues/95469 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621 Approved by: https://github.com/jansel, https://github.com/Chillee	2023-03-04 01:22:08 +00:00
Jason Ansel	62b775583f	[inductor] Improve error messages (#95567 ) Example error message before/after (710 to 131 lines): https://gist.github.com/jansel/6fecad057738089fa95bf08c3de9fc8a Pull Request resolved: https://github.com/pytorch/pytorch/pull/95567 Approved by: https://github.com/mlazos	2023-03-02 02:20:55 +00:00
Bin Bao	879f0c3fee	[CI] Increase the timeout limit for benchmark test (#95787 ) Summary: xcit_large_24_p8_224 occasionally hits TIMEOUT on CI. Bump up the limit to reduce flakiness. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95787 Approved by: https://github.com/ezyang, https://github.com/ZainRizvi	2023-03-01 19:54:25 +00:00
Bin Bao	e79b2b7792	[CI] Force clear triton cache between running each test (#95729 ) Summary: The idea is to see if this reduces some of the flakiness we have seen on CI. If it does help, then we have a problem in our caching implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95729 Approved by: https://github.com/ngimel	2023-03-01 04:10:03 +00:00
Will Constable	1a72712645	Add dynamo graph break stats to CI (#95635 ) Adds columns to csv produced by accuracy job including dynamo graph break stats. Example output from torchbench CI job: <img width="771" alt="image" src="https://user-images.githubusercontent.com/4984825/221716236-9276684e-1be8-43e1-837e-f41671d4e0e3.png"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/95635 Approved by: https://github.com/ezyang	2023-02-28 16:17:46 +00:00
Edward Z. Yang	3762e801ba	Update dynamic skips (#95587 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/95587 Approved by: https://github.com/voznesenskym	2023-02-28 03:26:55 +00:00
Bin Bao	fa5a4b0dfc	[CI] Do not compare two eager run results against fp64 result (#95616 ) Summary: When running the benchmark test with --accuracy, two eager runs should return the same result. If not, we want to detect it early, but comparing against fp64_output may hide the non-determinism in eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95616 Approved by: https://github.com/ZainRizvi	2023-02-27 20:11:21 +00:00
Bin Bao	ab1ab3ab19	[CI] Specify more torch.backends.cudnn options to reduce non-determinism (#95478 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95478 Approved by: https://github.com/ezyang	2023-02-25 18:54:12 +00:00
Bin Bao	4c8ad93a7c	[Inductor][CI] Remove hf_GPT2_large from CPU inference test (#95473 ) Summary: hf_GPT2_large shows random failure on CI for the CPU inference. Created https://github.com/pytorch/pytorch/issues/95474 for the Intel team to investigate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95473 Approved by: https://github.com/anijain2305	2023-02-24 18:21:36 +00:00
Will Constable	8de4238a31	Add dynamo bench arg --per_process_memory_fraction (#95260 ) Simply pipes the arg to the existing torch.cuda API by the same name. Useful for locally debugging OOMs that happened on a smaller GPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95260 Approved by: https://github.com/davidberard98	2023-02-22 05:11:18 +00:00
Edward Z. Yang	08370ddad8	Update model skips (#95089 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/95089 Approved by: https://github.com/albanD	2023-02-20 13:24:49 +00:00
Wang, Eikan	954c767bc6	[Inductor] Enable accuracy test for CPPBackend (#94898 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94898 Approved by: https://github.com/jgong5, https://github.com/desertfire	2023-02-20 05:02:15 +00:00
Edward Z. Yang	a2f44d82f8	Flag guard unbacked SymInt/SymFloat support (#94987 ) I believe this fixes the AllenaiLongformerBase problem in periodic. The longer version of the problem is that we are currently optimistically converting all item() calls into unbacked SymInt/SymFloat, but sometimes this results in a downstream error due to a data-dependent guard. Fallbacks for this case are non-existent; this will just crash the model. This is bad. So we flag guard until we get working fallbacks. What could these fallbacks look like? One idea I have is to optimistically make data-dependent calls unbacked, but then if it results in a crash, restart Dynamo analysis with the plan of graph breaking immediately when the item() call happens. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94987 Approved by: https://github.com/Skylion007, https://github.com/malfet	2023-02-17 00:25:05 +00:00
Edward Z. Yang	7aaebe00ee	Fail dynamic_aot_eager AllenaiLongformerBase model (#94986 ) ``` GuardOnDataDependentSymNode: It appears that you're trying to get a value out of symbolic int/float whose value is data-dependent (and thus we do not know the true value.) The expression we were trying to evaluate is Eq(i3, -1). Scroll up to see where each of these data-dependent accesses originally occurred. While executing %as_strided : [#users=1] = call_method[target=as_strided](args = (%pad,), kwargs = {size: (12, %add, 768, 64), stride: (%getitem, %mul, %getitem_1, %getitem_2)}) Original traceback: File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/models/longformer/modeling_longformer.py", line 928, in <graph break in _sliding_chunks_matmul_attn_probs_value> chunked_value = padded_value.as_strided(size=chunked_value_size, stride=chunked_value_stride) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94986 Approved by: https://github.com/albanD	2023-02-16 20:02:46 +00:00
Aaron Gokaslan	0444a6c90a	[BE] Remove deprecated logging warn method (#94708 ) Swaps all logging.warn calls to logging.warning since the former is deprecated and even raises a deprecation warning now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94708 Approved by: https://github.com/ezyang	2023-02-13 18:24:52 +00:00
Edward Z. Yang	ae7a628b03	Dynamic shapes CI updates (#94690 ) Data from https://github.com/pytorch/pytorch/pull/94683 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94690 Approved by: https://github.com/cpuhrsch	2023-02-13 18:20:12 +00:00
PyTorch MergeBot	10c430ba0a	Revert "Set torch.backends.cudnn.enabled to false when testing accuracy (#94363 )" This reverts commit `2a5851735a`. Reverted https://github.com/pytorch/pytorch/pull/94363 on behalf of https://github.com/desertfire due to TIMM models start to show flaky failures after this PR, need more investigation	2023-02-10 04:40:32 +00:00
Bin Bao	2a5851735a	Set torch.backends.cudnn.enabled to false when testing accuracy (#94363 ) Summary: It looks like setting torch.backends.cudnn.deterministic to True is not enough for eliminating non-determinism when testing benchmarks with --accuracy, so let's turn off cudnn completely. With this change, mobilenet_v3_large does not show random failure on my local environment. Also take this chance to clean up CI skip lists. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94363 Approved by: https://github.com/ezyang	2023-02-09 23:43:13 +00:00
Xuehai Pan	a229b4526f	[BE] Prefer dash over underscore in command-line options (#94505 ) Preferring dash over underscore in command-line options. Add `--command-arg-name` to the argument parser. The old arguments with underscores `--command_arg_name` are kept for backward compatibility. Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they appear to be the default choice in the Python standard library: `argparse.BooleanOptionalAction`: `4a9dff0e5a/Lib/argparse.py (L893-L895)` ```python class BooleanOptionalAction(Action): def __init__(...): if option_string.startswith('--'): option_string = '--no-' + option_string[2:] _option_strings.append(option_string) ``` It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the shift or caps-lock key, unlike `-`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-09 20:16:49 +00:00
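The backward-compatible pattern described in #94505 can be sketched with the standard-library `argparse` as follows. The option name `--command-arg-name` is the placeholder used in the commit message; the `build_parser` helper is hypothetical and is not taken from the PyTorch scripts.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser accepting a dashed flag and its underscore alias."""
    parser = argparse.ArgumentParser()
    # The first long option string determines the default dest; argparse
    # converts its dashes to underscores, giving `command_arg_name`.
    parser.add_argument(
        "--command-arg-name",
        "--command_arg_name",  # legacy spelling kept for backward compatibility
        type=int,
        default=0,
    )
    return parser


parser = build_parser()
new_style = parser.parse_args(["--command-arg-name", "3"])
old_style = parser.parse_args(["--command_arg_name", "3"])
```

Both spellings populate the same attribute, so code that reads `args.command_arg_name` does not need to change when the dashed form is introduced.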
Edward Z. Yang	c028fc4e25	Decouple PT2 dynamic shapes from the functorch setting (#94469 ) The functorch setting still exists, but now it is no longer necessary: we infer use of Python dispatcher by checking if the ambient FakeTensorMode has a ShapeEnv or not. The setting still exists, but it is for controlling direct AOTAutograd use now; for PT2, it's sufficient to use torch._dynamo.config.dynamic_shapes. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94469 Approved by: https://github.com/Chillee, https://github.com/voznesenskym, https://github.com/jansel	2023-02-09 06:41:41 +00:00
PyTorch MergeBot	ca63040d2b	Revert "Set torch.backends.cudnn.enabled to false when testing accuracy (#94363 )" This reverts commit `7bfc59993d`. Reverted https://github.com/pytorch/pytorch/pull/94363 on behalf of https://github.com/huydhn due to This change fails in trunk `7bfc59993d` running out of memory. Mark this as weird because it was green in PR	2023-02-09 01:24:35 +00:00
Bin Bao	7bfc59993d	Set torch.backends.cudnn.enabled to false when testing accuracy (#94363 ) Summary: It looks like setting torch.backends.cudnn.deterministic to True is not enough for eliminating non-determinism when testing benchmarks with --accuracy, so let's turn off cudnn completely. With this change, mobilenet_v3_large does not show random failure on my local environment. Also take this chance to clean up CI skip lists. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94363 Approved by: https://github.com/ezyang	2023-02-08 23:30:10 +00:00
Jason Ansel	eb1aca162e	Re-enable cudagraphs for benchmark scripts (#94192 ) Related to https://github.com/pytorch/pytorch/pull/93253 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94192 Approved by: https://github.com/albanD, https://github.com/desertfire	2023-02-08 16:38:32 +00:00
chuanqiw	94394e568e	change the dynamo benchmark timeout as a parameter (#94284 ) Change the dynamo benchmark timeout from a hard-coded value to a parameter with a default value of 1200ms, because the hard-coded 1200ms timeout caused some models to crash in single-thread mode on CPU platforms. With the parameter, users can specify the timeout freely. Fixes #94281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94284 Approved by: https://github.com/malfet	2023-02-08 00:45:08 +00:00
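A minimal sketch of the idea in #94284, turning a hard-coded timeout into a command-line parameter. The flag name `--timeout` and the helper `run_with_timeout` are assumptions made for illustration. The default of 1200 is taken from the commit message; interpreting it as seconds is also an assumption of this sketch, chosen because `subprocess.run` expects seconds.

```python
import argparse
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return its exit code, or None if it exceeded `timeout_s` seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return None


parser = argparse.ArgumentParser()
# Hypothetical flag; the default mirrors the 1200 quoted in the commit message.
parser.add_argument("--timeout", type=int, default=1200)
args = parser.parse_args(["--timeout", "5"])

status = run_with_timeout([sys.executable, "-c", "print('ok')"], args.timeout)
```

Returning `None` on expiry lets the caller record a timeout for one model and continue with the next instead of aborting the whole run.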
Bin Bao	db011e11ea	Skip sebotnet33ts_256 on CI (#94067 ) Summary: Random failure on CI and it happens more frequently lately. Skip for now and filed an issue at https://github.com/pytorch/pytorch/issues/94066 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94067 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-02-06 14:58:54 +00:00
Edward Z. Yang	1d53123f44	Report graph breaks separately from graph count (#94143 ) graph break != graph count - 1. Suppose you have a nested inline function call f1 to f2 to f3. A graph break in f3 results in six graphs: f1 before, f2 before, f3 before, f3 after, f2 after, f1 after. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94143 Approved by: https://github.com/voznesenskym	2023-02-05 04:03:12 +00:00
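The counting argument in #94143 can be illustrated with a toy model, assuming a single graph break in the innermost function of a chain of nested inlined calls: every frame on the call stack is split into a part before the break and a part after it. The function name is hypothetical, and this is not how Dynamo computes its statistics.

```python
def graphs_for_single_break(call_chain):
    """Toy model: list the graphs produced by one break in the innermost call.

    `call_chain` lists the nested inlined functions, outermost first.
    Each frame is split into a part before the break and a part after it.
    """
    before = [f"{name} before" for name in call_chain]
    after = [f"{name} after" for name in reversed(call_chain)]
    return before + after


graphs = graphs_for_single_break(["f1", "f2", "f3"])
graph_breaks = 1
```

With one break and six graphs, `graph_breaks` is not `len(graphs) - 1`, which is why the two numbers are reported separately.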
Edward Z. Yang	c1da35af5e	Update dynamic benchmark skips (#94114 ) Data from https://github.com/pytorch/pytorch/pull/94134 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94114 Approved by: https://github.com/SherlockNoMad	2023-02-04 20:36:51 +00:00
Jason Ansel	e071d72f3c	Tag dynamo backends as debug/experimental (#93878 ) Hides debug/experimental backends by default. Before: ``` torch._dynamo.list_backends() ['aot_eager', 'aot_eager_decomp_partition', 'aot_torchxla_trace_once', 'aot_torchxla_trivial', 'aot_ts', 'aot_ts_nvfuser', 'cudagraphs', 'dynamo_accuracy_minifier_backend', 'dynamo_minifier_backend', 'eager', 'inductor', 'ipex', 'nvprims_aten', 'nvprims_nvfuser', 'onnxrt', 'tensorrt', 'torchxla_trace_once', 'torchxla_trivial', 'ts', 'tvm'] ``` After: ``` torch._dynamo.list_backends() ['aot_ts_nvfuser', 'cudagraphs', 'inductor', 'ipex', 'nvprims_nvfuser', 'onnxrt', 'tensorrt', 'tvm'] ``` Fixes https://github.com/pytorch/pytorch/issues/93733 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93878 Approved by: https://github.com/voznesenskym	2023-02-04 00:50:51 +00:00
Jason Ansel	0a93e6db5a	Fix/refactor dynamo ipex backend (#93863 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93863 Approved by: https://github.com/desertfire	2023-02-03 21:42:27 +00:00
Jason Ansel	203b2cad3e	Remove fx2trt/torch2trt backends (#93822 ) These backends have been broken for some time. I tried to get them running again, but as far as I can tell they are not maintained. Installing torch_tensorrt downgrades PyTorch to 1.12. If I manually bypass that downgrade, I get import errors from inside fx2trt. Fixes that re-add these are welcome, but it might make sense to move these wrappers to the torch_tensorrt repo once PyTorch 2.0 support is added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93822 Approved by: https://github.com/frank-wei	2023-02-03 21:04:21 +00:00
Jason Ansel	a5ff40032d	Fix/refactor dynamo onnxrt backend (#93818 ) Fixes https://github.com/pytorch/pytorch/issues/90352 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93818 Approved by: https://github.com/voznesenskym	2023-02-03 20:48:02 +00:00
Edward Z. Yang	2481fc0df4	Add count to FakeTensorMode.__torch_dispatch__ (#93936 ) Most calls to fake tensor never hit `FakeTensor.__torch_dispatch__` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93936 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2023-02-03 14:21:11 +00:00
Fabio Rocha	63115b70f0	Fixed issue with --diff-branch arg in dynamo benchmarks (#93989 ) As @peterbell10 pointed out, it was giving incorrect results for `compression_ratio` and `compression_latency` when you used `--diff-branch`. This is fixed by running a separate subprocess for each branch, so that the measurements for one branch are not affected by the run for the other branch. Also added a couple more significant figures to the numbers in the summary table. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93989 Approved by: https://github.com/jansel	2023-02-03 08:36:57 +00:00
Jason Ansel	60e8c766b5	Refactor dynamo training backends (#93409 ) This splits training.py into many files and moves them from `dynamo.optimizations.training` to `dynamo.backends.*`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93409 Approved by: https://github.com/ezyang	2023-02-03 03:07:15 +00:00
atalman	6e285c479d	Remove cuda 11.6 from CI replace with 11.7 (#93406 ) Remove cuda 11.6 from CI replace with 11.7 Following the Release readme here: https://github.com/pytorch/pytorch/blob/master/RELEASE.md#release-compatibility-matrix Pull Request resolved: https://github.com/pytorch/pytorch/pull/93406 Approved by: https://github.com/malfet, https://github.com/desertfire	2023-02-02 19:16:05 +00:00
Jason Ansel	d7b39b17ab	Remove torch/_dynamo/optimizations/{analysis,log_args}.py (#93279 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93279 Approved by: https://github.com/voznesenskym	2023-02-02 02:34:36 +00:00
Edward Z. Yang	03b465a6d0	Add --iterations to benchmark script (#93858 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93858 Approved by: https://github.com/williamwen42	2023-02-01 21:56:49 +00:00
Edward Z. Yang	08041c5264	Configurable repro_tolerance for same_two_models (#93398 ) Fixes https://github.com/pytorch/pytorch/issues/93293 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93398 Approved by: https://github.com/SherlockNoMad	2023-02-01 01:41:48 +00:00
Edward Z. Yang	811e95a15e	--dynamic-ci-skips now works for all backends (#93369 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93369 Approved by: https://github.com/albanD	2023-01-31 20:07:58 +00:00
Edward Z. Yang	efee879695	Don't suppress warnings in CI. (#93269 ) Warnings are an important clue that something bad is going on. You want to see them in logs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93269 Approved by: https://github.com/voznesenskym	2023-01-30 19:21:09 +00:00
Edward Z. Yang	9eb402d18e	Update dynamic benchmark skips (#93228 ) Data from https://github.com/pytorch/pytorch/pull/93223 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93228 Approved by: https://github.com/desertfire	2023-01-30 14:22:53 +00:00
XiaobingSuper	9a2becf60a	inductor: fix inplace op's wrong lowering issue when preop is NopKernel (#92247 ) For TIMM ghostnet_100, there has such case, concat+inplace_add: ``` import torch from torch._inductor import config config.debug = True torch._dynamo.config.verbose=True class MockModule(torch.nn.Module): def __init__(self): super().__init__() def forward(self, x, y, z): out = torch.cat([x, y], dim=1) out+=z return out mod = MockModule().eval() inputs = ( torch.randn([1, 64, 16, 16]), torch.randn([1, 64, 16, 16]), torch.randn([1, 128, 16, 16]), ) ref = mod(inputs) with torch.no_grad(): opt_model = torch._dynamo.optimize('inductor')(mod) out = opt_model(inputs) out = opt_model(inputs) out = opt_model(inputs) print(torch.equal(ref, out)) ``` the inductor always get a wrong result, I find that inductor get a wrong code: ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided, device from torch._inductor.codecache import AsyncCompile from torch._inductor.select_algorithm import extern_kernels aten = torch.ops.aten assert_size_stride = torch._C._dynamo.guards.assert_size_stride async_compile = AsyncCompile() kernel_cpp_0 = async_compile.cpp(''' #include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" extern "C" void kernel(const float* __restrict__ in_ptr0, const float* __restrict__ in_ptr1, const float* __restrict__ in_ptr2, const float* __restrict__ in_ptr3, float* __restrict__ out_ptr0, float* __restrict__ out_ptr1, float* __restrict__ out_ptr2) { { for(long i0=0; i0<1024; i0+=1) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 16i0); tmp0.store(out_ptr0 + 16i0); } #pragma omp simd simdlen(8) for(long i0=16384; i0<16384; i0+=1) { auto tmp0 = in_ptr0[i0]; out_ptr0[i0] = tmp0; } } { for(long i0=0; i0<1024; i0+=1) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 16i0); tmp0.store(out_ptr1 + 16i0); } #pragma omp simd simdlen(8) for(long i0=16384; 
i0<16384; i0+=1) { auto tmp0 = in_ptr1[i0]; out_ptr1[i0] = tmp0; } } { for(long i0=0; i0<2048; i0+=1) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + 16i0); auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr3 + 16i0); auto tmp2 = tmp0 + tmp1; tmp2.store(out_ptr2 + 16i0); } #pragma omp simd simdlen(8) for(long i0=32768; i0<32768; i0+=1) { auto tmp0 = in_ptr2[i0]; auto tmp1 = in_ptr3[i0]; auto tmp2 = tmp0 + tmp1; out_ptr2[i0] = tmp2; } } } ''') async_compile.wait(globals()) del async_compile def call(args): arg0_1, arg1_1, arg2_1 = args args.clear() buf3 = empty_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32) buf0 = as_strided(buf3, (1, 64, 16, 16), (32768, 256, 16, 1)) # alias buf1 = as_strided(buf3, (1, 64, 16, 16), (32768, 256, 16, 1), 16384) # alias buf2 = empty_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32) kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf3.data_ptr())) del arg0_1 del arg1_1 del arg2_1 return (buf3, ) if __name__ == "__main__": from torch._dynamo.testing import rand_strided from torch._inductor.utils import print_performance arg0_1 = rand_strided((1, 64, 16, 16), (16384, 256, 16, 1), device='cpu', dtype=torch.float32) arg1_1 = rand_strided((1, 64, 16, 16), (16384, 256, 16, 1), device='cpu', dtype=torch.float32) arg2_1 = rand_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32) print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) ``` you can see that the add operation always adds a random value, see the ir code: 1. 
ir_pre_fusion.txt* ``` buf0: SchedulerNode(ComputedBuffer) buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))] buf0.unmet_dependencies = [] buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))] buf0.group.device = cpu buf0.group.iteration = ((16384,), ()) buf0.sizes = ([16384], []) buf0.aliases = ['buf3'] class buf0_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg0_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf0', get_index_1, load, None) return store buf1: SchedulerNode(ComputedBuffer) buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))] buf1.unmet_dependencies = [] buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))] buf1.group.device = cpu buf1.group.iteration = ((16384,), ()) buf1.sizes = ([16384], []) buf1.aliases = ['buf3'] class buf1_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg1_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf1', get_index_1, load, None) return store buf2: NopKernelSchedulerNode(ConcatKernel) buf2.writes = [StarDep(name='buf2')] buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')] buf2.met_dependencies = [] buf3: SchedulerNode(ComputedBuffer) buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))] buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,))] buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, size=(32768,))] buf3.group.device = cpu buf3.group.iteration = ((32768,), ()) buf3.sizes = ([32768], []) class buf3_loop_body: var_ranges = {z0: 32768} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('buf2', get_index) get_index_1 = self.get_index('index0') load_1 = ops.load('arg2_1', get_index_1) add = ops.add(load, load_1) get_index_2 = self.get_index('index0') store = 
ops.store('buf3', get_index_2, add, None) return store ``` 2. ir_post_fusion.txt ``` buf0: SchedulerNode(ComputedBuffer) buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))] buf0.unmet_dependencies = [] buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))] buf0.group.device = cpu buf0.group.iteration = ((16384,), ()) buf0.sizes = ([16384], []) buf0.aliases = ['buf3'] class buf0_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg0_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf0', get_index_1, load, None) return store buf1: SchedulerNode(ComputedBuffer) buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))] buf1.unmet_dependencies = [] buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))] buf1.group.device = cpu buf1.group.iteration = ((16384,), ()) buf1.sizes = ([16384], []) buf1.aliases = ['buf3'] class buf1_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg1_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf1', get_index_1, load, None) return store buf2: NopKernelSchedulerNode(ConcatKernel) buf2.writes = [StarDep(name='buf2')] buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')] buf2.met_dependencies = [] buf3: SchedulerNode(ComputedBuffer) buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))] buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,))] buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, size=(32768,))] buf3.group.device = cpu buf3.group.iteration = ((32768,), ()) buf3.sizes = ([32768], []) class buf3_loop_body: var_ranges = {z0: 32768} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('buf2', get_index) get_index_1 = self.get_index('index0') load_1 = ops.load('arg2_1', get_index_1) add = 
ops.add(load, load_1) get_index_2 = self.get_index('index0') store = ops.store('buf3', get_index_2, add, None) return store ``` From the IR, you can see that buf3 always adds an empty buf2 that has never been written. The root cause is a potential issue in the mutation handling for in-place add when its input is a NopKernel. After this PR, the IR looks like this (ir_pre_fusion.txt): ``` buf0: SchedulerNode(ComputedBuffer) buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))] buf0.unmet_dependencies = [] buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))] buf0.group.device = cpu buf0.group.iteration = ((16384,), ()) buf0.sizes = ([16384], []) buf0.aliases = ['buf2'] class buf0_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg0_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf0', get_index_1, load, None) return store buf1: SchedulerNode(ComputedBuffer) buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))] buf1.unmet_dependencies = [] buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))] buf1.group.device = cpu buf1.group.iteration = ((16384,), ()) buf1.sizes = ([16384], []) buf1.aliases = ['buf2'] class buf1_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg1_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf1', get_index_1, load, None) return store buf2: NopKernelSchedulerNode(ConcatKernel) buf2.writes = [StarDep(name='buf2')] buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')] buf2.met_dependencies = [] buf3: SchedulerNode(ComputedBuffer) buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))] buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,)), StarDep(name='buf2')] buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, 
size=(32768,))] buf3.group.device = cpu buf3.group.iteration = ((32768,), ()) buf3.sizes = ([32768], []) buf3.mutations = ['buf2'] class buf3_loop_body: var_ranges = {z0: 32768} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('buf2', get_index) get_index_1 = self.get_index('index0') load_1 = ops.load('arg2_1', get_index_1) add = ops.add(load, load_1) get_index_2 = self.get_index('index0') store = ops.store('buf3', get_index_2, add, None) return store ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92247 Approved by: https://github.com/ngimel, https://github.com/desertfire, https://github.com/jansel	2023-01-29 05:35:21 +00:00
Edward Z. Yang	025ef99ddf	Get rid of dedicated inductor dynamic_shapes config (#93076 ) Instead, use Dynamo dynamic_shapes config Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93076 Approved by: https://github.com/voznesenskym	2023-01-27 02:58:16 +00:00
Edward Z. Yang	5e9fa0a8fc	Mark crossvit_9_240 as passing dynamic=True (#92981 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92981 Approved by: https://github.com/Chillee	2023-01-26 13:05:37 +00:00
Michael Voznesensky	d322f82b05	Add @count util to torch, use it to track benchmark stats (#93013 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93013 Approved by: https://github.com/ezyang	2023-01-26 03:09:12 +00:00
Edward Z. Yang	2ee94633a1	Change ciflow/inductor to test inductor inference with dynamic shapes (#92771 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92771 Approved by: https://github.com/voznesenskym	2023-01-25 02:21:02 +00:00
Edward Z. Yang	f724ecbd52	Add dynamic shapes aot_eager to periodic (#92770 ) This means it overlaps with ciflow/inductor, but I'm about to change that soon. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92770 Approved by: https://github.com/voznesenskym, https://github.com/albanD, https://github.com/desertfire	2023-01-25 02:21:02 +00:00
Edward Z. Yang	fb46d3e138	Run all of the timm models shards in the periodic (#92900 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92900 Approved by: https://github.com/bdhirsh, https://github.com/atalman	2023-01-24 17:56:20 +00:00
Horace He	c0327eb463	Some more inductor fixes for symbolic shapes (#92867 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92867 Approved by: https://github.com/ezyang	2023-01-24 15:05:46 +00:00
PyTorch MergeBot	2cf03bbbab	Revert "Run all of the timm models shards in the periodic (#92743 )" This reverts commit `de69cedf98`. Reverted https://github.com/pytorch/pytorch/pull/92743 on behalf of https://github.com/atalman due to This needs to be landed after https://github.com/pytorch/pytorch/pull/92845 and https://github.com/pytorch/pytorch/pull/92846 are landed	2023-01-23 23:44:09 +00:00
Fabio Rocha	a43b55e135	A few usability improvements for the dynamo benchmarks. (#92713 ) --diff_main was renamed to --diff-branch BRANCH and now works again. The summary table splits results per branch. CSV output now has a column with the branch name when run in this mode. Added a --progress flag so you can track how many models are going to be run. Example output: ``` $ python benchmarks/dynamo/torchbench.py --quiet --performance --backend inductor --float16 --batch-size-file $(realpath benchmarks/dynamo/torchbench_models_list.txt) --filter 'alexnet\|vgg16' --progress --diff viable/strict Running model 1/2 batch size: 1024 cuda eval alexnet dynamo_bench_diff_branch 1.251x p=0.00 cuda eval alexnet viable/strict 1.251x p=0.00 Running model 2/2 batch size: 128 cuda eval vgg16 dynamo_bench_diff_branch 1.344x p=0.00 cuda eval vgg16 viable/strict 1.342x p=0.00 Summary for tag=dynamo_bench_diff_branch: speedup gmean=1.30x mean=1.30x abs_latency gmean=24.09x mean=25.26x compilation_latency mean=2.0 seconds compression_ratio mean=0.9x Summary for tag=viable/strict: speedup gmean=1.30x mean=1.30x abs_latency gmean=24.11x mean=25.29x compilation_latency mean=0.5 seconds compression_ratio mean=1.0x ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92713 Approved by: https://github.com/jansel	2023-01-23 18:23:35 +00:00
Edward Z. Yang	4a3fb7bcbc	Make CI_SKIPS into a consolidated dict (#92769 ) This makes it easier to add more configurations without causing a thicket of if statements selecting the correct variable. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92769 Approved by: https://github.com/voznesenskym, https://github.com/desertfire	2023-01-23 14:57:18 +00:00
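The consolidation described in #92769 can be pictured roughly as follows. This is a minimal sketch and not the benchmark script's actual code: the `CIConfig` key type, its fields, the `skips_for` helper, and the model names are all hypothetical, chosen only to show how one dict keyed by configuration can replace a chain of if statements.

```python
from collections import defaultdict
from typing import NamedTuple


class CIConfig(NamedTuple):
    """Hypothetical key: one entry per CI configuration."""
    backend: str
    training: bool
    dynamic: bool = False


# One dict holds every skip list; unknown configurations default to "no skips".
CI_SKIPS = defaultdict(list)
CI_SKIPS[CIConfig("aot_eager", training=False)] = ["model_a"]
CI_SKIPS[CIConfig("aot_eager", training=True)] = ["model_a", "model_b"]
CI_SKIPS[CIConfig("inductor", training=False, dynamic=True)] = ["model_c"]


def skips_for(backend: str, training: bool, dynamic: bool = False) -> list:
    """Look up the skip list with a single dict access instead of branching."""
    return CI_SKIPS[CIConfig(backend, training, dynamic)]
```

Adding a new configuration then means adding one more key rather than another variable and another branch.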
Edward Z. Yang	3cfd2fa1c7	Make --inductor imply --backend inductor (#92764 ) This is to make some downstream code more uniform (can always ask args.backend for backend) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92764 Approved by: https://github.com/voznesenskym, https://github.com/desertfire	2023-01-23 14:57:18 +00:00
Edward Z. Yang	c52567ec18	Switch CI exclusions to use exact match. (#92761 ) Since the CI exclusions are hard-coded in our script, we might as well require them to match exactly. This solved some head scratching where I was like, "this model is not obviously excluded, why is it not showing up in CI." Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92761 Approved by: https://github.com/jansel	2023-01-22 17:10:20 +00:00
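The confusion mentioned in #92761 is easy to reproduce in miniature. The sketch below assumes, without confirmation from the commit message, that the earlier matching was pattern based; the function names and the exclusion list are hypothetical.

```python
import re


def excluded_by_pattern(model: str, exclusions: list) -> bool:
    """Looser matching: an exclusion found anywhere in the name excludes it."""
    return any(re.search(pattern, model) for pattern in exclusions)


def excluded_exact(model: str, exclusions: list) -> bool:
    """Exact matching: only names listed verbatim are excluded."""
    return model in exclusions


exclusions = ["model_a"]  # hypothetical hard-coded CI exclusion list

# "model_a_large" is not in the list, yet pattern matching silently drops it.
print(excluded_by_pattern("model_a_large", exclusions))
print(excluded_exact("model_a_large", exclusions))
```

With exact matching, a model is missing from CI only if its full name appears in the hard-coded list.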
Edward Z. Yang	de69cedf98	Run all of the timm models shards in the periodic (#92743 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92743 Approved by: https://github.com/kit1980	2023-01-21 18:39:17 +00:00
Michael Voznesensky	5778c04a15	Add `--timing` flag, phase timing to @dynamo_timed (#92637 ) Ex output: ``` TIMING: entire_frame_compile:8.574629999999999 backend_compile:5.26806 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92637 Approved by: https://github.com/ezyang	2023-01-21 10:52:13 +00:00
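Phase timing of the kind shown in the example output can be sketched with a small decorator. This is an illustration under stated assumptions, not the implementation of `@dynamo_timed`: the `timed` decorator, the `phase_times` dict, and `timing_report` are hypothetical names, and the two decorated functions are placeholders that only borrow the phase names from the output above.

```python
import functools
import time
from collections import defaultdict

# Accumulated seconds per phase, keyed by the decorated function's name.
phase_times = defaultdict(float)


def timed(fn):
    """Hypothetical stand-in for a phase-timing decorator."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record elapsed time even if the phase raises.
            phase_times[fn.__name__] += time.perf_counter() - start
    return wrapper


@timed
def backend_compile():
    time.sleep(0.01)  # placeholder for real work


@timed
def entire_frame_compile():
    backend_compile()  # the outer phase includes the inner one


def timing_report() -> str:
    """Format totals in the 'name:seconds' style of the example output."""
    return "TIMING: " + " ".join(f"{k}:{v}" for k, v in phase_times.items())


entire_frame_compile()
print(timing_report())
```

Because the outer phase wraps the inner one, its total is always at least as large, which is the same relationship visible between `entire_frame_compile` and `backend_compile` in the example output.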
Edward Z. Yang	27bf879b8c	Forward fix: restore sebotnet33ts_256 aot_eager skip (#92741 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92741 Approved by: https://github.com/kit1980	2023-01-21 08:10:23 +00:00
Edward Z. Yang	9ad0aca6e5	Update aot_eager CI failures (#92696 ) Based on https://hud.pytorch.org/pr/92689 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92696 Approved by: https://github.com/desertfire	2023-01-21 02:29:22 +00:00
PyTorch MergeBot	44132cc4b0	Revert "Add `--timing` flag, phase timing to @dynamo_timed (#92637 )" This reverts commit `773b513435`. Reverted https://github.com/pytorch/pytorch/pull/92637 on behalf of https://github.com/malfet due to Broke lint	2023-01-20 16:23:20 +00:00
Michael Voznesensky	773b513435	Add `--timing` flag, phase timing to @dynamo_timed (#92637 ) Ex output: ``` TIMING: entire_frame_compile:8.574629999999999 backend_compile:5.26806 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92637 Approved by: https://github.com/ezyang	2023-01-20 05:01:21 +00:00
Edward Z. Yang	44e52ea514	Reenable mobilevit_s in CI, seems to pass (#92585 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92585 Approved by: https://github.com/Chillee	2023-01-19 15:24:45 +00:00
Edward Z. Yang	b92a7afed9	Reclassify some dynamic aot_eager failures as static failures (#92376 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92376 Approved by: https://github.com/Chillee	2023-01-18 19:27:11 +00:00
Wu, Chunyuan	3aa6cec18c	[dynamo] exclude reset_rng_state when measure timing (#92237 ) Fixes inductor performance regression on CPU: https://github.com/pytorch/torchdynamo/issues/2027, https://github.com/pytorch/torchdynamo/issues/2028 and https://github.com/pytorch/torchdynamo/issues/2029. The details are explained here: https://github.com/pytorch/torchdynamo/issues/2028#issuecomment-1381496678. ### Performance - Model: lennard_jones - Machine: IceLake (32 cores per socket) - Configuration: single instance, 32 cores per instance - jemalloc and iomp enabled ```bash python benchmarks/dynamo/torchbench.py --inductor-settings --inductor --performance --float32 -dcpu -n5000 --no-skip --dashboard --only=lennard_jones --quiet ``` Time before regression \| Time after regression \| Time with this PR -- \| -- \| -- 0.00020483799744397402 \| 0.0002818034990923479 \| 0.00020241099991835654 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92237 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-01-18 13:17:28 +00:00
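The idea behind #92237, keeping setup work out of the timed region, can be sketched as below. This is not the benchmark's code: `reset_rng_state` here is a stand-in that only seeds Python's `random` module, and `workload` is a placeholder for one model iteration.

```python
import random
import time


def reset_rng_state(seed: int = 1337) -> None:
    """Stand-in for the benchmark's RNG reset; only seeds Python's RNG."""
    random.seed(seed)


def workload() -> float:
    """Placeholder for one model iteration."""
    return sum(random.random() for _ in range(1000))


def timed_including_reset() -> float:
    start = time.perf_counter()
    reset_rng_state()  # setup cost is counted in the measurement
    workload()
    return time.perf_counter() - start


def timed_excluding_reset() -> float:
    reset_rng_state()  # setup happens before the clock starts
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start
```

For a very short iteration such as the `lennard_jones` case above, a fixed setup cost inside the timed region can be a visible share of the measurement, which is why moving it outside changes the reported time.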
Edward Z. Yang	fbbb19599a	Update dynamic skips after #92076 (#92103 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92103 Approved by: https://github.com/voznesenskym, https://github.com/Chillee	2023-01-13 04:05:10 +00:00
Edward Z. Yang	74cbf058a5	Support --dynamic-ci-skips (#91893 ) This makes it easier for us to run only the skipped benchmarks and see if that actually started passing. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91893 Approved by: https://github.com/albanD	2023-01-11 20:02:58 +00:00
Edward Z. Yang	d24324bf1d	s/INDCUTOR/INDUCTOR/ (#91885 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91885 Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/malfet	2023-01-11 12:28:19 +00:00
Edward Z. Yang	56ed976edf	hrnet_w18, tts_angular works with dynamic shapes (#91891 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91891 Approved by: https://github.com/voznesenskym	2023-01-11 11:40:16 +00:00
blzheng	0c1777acec	Dynamo benchmark: add CPU specific changes (#88477 ) This PR adds some CPU-specific changes: - Add support for IPEX backend - https://github.com/pytorch/torchdynamo/issues/1618 - https://github.com/pytorch/torchdynamo/issues/1534 - Enable CPU launcher in runner.py. - Fix the issue that some environment variables are not supported on CPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-01-07 09:26:06 +00:00
Shunting Zhang	a5f32f8978	training support for dynamo+torchxla integration (#88449 ) We've already shown some promising perf results by integrating dynamo with torchxla for inference. To provide a consistent UX for training and inference, this PR enables training for dynamo/torchxla. Training is trickier than inference, and we may not expect much perf gain, since 1. in the training case, torchxla generates only a single combined graph for fwd/bwd/optimizer, while in the `torchxla_trace_once` bridge we added in dynamo, due to how AOT_Autograd works, we generate 3 graphs: one for forward, one for backward and one for the optimizer. XLA favors larger graphs so it can do more optimizations. 2. in the training case, tracing overhead can be overlapped with computation. Tracing overhead is not as big a deal for training as for inference. After all, training cares more about throughput while inference cares more about latency. 3. in the training case, people can increase the batch size to 'mitigate' the tracing overhead. Increasing the batch size does not change the tracing overhead, so the tracing overhead 'per example' appears to shrink. But we still want to add training support to dynamo/torchxla to make the work complete. We added the '--iterations-per-run' argument to control how many iterations we do per measure/device sync. This is to understand the impact of item 2 above. 
Results: With '--iterations-per-run' equal to 1, here are the perf numbers: ``` +-------------------------+--------------------+-------------------------+ \| Model \| XLA (trace once) \| XLA (trace everytime) \| +=========================+====================+=========================+ \| resnet18 \| 0.91 \| 0.959 \| +-------------------------+--------------------+-------------------------+ \| resnet50 \| 0.917 \| 0.932 \| +-------------------------+--------------------+-------------------------+ \| resnext50_32x4d \| 0.912 \| 0.905 \| +-------------------------+--------------------+-------------------------+ \| alexnet \| 1.038 \| 0.974 \| +-------------------------+--------------------+-------------------------+ \| mobilenet_v2 \| 0.881 \| 0.835 \| +-------------------------+--------------------+-------------------------+ \| mnasnet1_0 \| 0.903 \| 0.931 \| +-------------------------+--------------------+-------------------------+ \| vgg16 \| 0.914 \| 0.967 \| +-------------------------+--------------------+-------------------------+ \| BERT_pytorch \| 1.359 \| 0.84 \| +-------------------------+--------------------+-------------------------+ \| timm_vision_transformer \| 1.288 \| 0.893 \| +-------------------------+--------------------+-------------------------+ \| geomean \| 1.0006 \| 0.913794 \| +-------------------------+--------------------+-------------------------+ ``` Overall, it looks like graph breaks do indeed cause perf loss. But for BERT_pytorch and timm_vision_transformer we still see perf gain. 
We need to do more experiments with larger '--iterations-per-run' values. NOTE: In torchbench.py I added the following code as a workaround: ``` from myscripts import workaround # TODO will remove this line before landing ``` Here is the content of workaround.py: ``` import torch from torch import nn import os # override max_pool2d with avg_pool2d if os.environ.get("REPLACE_MAXPOOL", "0") == "1": torch.nn.MaxPool2d = torch.nn.AvgPool2d ``` It works around a few issues we found: 1. MaxPool2d does not work for training in dynamo/torchxla: https://github.com/pytorch/torchdynamo/issues/1837 . WIP fix from Brian in https://github.com/pytorch/pytorch/pull/90226 , https://github.com/pytorch/xla/pull/4276/files (WIP) 2. a recent change (PR https://github.com/pytorch/pytorch/pull/88697 ) in op decomposition causes batch_norm ops to fall back in torchxla. Fix from jack in https://github.com/pytorch/xla/pull/4282#event-7969608134 . (confirmed the fix after adding Deduper to handle duplicated return from fx graph generated by AOTAutograd) 3. we have an issue handling dropout because the random seed gets out of sync. Here is the fix: https://github.com/pytorch/xla/pull/4293 (confirmed the fix) Example command: ``` REPLACE_MAXPOOL=1 USE_FAKE_TENSOR=0 GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --training --backend=aot_torchxla_trace_once --only vgg16 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88449 Approved by: https://github.com/wconstab, https://github.com/qihqi, https://github.com/malfet	2023-01-05 19:59:34 +00:00
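The geomean row of the results table in #88449 can be rechecked from the per-model numbers. The sketch below copies the nine values of each column from that table; the `geomean` helper is a plain geometric mean written for this illustration and is not taken from the benchmark scripts.

```python
import math

# Per-model values from the results table, excluding the geomean row.
trace_once = [0.91, 0.917, 0.912, 1.038, 0.881, 0.903, 0.914, 1.359, 1.288]
trace_everytime = [0.959, 0.932, 0.905, 0.974, 0.835, 0.931, 0.967, 0.84, 0.893]


def geomean(values):
    """Geometric mean: exponential of the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"trace once:      {geomean(trace_once):.4f}")
print(f"trace everytime: {geomean(trace_everytime):.6f}")
```

Both results should land close to the table's 1.0006 and 0.913794. The trace-once geomean sits near 1.0 only because the two transformer models offset the slowdown on the seven convolutional models.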
Bin Bao	6bf0e3b697	[inductor] Check for BackendCompilerFailed on CI (#91634 ) Summary: https://github.com/pytorch/pytorch/pull/91283/ skips certain random triton failure on CI, but we need to check against the BackendCompilerFailed exception type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91634 Approved by: https://github.com/ngimel	2023-01-03 22:38:29 +00:00
Animesh Jain	a32916190d	buck-related minifier work (#91215 ) Summary: Extending the minifier to generate buck target Test Plan: N/A Differential Revision: D42173893 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91215 Approved by: https://github.com/bertmaher, https://github.com/ngimel	2022-12-22 19:33:50 +00:00
Bin Bao	07c61685c8	[inductor] CI improvments (#91283 ) Summary: 1) Setting torch.backends.cudnn.deterministic to True helps to eliminate the eager_variance failures seen on CI 2) Skip Triton failure instead of retry 3) Some minor script cleanup is also included in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91283 Approved by: https://github.com/anijain2305	2022-12-22 15:37:43 +00:00
Michael Lazos	2f5759eaba	Disable non-deterministic models for optimizers (#91149 ) These two models are non-deterministic even with constant inputs + weights and, as a result, very rarely fail in CI with variations between the fp64 and fp32 models. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91149 Approved by: https://github.com/desertfire	2022-12-20 20:19:54 +00:00
Bin Bao	84e73e1269	[inductor] small CI improvements (#91140 ) Summary: 1) Increase timm_model download retry times; 2) Skip certain random triton failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91140 Approved by: https://github.com/williamwen42	2022-12-20 17:26:12 +00:00
Michael Lazos	07c340bb2a	Remove debug code (#91148 ) Removes some debug code Pull Request resolved: https://github.com/pytorch/pytorch/pull/91148 Approved by: https://github.com/desertfire, https://github.com/williamwen42	2022-12-20 15:00:55 +00:00
Bin Bao	2a37ba8e81	[inductor] Add retry after benchmark test fails on CI (#90808 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90808 Approved by: https://github.com/malfet	2022-12-19 18:10:55 +00:00
Michael Lazos	1accd915a4	Re-enable optimizers (#90709 ) Fixes https://github.com/pytorch/pytorch/issues/90165 https://github.com/pytorch/torchdynamo/issues/328 Re-enables optimizer capture + compilation now that the dynamo slowdowns have been fixed and it has speedups, numbers to come soon Pull Request resolved: https://github.com/pytorch/pytorch/pull/90709 Approved by: https://github.com/anijain2305, https://github.com/jansel, https://github.com/yanboliang	2022-12-19 04:07:41 +00:00
Edward Z. Yang	212873c615	Add dynamic shapes benchmark accuracy to CI (#90444 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/90444 Approved by: https://github.com/voznesenskym	2022-12-17 11:17:20 +00:00
PyTorch MergeBot	e2377c8300	Revert "Add dynamic shapes benchmark accuracy to CI (#90444 )" This reverts commit `85db031e60`. Reverted https://github.com/pytorch/pytorch/pull/90444 on behalf of https://github.com/ezyang due to lint failing	2022-12-17 07:18:07 +00:00
Edward Z. Yang	85db031e60	Add dynamic shapes benchmark accuracy to CI (#90444 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/90444 Approved by: https://github.com/voznesenskym	2022-12-17 06:39:45 +00:00
Michael Lazos	7c524221ba	[reland3][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956 ) …king (#87492)" (#90746)" This reverts commit `ff1bbc2773`. This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this) see https://github.com/pytorch/torchdynamo/issues/1985 for more detail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956 Approved by: https://github.com/desertfire	2022-12-17 06:27:15 +00:00
PyTorch MergeBot	6bc6fb21db	Revert "[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956 )" This reverts commit `8bc38ae4e2`. Reverted https://github.com/pytorch/pytorch/pull/90956 on behalf of https://github.com/desertfire due to Causing TIMM model failures	2022-12-16 19:28:05 +00:00
Michael Lazos	8bc38ae4e2	[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956 ) …king (#87492)" (#90746)" This reverts commit `ff1bbc2773`. This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this) see https://github.com/pytorch/torchdynamo/issues/1985 for more detail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956 Approved by: https://github.com/desertfire	2022-12-16 13:33:38 +00:00
Bin Bao	ad4189c8db	[reland][inductor] Update TIMM skip list (#90762 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90762 Approved by: https://github.com/eellison	2022-12-13 19:56:23 +00:00
Bin Bao	ff1bbc2773	Revert "[reland][dynamo] use optimizers correctly in benchmarking (#87492 )" (#90746 ) This reverts commit `d91d7a3221`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90746 Approved by: https://github.com/anijain2305	2022-12-13 11:37:16 +00:00
PyTorch MergeBot	e37c8c8436	Revert "[inductor] Update TIMM skip list (#90188 )" This reverts commit `fd3f5d7bf7`. Reverted https://github.com/pytorch/pytorch/pull/90188 on behalf of https://github.com/desertfire due to flaky accuracy failure	2022-12-12 15:31:50 +00:00
Edward Z. Yang	e1ed5ad5a5	Add a timeout to benchmark script (#90634 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/90634 Approved by: https://github.com/voznesenskym	2022-12-11 23:12:29 +00:00
Jiong Gong	181d37475d	Simple fix: add missing positional arg in init_optimizer() call (#90641 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90641 Approved by: https://github.com/kit1980	2022-12-11 13:18:05 +00:00
Bin Bao	fd3f5d7bf7	[inductor] Update TIMM skip list (#90188 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90188 Approved by: https://github.com/anijain2305	2022-12-09 21:30:23 +00:00
Animesh Jain	d91d7a3221	[reland][dynamo] use optimizers correctly in benchmarking (#87492 ) Reland https://github.com/pytorch/pytorch/pull/87311 mlazos: updated to use SGD to not add a bunch of additional memory allocations (like Adam) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87492 Approved by: https://github.com/desertfire	2022-12-09 20:32:53 +00:00
Ram Rachum	351d73b97f	Fix exception causes all over the codebase (#90271 ) This is the continuation to #90134 and hopefully the final PR in this series. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271 Approved by: https://github.com/kit1980	2022-12-07 04:29:00 +00:00
David Berard	8f079b895b	[Dynamo+FSDP] Update benchmarks with use_orig_params=True (#90100 ) After https://github.com/pytorch/pytorch/pull/89523, we now need to assert use_orig_params=True, even in the non-recursive case where (I think) we wouldn't otherwise need to run with use_orig_params=True. Tested with `python benchmarks/dynamo/torchbench.py --training --accuracy --only hf_T5 --fsdp` Pull Request resolved: https://github.com/pytorch/pytorch/pull/90100 Approved by: https://github.com/wconstab	2022-12-07 03:33:58 +00:00
Richard Zou	4068c5467d	[Reland] Move functorch/_src to torch/_functorch (#88756 ) (#90091 ) This will be the last disruptive functorch internals change. Why are we moving these files? - As a part of rationalizing functorch we are moving the code in functorch/_src to torch/_functorch - This is so that we can offer the functorch APIs as native PyTorch APIs (coming soon) and resolve some internal build issues. Why are we moving all of these files at once? - It's better to break developers all at once rather than many times Test Plan: - wait for tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/90091 Approved by: https://github.com/anijain2305, https://github.com/ezyang	2022-12-03 14:17:15 +00:00
Wang, Eikan	0bde810572	Add more debug information for Inductor (#90008 ) - Add graph index to the profile information of the Inductor kernel for better debuggability. The generated code for different graphs could produce kernels with the same name. The side effect is that it is hard to identify the portion of E2E performance for these kernels because the profiler will aggregate the performance with the same kernel name regardless of different graphs. Hence, this PR added the graph index to the profile information to address this limitation. - Label arbitrary code ranges for `eager` and `opt` modes for better debuggability. The profile information of dynamo benchmarks mixes the eager mode and opt mode. It is hard to separate the ranges for the different modes. This PR added eager and opt marks to the profile information to address this limitation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90008 Approved by: https://github.com/jgong5, https://github.com/jansel	2022-12-02 09:34:48 +00:00
Animesh Jain	3162a48a77	[dynamo][benchmarks] Call zero grad (#90026 ) Hoping that it might reduce some flakiness Pull Request resolved: https://github.com/pytorch/pytorch/pull/90026 Approved by: https://github.com/williamwen42	2022-12-02 04:05:57 +00:00
Animesh Jain	68805b08d1	[benchmarks][dynamo] Trying CI - Set train() for TIMM models accuracy tests (#89780 ) Moving to train mode for TIMM models and also raising batch size for accuracy testing. Raising batch size seems to remove a lot of noise/instability coming from batch_norm decomposition. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89780 Approved by: https://github.com/ngimel	2022-11-30 12:57:35 +00:00
PyTorch MergeBot	218d9c6e09	Revert "Move functorch/_src to torch/_functorch (#88756 )" This reverts commit `52bc5c1cfe`. Reverted https://github.com/pytorch/pytorch/pull/88756 on behalf of https://github.com/clee2000 due to broke imports in tests `52bc5c1cfe` https://github.com/pytorch/pytorch/actions/runs/3574742513/jobs/6010814968 probably a landrace	2022-11-29 17:17:11 +00:00
Richard Zou	52bc5c1cfe	Move functorch/_src to torch/_functorch (#88756 ) This will be the last disruptive functorch internals change. Why are we moving these files? - As a part of rationalizing functorch we are moving the code in functorch/_src to torch/_functorch - This is so that we can offer the functorch APIs as native PyTorch APIs (coming soon) and resolve some internal build issues. Why are we moving all of these files at once? - It's better to break developers all at once rather than many times Test Plan: - wait for tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/88756 Approved by: https://github.com/ezyang	2022-11-29 13:55:42 +00:00
Bin Bao	465ee7bc09	[inductor] skip dm_nfnet_f0 in TIMM model test (#89768 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89768 Approved by: https://github.com/clee2000	2022-11-28 20:08:41 +00:00
Animesh Jain	cdf4087597	[benchmarks] Disabling gradscaler (#89741 ) Disabling Gradscaler because 1) Benchmark setup runs 2 iterations of fwd-bwd. So, not useful. 2) Current setup shares grad_scaler for eager and dynamo model, which is bad as Gradscaler has state and can adjust the scaling factor between eager and dynamo run, making accuracy check harder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89741 Approved by: https://github.com/ngimel	2022-11-28 20:08:37 +00:00

386 Commits