pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
eellison	605236af06	Force fp16 for vision_maskrcnn inference (#113110 ) Force fp16 for maskrcnn inference (doesn't support bf16). Also skip phi_1_5 in training - it OOMs even with batch size 1. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113110 Approved by: https://github.com/xmfan	2023-11-10 02:25:11 +00:00
Elias Ellison	f6fb9fd681	use smaller batch size for timm_efficientdet in inference (#113095 ) Previously had OOMs Pull Request resolved: https://github.com/pytorch/pytorch/pull/113095 Approved by: https://github.com/xmfan ghstack dependencies: #112650	2023-11-07 07:08:16 +00:00
Elias Ellison	5c1ea30ca3	bump torchbench commit (#112650 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112650 Approved by: https://github.com/msaroufim, https://github.com/xuzhao9	2023-11-07 03:56:16 +00:00
Simon Fan	28ebe5df7a	yolov3: reduce batch size due to OOM (#111959 ) yolov3 w/ cudagraphs (known to use more memory) is failing perf test due to OOM (https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Mon,%2016%20Oct%202023%2020:19:47%20GMT&stopTime=Mon,%2023%20Oct%202023%2020:19:47%20GMT&granularity=hour&mode=training&dtype=amp&lBranch=main&lCommit=0b424ee0b7bfe09e0a438a63e8336e95eea85901&rBranch=main&rCommit=29048be41ca3aa8974795d93b9ea9fd6dee415fc) I'm reducing the batch size from 16 to 8 to keep the same batch size for all yolov3 HUD benchmarks Pull Request resolved: https://github.com/pytorch/pytorch/pull/111959 Approved by: https://github.com/xuzhao9	2023-10-25 06:18:53 +00:00
Simon Fan	88ef126a93	rename nanogpt_generate to nanogpt to also support train (#109746 ) Differential Revision: [D49522940](https://our.internmc.facebook.com/intern/diff/D49522940) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109746 Approved by: https://github.com/msaroufim, https://github.com/malfet, https://github.com/xuzhao9	2023-09-29 17:36:48 +00:00
angelayi	a565f1bee6	[aotinductor] Skip benchmarks with control flow (#109661 ) Since AOTInductor doesn't support control flow yet, we will skip over tests that are currently failing due to containing control flow in the code. Logs taken from https://hud.pytorch.org/benchmark/compilers?startTime=Tue%2C%2012%20Sep%202023%2022%3A56%3A40%20GMT&stopTime=Tue%2C%2019%20Sep%202023%2022%3A56%3A40%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=main&lCommit=2c1554a0323107d821be3ff13df7833b9f0b960d&rBranch=main&rCommit=47be61e12bd51df27182343d312dc3df485d5559 Errors documented in https://github.com/pytorch/pytorch/issues/105217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109661 Approved by: https://github.com/desertfire	2023-09-25 18:49:06 +00:00
Mark Saroufim	e2cfbca5ab	Add clip to dynamo runners (#109840 ) CLIP was moved to canary models because we use the multimodal version which depends on torchtext which torchbench deprecated https://github.com/pytorch/benchmark/pull/1837 This issue didn't show up before because we hadn't updated the torchbench pin Pull Request resolved: https://github.com/pytorch/pytorch/pull/109840 Approved by: https://github.com/cpuhrsch	2023-09-22 20:50:57 +00:00
eellison	d24ba7a634	Add 3d Attn Pattern to match HF Whisper (#109156 ) Adds a 3d pattern that improves perf of HF Whisper from 1.3 -> 4.1. We could be matching more generally on 3d, but I'll leave that for another PR. Thanks to @drisspg for helping me write the pattern. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109156 Approved by: https://github.com/yanboliang ghstack dependencies: #109663, #108894, #108917, #109142	2023-09-20 16:39:31 +00:00
Simon Fan	54c5f474a7	Forward rank and world size info to Torchbench models when using dynamo runner (#108438 ) Adding support to pass rank and world_size to torchbench model, via its extra_args parameter: https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/model.py#L83C80-L83C90 This is used for models which distribute over multiple GPUs e.g. simple_gpt https://github.com/pytorch/benchmark/pull/1867 Also add an option to skip multiprocess only gpu models Testing via `python benchmarks/dynamo/torchbench.py -d cuda --output=benchmark_logs/performance.csv --inference --performance --timing --print-memory --multiprocess --only simple_gpt` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108438 Approved by: https://github.com/Chillee	2023-09-14 21:01:20 +00:00
drisspg	ad90ab31f2	Flash Attention v2 (#105602 ) # Summary ## PR Dependencies I don't use ghstack :( this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier: - [x] Separate build flags for Flash and MemEff: #107985 ### Description This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao ### Changes Made The majority of the changes in this pull request involve: - Copying over the flash_attention sources. - Updating header files. - Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd. - Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates. - Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80. - Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes. - Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources. - Adding/Updating tests. ### Notes for Reviewers This is not a fun review, and I apologize in advance. Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO: - aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp - aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py (this has been incorporated upstream to flash-attention github) There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.
### Follow up items - Include the updates from `e07aa036db` and `9e5e8bc91e` | https://github.com/pytorch/pytorch/issues/108108 ### Work Items - [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake - [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup. - [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers. - [x] Update test exercise above codepath - [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it `a4f148b6ab`) - [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b - [x] Update dispatcher to universally prefer FlashV2 - [x] Update tests to exercise new head_dims - [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional - [x] Create template generator script - [x] Initial cmake support for building kernels/ folder - [x] Replay CudaGraph changes ### Results #### Forward only The TFlops reported here are on an A100 that is underclocked. ![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7) #### Forward+Backward Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back. <img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602 Approved by: https://github.com/huydhn, https://github.com/cpuhrsch	2023-09-13 13:59:05 +00:00
Huy Do	a9c663c269	Revert "Flash Attention v2 (#105602 )" (#108827 ) This reverts commit `add45aea1c`. There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually. The diff has been reverted internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827 Approved by: https://github.com/kit1980	2023-09-08 07:43:04 +00:00
PyTorch MergeBot	e45b290127	Revert "Revert "Flash Attention v2 (#105602 )" (#108827 )" This reverts commit `24e9bbe22a`. Reverted https://github.com/pytorch/pytorch/pull/108827 on behalf of https://github.com/huydhn due to I need to land this revert properly as there are new failures showing up on trunk ([comment](https://github.com/pytorch/pytorch/pull/108827#issuecomment-1711020924))	2023-09-08 03:25:45 +00:00
Huy Do	24e9bbe22a	Revert "Flash Attention v2 (#105602 )" (#108827 ) This reverts commit `add45aea1c`. There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually. The diff has been reverted internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827 Approved by: https://github.com/kit1980	2023-09-08 02:54:20 +00:00
eellison	738106c1f7	Torchbench model tolerance changes (#108598 ) Move detectron2_fcos_r_50_fpn to amp. The minifier showed the following snippet as causing the divergence, where inductor has better numerics than eager:
```
import torch

def foo(x):
    return x > .2

inp = torch.tensor([.2002], device="cuda", dtype=torch.bfloat16)
print(foo(inp))
print(torch.compile(foo)(inp))
```
doctr_reco_predictor had very minimal divergence (.002 vs .001 required), bumping tolerance here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108598 Approved by: https://github.com/shunting314	2023-09-06 16:52:29 +00:00
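The divergence in the snippet from commit 738106c1f7 can be illustrated without a GPU. The sketch below is a pure-Python emulation of bfloat16 rounding (round-to-nearest-even on the upper 16 bits of a float32). `to_bf16` is a hypothetical helper, not a PyTorch API, and the explanation that eager rounds the scalar to bfloat16 while the compiled kernel compares at higher precision is an assumption about why the two results differ, not something the commit states.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    Hypothetical helper for illustration only.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, keeping only the upper 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x_bf16 = to_bf16(0.2002)          # the tensor element after the bfloat16 cast
threshold_bf16 = to_bf16(0.2)     # the threshold if it is also rounded to bfloat16

# Both values land on the same bfloat16 number, so a bfloat16 comparison is False.
compare_in_bf16 = x_bf16 > threshold_bf16
# Against the unrounded threshold the element is strictly larger, so this is True.
compare_in_higher_precision = x_bf16 > 0.2

print(x_bf16, threshold_bf16, compare_in_bf16, compare_in_higher_precision)
```

Under that assumption, the two comparison results correspond to the two different outputs the minifier snippet prints.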
drisspg	add45aea1c	Flash Attention v2 (#105602 ) # Summary ## PR Dependencies I don't use ghstack :( this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier: - [x] Separate build flags for Flash and MemEff: #107985 ### Description This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao ### Changes Made The majority of the changes in this pull request involve: - Copying over the flash_attention sources. - Updating header files. - Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd. - Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates. - Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80. - Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes. - Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources. - Adding/Updating tests. ### Notes for Reviewers This is not a fun review, and I apologize in advance. Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO: - aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp - aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py (this has been incorporated upstream to flash-attention github) There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.
### Follow up items - Include the updates from `e07aa036db` and `9e5e8bc91e` | https://github.com/pytorch/pytorch/issues/108108 ### Work Items - [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake - [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup. - [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers. - [x] Update test exercise above codepath - [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it `a4f148b6ab`) - [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b - [x] Update dispatcher to universally prefer FlashV2 - [x] Update tests to exercise new head_dims - [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional - [x] Create template generator script - [x] Initial cmake support for building kernels/ folder - [x] Replay CudaGraph changes ### Results #### Forward only The TFlops reported here are on an A100 that is underclocked. ![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7) #### Forward+Backward Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back. <img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602 Approved by: https://github.com/huydhn, https://github.com/cpuhrsch	2023-09-01 22:14:44 +00:00
Elias Ellison	e18f512b81	Update accuracy checking for nan, floats (#108202 ) Fixes inference accuracy for `doctr_reco_predictor` and `pyhpc_turbulent_kinetic_energy`. For the `same(float, float)` comparison we weren't going through the more rigorous tensor comparison path, which takes into account the fp64 base results. Also return True when fp64 base results are not well formed (nan). I debugged these models and the sources of divergence were innocuous: `doctr_reco_predictor` - can be fixed by turning off layout optimization, decomp for batch norm; `pyhpc_turbulent_kinetic_energy` - divergence caused because the fused kernel keeps precision in fp32 instead of casting back and forth from/to fp32/bf16. The fused kernel has better precision, anyway. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108202 Approved by: https://github.com/jansel	2023-09-01 02:54:01 +00:00
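The idea in commit e18f512b81 of judging a scalar result against an fp64 baseline might be sketched as follows. This is an illustrative approximation, not the actual `same` implementation in the benchmark utilities; the function name `passes_with_fp64_baseline` and the `multiplier` and `tol` defaults are hypothetical.

```python
import math


def passes_with_fp64_baseline(res, ref, fp64_ref, multiplier=2.0, tol=1e-4):
    """Accept `res` if it is about as close to the fp64 baseline as eager `ref` is.

    Hypothetical sketch: names and thresholds are assumptions for illustration.
    """
    # A malformed baseline (nan) cannot arbitrate, so treat the check as passing.
    if math.isnan(fp64_ref):
        return True
    res_error = abs(res - fp64_ref)
    ref_error = abs(ref - fp64_ref)
    # The compiled result may be at most `multiplier` times worse than eager,
    # plus a small absolute slack for values that are nearly exact.
    return res_error <= multiplier * ref_error + tol


print(passes_with_fp64_baseline(1.001, 0.999, 1.0))         # comparable error
print(passes_with_fp64_baseline(1.5, 1.001, 1.0))           # far worse than eager
print(passes_with_fp64_baseline(1.0, 1.0, float("nan")))    # baseline is nan
```

Comparing both results to a higher-precision reference, rather than to each other, avoids penalizing a compiled kernel for being more precise than eager.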
Elias Ellison	63eee52ba7	Add Drq to BF16 Higher Tolerance (#108368 ) This passes for me on aws gpu but not devgpu, and was already in the `REQUIRE_HIGHER_FP16_TOLERANCE` set. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108368 Approved by: https://github.com/shunting314	2023-09-01 00:29:27 +00:00
Shunting Zhang	eb8659fe81	pass inference accuracy check for detectron2_fcos_r_50_fpn (#108328 ) We need a higher tolerance to pass the inference accuracy check for detectron2_fcos_r_50_fpn . Command: ``` python benchmarks/dynamo/torchbench.py --backend inductor --bfloat16 --accuracy --only detectron2_fcos_r_50_fpn --disable-cudagraphs --inference ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108328 Approved by: https://github.com/jansel	2023-08-31 20:21:20 +00:00
Edward Z. Yang	5b04e9b6ce	Install torchrec/fbgemm from source in CI (#106808 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106808 Approved by: https://github.com/malfet, https://github.com/xuzhao9	2023-08-12 02:08:44 +00:00
Mark Saroufim	1b32ac3cab	Update torchbench.txt (#106761 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/106761 Approved by: https://github.com/malfet	2023-08-09 19:01:21 +00:00
Edward Z. Yang	c379d6283a	Don't suppress ModuleNotFoundError if the failure is for an unrelated module (#106807 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106807 Approved by: https://github.com/williamwen42, https://github.com/voznesenskym	2023-08-09 01:54:49 +00:00
Mark Saroufim	90c264c276	sd flaky on cpu skip (#106726 ) waiting for update expected script Pull Request resolved: https://github.com/pytorch/pytorch/pull/106726 Approved by: https://github.com/malfet	2023-08-08 02:44:47 +00:00
Elias Ellison	578969ca61	skip maml (#106471 ) This one benchmark distorts benchmarks because it is so low (.0007, the equivalent of a 1400x speedup). It also has been flakey, which has produced a lot of noise. Disabling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106471 Approved by: https://github.com/anijain2305	2023-08-04 22:14:09 +00:00
Howard Huang	236eda4d51	remove jit from torchbench (#106071 ) Need to remove jit arguments after changes in https://github.com/pytorch/benchmark/pull/1787 Also curious, is there a procedure for updating the torchbench version in PyTorch CI? Pull Request resolved: https://github.com/pytorch/pytorch/pull/106071 Approved by: https://github.com/xuzhao9, https://github.com/msaroufim, https://github.com/malfet, https://github.com/lezcano	2023-08-03 21:04:43 +00:00
Mark Saroufim	6268ab2c2d	torchbench pin upd: hf auth token, clip, whisper, llamav2, sd (#106009 ) Includes stable diffusion, whisper, llama7b and clip. To get this to work I had to pass in the hf auth token to all ci jobs; github does not pass in secrets from parent to child automatically. There's a likelihood HF will rate limit us; in that case please revert this PR and I'll work on adding a cache next - cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @aakhundov @malfet Something upstream changed in torchbench too, where now `hf_Bert` and `hf_Bert_large` are both failing on some dynamic shape looking error which I'm not sure how to debug yet, so for now (felt a bit gross) I added a skip since others are building on top of this work @ezyang `llamav2_7b_16h` cannot pass through accuracy checks because it OOMs on deepcloning extra inputs; this seems to make it not need to show up in the expected numbers csv, will figure this out when we update the pin with https://github.com/pytorch/benchmark/pull/1803 cc @H-Huang @xuzhao9 @cpuhrsch Pull Request resolved: https://github.com/pytorch/pytorch/pull/106009 Approved by: https://github.com/malfet	2023-08-03 16:28:40 +00:00
Bin Bao	28d42e66e4	[CI] Add DALLE2_pytorch to FORCE_AMP_FOR_FP16_BF16_MODELS (#104283 ) Summary: DALLE2_pytorch inference does not support bfloat16, fallback to use AMP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104283 Approved by: https://github.com/eellison	2023-06-28 02:37:15 +00:00
Bin Bao	a2988c9e6a	[CI] Switch inference accuracy and performance tests to bfloat16 (#103535 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103535 Approved by: https://github.com/eellison	2023-06-17 00:24:37 +00:00
Edward Z. Yang	bc6ec97e02	Switch dynamic_shapes to True by default (#103597 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597 Approved by: https://github.com/voznesenskym	2023-06-15 15:16:20 +00:00
Animesh Jain	d6da649a1b	[benchmark] hf_T5_base - torchbench original batchsize too large (#103442 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103442 Approved by: https://github.com/desertfire	2023-06-15 01:06:40 +00:00
Animesh Jain	16c2090b2d	[benchmark][compile] Limit number of bounding boxes to 5 (#103413 ) Depends on https://github.com/pytorch/benchmark/pull/1729 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103413 Approved by: https://github.com/ezyang	2023-06-15 01:06:40 +00:00
Animesh Jain	428bff842d	[benchmarks] Torchbench llama is not suitable for training (#103094 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103094 Approved by: https://github.com/eellison, https://github.com/desertfire	2023-06-07 01:33:07 +00:00
Animesh Jain	33a49eeae7	[benchmark] Flag to switch on activation checkpointing for HF models (#102557 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102557 Approved by: https://github.com/ngimel, https://github.com/Chillee	2023-05-30 23:46:14 +00:00
Edward Z. Yang	22ca1a1124	Partially fix shape mismatch in vision_maskrcnn (#101477 ) The bulk of the heavy lifting is happening in https://github.com/pytorch/vision/pull/7592 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101477 Approved by: https://github.com/voznesenskym	2023-05-21 05:20:08 +00:00
Edward Z. Yang	41468833fb	vision_maskrcnn is now deterministic (#101116 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101116 Approved by: https://github.com/ngimel	2023-05-16 21:32:17 +00:00
Edward Z. Yang	f48718f749	Update torchbench pin (#101365 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101365 Approved by: https://github.com/albanD, https://github.com/awgu	2023-05-15 16:52:31 +00:00
Edward Z. Yang	fcf2fb273c	Make missing model import error marginally better (#101221 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101221 Approved by: https://github.com/albanD, https://github.com/anijain2305	2023-05-14 19:57:01 +00:00
Edward Z. Yang	41a4e22015	Update torchbench pin (#101071 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101071 Approved by: https://github.com/malfet	2023-05-11 18:09:40 +00:00
Edward Z. Yang	ad070b6dfa	Check canary_models for models too in torchbench.py (#101081 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101081 Approved by: https://github.com/desertfire	2023-05-11 13:23:17 +00:00
Edward Z. Yang	d25c93f919	Remove speech_transformer workaround, torchbench handles it correctly now (#100558 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100558 Approved by: https://github.com/albanD	2023-05-04 01:14:24 +00:00
Yanbo Liang	896eb1db26	[Dynamo] Skip TB Background_Matting model eager accuracy check because of non deterministic (#100513 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/100513 Approved by: https://github.com/anijain2305	2023-05-03 07:06:50 +00:00
Yanbo Liang	3009c42e7d	[CI Testing] Re-enable timm_efficientdet training (#99787 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/99787 Approved by: https://github.com/desertfire	2023-04-24 20:05:15 +00:00
Edward Z. Yang	fc8fa6c356	Require at least one tensor to be marked dynamic with --dynamic-batch-only (#99620 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99620 Approved by: https://github.com/voznesenskym	2023-04-21 00:17:08 +00:00
Will Constable	9ac2b041c9	Make opacus xfail instead of skip (#99380 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99380 Approved by: https://github.com/desertfire, https://github.com/anijain2305	2023-04-19 21:09:06 +00:00
Huy Do	5d395769a6	Skip vision_maskrcnn after #98923 (#99394 ) This is failing in trunk as documented in https://github.com/pytorch/pytorch/issues/99438 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99394 Approved by: https://github.com/desertfire	2023-04-19 17:07:07 +00:00
Bin Bao	46b9377190	[CI] Collect inductor max-autotune performance every Sunday (#99387 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99387 Approved by: https://github.com/malfet, https://github.com/huydhn	2023-04-18 13:20:13 +00:00
Will Constable	6eab5e88c8	Graph-break on allowed modules if they have hooks (#97184 ) Allowed modules are stuck into dynamo's fx graph as call_module nodes, without dynamo doing any tracing of the module. This means during AOT trace time, hooks will fire during tracing when the call_module is executed, but the hooks themselves will disappear after that and not be present in the compiled program. (worse, if they performed any tensor operations, those would get traced so you could end up with part of the hook's functionality). To circumvent this, there are two options for 'allowed modules' with hooks. 1) don't treat them as 'allowed' - trace into them 2) graph-break, so the module is no longer part of the dynamo trace at all (1) will fail for users that opted into allowed modules because they know their module has problems being traced by dynamo. (2) causes graph breaks on common modules such as nn.Linear, just because they are marked as 'allowed'. It would help matters if we could differentiate between types of allowed modules (A) allowed to avoid overheads - used for common ops like nn.Linear (B) allowed to avoid dynamo graphbreaks caused by unsupported code Ideally, we'd use method (1) for group (A) and (2) for (B). For now, graph-break on all cases of allowed modules. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97184 Approved by: https://github.com/jansel	2023-04-15 01:46:15 +00:00
Bin Bao	5210d7c423	[CI] Mark vision_maskrcnn as NONDETERMINISTIC (#98570 ) Summary: vision_maskrcnn fails eager checking, so mark it as NONDETERMINISTIC to reduce noise on the dashboard. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98570 Approved by: https://github.com/eellison, https://github.com/huydhn	2023-04-07 19:33:20 +00:00
Bin Bao	c4de7fdef5	[CI] Mark sebotnet33ts_256 as nondeterministic (#98356 ) Summary: The goal is make sure the new dashboard doesn't give noisy alert on this test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98356 Approved by: https://github.com/ezyang	2023-04-05 12:05:47 +00:00
Bin Bao	bd6db54285	[CI] Mark mobilenet_v3_large as nondeterministic (#98314 ) Summary: Skip mobilenet_v3_large for accuracy checking to reduce noise on the dashboard. The root cause still needs to be investigated. mobilenet_v3_large shows random accuracy check failures with different error values from time to time, and here are some examples: ``` cuda train mobilenet_v3_large [2023-04-04 14:54:50,990] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.02172, (ref-fp64): 0.01068 and shape=torch.Size([960, 1, 5, 5]) [2023-04-04 14:54:50,990] torch._dynamo.utils: [ERROR] Accuracy failed for key name features.14.block.1.0.weight.grad ``` ``` cuda train mobilenet_v3_large [2023-04-04 14:57:59,972] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.07744, (ref-fp64): 0.03073 and shape=torch.Size([72, 1, 5, 5]) [2023-04-04 14:57:59,973] torch._dynamo.utils: [ERROR] Accuracy failed for key name features.4.block.1.0.weight.grad ``` One observation is that turning off cudnn in eager mode with `torch.backends.cudnn.enabled = False` makes the non-deterministic behavior go away, but it then fails accuracy checking consistently. Minifier didn't help to narrow down the error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98314 Approved by: https://github.com/huydhn	2023-04-04 21:55:23 +00:00
Bin Bao	69ff39d2e7	Skip gat, gcn and sage for TorchBench CUDA test (#98244 ) Summary: The three models only support CPU for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98244 Approved by: https://github.com/ezyang	2023-04-04 01:06:18 +00:00
BowenBao	60a68477a6	Bump black version to 23.1.0 (#96578 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578 Approved by: https://github.com/ezyang	2023-03-15 06:27:59 +00:00
Bin Bao	02792ff16f	[CI] Make inductor-perf-test-nightly produce data for dashboard (#95685 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95685 Approved by: https://github.com/ezyang, https://github.com/huydhn	2023-03-06 03:14:03 +00:00
Natalia Gimelshein	f2aee8b8d5	small fixes for mlir backend (#94717 ) Fixes for skipped tests with mlir triton backend (will unskip once #94249 lands) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94717 Approved by: https://github.com/malfet, https://github.com/atalman	2023-02-13 22:42:53 +00:00
Nikita Shulga	4869929f32	Update Triton hash (#94249 ) That includes MLIR + latest packaging changes (that also download ptxas from CUDA-12). Tweak CI to install gcc-9 to build triton. Disable a few tests to make everything be correct. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94249 Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/weiwangmeta	2023-02-13 13:17:36 +00:00
Xuehai Pan	8d45f555d7	[BE] [1/3] Rewrite `super()` calls in caffe2 and benchmarks (#94587 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587 Approved by: https://github.com/ezyang	2023-02-11 18:19:48 +00:00
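Commit 8d45f555d7 notes that some `super()` rewrites would change semantics and must be left alone. One way that can happen is sketched below with hypothetical classes `Base`, `Middle`, and `Leaf`; whether the cited caffe2 and test files follow this exact pattern is not stated in the commit message, so treat it as an assumption.

```python
class Base:
    def name(self):
        return "Base"


class Middle(Base):
    def name(self):
        return "Middle"


class Leaf(Middle):
    def via_explicit(self):
        # Naming the enclosing class is equivalent to zero-argument super():
        # the lookup starts after Leaf in the method resolution order.
        return super(Leaf, self).name()

    def via_zero_arg(self):
        return super().name()

    def skipping_middle(self):
        # NOT equivalent: the lookup starts after Middle, so Middle.name is
        # skipped. Rewriting this call to super() would change behavior.
        return super(Middle, self).name()


leaf = Leaf()
print(leaf.via_explicit(), leaf.via_zero_arg(), leaf.skipping_middle())
```

Only calls whose first argument is the enclosing class are safe to rewrite mechanically.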
Michael Voznesensky	333e771394	Add benchmarks.py to run all benchmarks, add new file with all torchbench model names (#94146 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94146 Approved by: https://github.com/ezyang	2023-02-08 01:18:38 +00:00
atalman	6e285c479d	Remove cuda 11.6 from CI replace with 11.7 (#93406 ) Remove cuda 11.6 from CI replace with 11.7 Following the Release readme here: https://github.com/pytorch/pytorch/blob/master/RELEASE.md#release-compatibility-matrix Pull Request resolved: https://github.com/pytorch/pytorch/pull/93406 Approved by: https://github.com/malfet, https://github.com/desertfire	2023-02-02 19:16:05 +00:00
Edward Z. Yang	c52567ec18	Switch CI exclusions to use exact match. (#92761 ) Since the CI exclusions are hard-coded in our script, we might as well require them to match exactly. This solved some head scratching where I was like, "this model is not obviously excluded, why is it not showing up in CI." Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92761 Approved by: https://github.com/jansel	2023-01-22 17:10:20 +00:00
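The head scratching described in commit c52567ec18 is easy to reproduce in miniature. The sketch below contrasts exact matching with a looser substring test; the substring test is an assumption about what a non-exact match might look like, and `CI_EXCLUSIONS` and both function names are hypothetical rather than taken from the benchmark script.

```python
# Hypothetical exclusion list for illustration only.
CI_EXCLUSIONS = {"hf_Bert"}


def excluded_by_substring(model_name):
    """Loose matching: any exclusion entry contained in the name excludes it."""
    return any(pattern in model_name for pattern in CI_EXCLUSIONS)


def excluded_by_exact_match(model_name):
    """Strict matching: only names listed verbatim are excluded."""
    return model_name in CI_EXCLUSIONS


# With loose matching, hf_Bert_large silently disappears from CI even though
# it is not listed; with exact matching it still runs.
print(excluded_by_substring("hf_Bert_large"))
print(excluded_by_exact_match("hf_Bert_large"))
print(excluded_by_exact_match("hf_Bert"))
```

Exact matching makes the hard-coded list the single source of truth for which models are skipped.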
Jason Ansel	7c1c239db1	[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91575 ) This reverts commit `94262efc7d` to reland #91105 / #90738. Fixes https://github.com/pytorch/torchdynamo/issues/2015 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91575 Approved by: https://github.com/ngimel	2023-01-11 00:08:03 +00:00
blzheng	0c1777acec	Dynamo benchmark: add CPU specific changes (#88477 ) This pr adds some CPU specific changes: - Add support for IPEX backend - https://github.com/pytorch/torchdynamo/issues/1618 - https://github.com/pytorch/torchdynamo/issues/1534 - Enable CPU launcher in runner.py. - Fix the issue that some environment variables are not supported on CPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-01-07 09:26:06 +00:00
Shunting Zhang	a5f32f8978	training support for dynamo+torchxla integration (#88449 ) We've already shown some promising perf results by integrating dynamo with torchxla for inference. To provide a consistent UX for training and inference, this PR tries to enable training for dynamo/torchxla. Training is trickier than inference, and we may not expect much perf gain, since:
1. In the training case, torchxla only generates a single combined graph for fwd/bwd/optimizer, while in the `torchxla_trace_once` bridge we added in dynamo, due to how AOT_Autograd works, we will generate 3 graphs: one for forward, one for backward and one for the optimizer. XLA favors larger graphs to do more optimizations.
2. In the training case, tracing overhead can be overlapped with computation. Tracing overhead is not as big a deal for training as for inference. After all, training cares more about throughput while inference cares more about latency.
3. In the training case, people can increase batch size to 'mitigate' the tracing overhead. Increasing batch size does not change the tracing overhead, so the tracing overhead 'per example' appears to shrink.
But we still want to add training support to dynamo/torchxla to make the work complete. We added an '--iterations-per-run' argument to control how many iterations we do per measure/device sync. This is to understand the impact of item 2 above.
Results: With '--iterations-per-run' equal to 1, here are the perf numbers:
```
+-------------------------+--------------------+-------------------------+
| Model                   |   XLA (trace once) |   XLA (trace everytime) |
+=========================+====================+=========================+
| resnet18                |               0.91 |                   0.959 |
+-------------------------+--------------------+-------------------------+
| resnet50                |              0.917 |                   0.932 |
+-------------------------+--------------------+-------------------------+
| resnext50_32x4d         |              0.912 |                   0.905 |
+-------------------------+--------------------+-------------------------+
| alexnet                 |              1.038 |                   0.974 |
+-------------------------+--------------------+-------------------------+
| mobilenet_v2            |              0.881 |                   0.835 |
+-------------------------+--------------------+-------------------------+
| mnasnet1_0              |              0.903 |                   0.931 |
+-------------------------+--------------------+-------------------------+
| vgg16                   |              0.914 |                   0.967 |
+-------------------------+--------------------+-------------------------+
| BERT_pytorch            |              1.359 |                    0.84 |
+-------------------------+--------------------+-------------------------+
| timm_vision_transformer |              1.288 |                   0.893 |
+-------------------------+--------------------+-------------------------+
| geomean                 |             1.0006 |                0.913794 |
+-------------------------+--------------------+-------------------------+
```
Overall it looks like graph breaks do indeed cause perf loss. But for BERT_pytorch and timm_vision_transformer we still see a perf gain.
We need to do more experiments with larger '--iterations-per-run'.
NOTE: In torchbench.py I added the following code to apply a few workarounds:
```
from myscripts import workaround # TODO will remove this line before landing
```
Here is the content of workaround.py:
```
import torch
from torch import nn
import os

# override max_pool2d with avg_pool2d
if os.environ.get("REPLACE_MAXPOOL", "0") == "1":
    torch.nn.MaxPool2d = torch.nn.AvgPool2d
```
It works around a few issues we found:
1. MaxPool2d does not work for training in dynamo/torchxla: https://github.com/pytorch/torchdynamo/issues/1837 . WIP fix from Brian in https://github.com/pytorch/pytorch/pull/90226 , https://github.com/pytorch/xla/pull/4276/files (WIP)
2. A recent change (this PR https://github.com/pytorch/pytorch/pull/88697 ) in op decomposition causes batch_norm ops to fall back in torchxla. Fix from jack in https://github.com/pytorch/xla/pull/4282#event-7969608134 . (confirmed the fix after adding Deduper to handle duplicated return from fx graph generated by AOTAutograd)
3. We have an issue handling dropout because the random seed gets out of sync. Here is the fix: https://github.com/pytorch/xla/pull/4293 (confirmed the fix)
Example command:
```
REPLACE_MAXPOOL=1 USE_FAKE_TENSOR=0 GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --training --backend=aot_torchxla_trace_once --only vgg16
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88449 Approved by: https://github.com/wconstab, https://github.com/qihqi, https://github.com/malfet	2023-01-05 19:59:34 +00:00
PyTorch MergeBot	94262efc7d	Revert "[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105 )" This reverts commit `d6dd2e97da`. Reverted https://github.com/pytorch/pytorch/pull/91105 on behalf of https://github.com/atalman due to Broke internal builds	2022-12-21 00:02:38 +00:00
Jason Ansel	d6dd2e97da	[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105 ) https://github.com/pytorch/pytorch/pull/90738 seems a bit borked. ghimport fails on it, and I unlinked it from the Phabricator diff, but it still won't land. This is an exact copy of that PR without using ghstack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91105 Approved by: https://github.com/ngimel	2022-12-20 02:38:23 +00:00
Edward Z. Yang	212873c615	Add dynamic shapes benchmark accuracy to CI (#90444 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/90444 Approved by: https://github.com/voznesenskym	2022-12-17 11:17:20 +00:00
PyTorch MergeBot	e2377c8300	Revert "Add dynamic shapes benchmark accuracy to CI (#90444 )" This reverts commit `85db031e60`. Reverted https://github.com/pytorch/pytorch/pull/90444 on behalf of https://github.com/ezyang due to lint failing	2022-12-17 07:18:07 +00:00
Edward Z. Yang	85db031e60	Add dynamic shapes benchmark accuracy to CI (#90444 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/90444 Approved by: https://github.com/voznesenskym	2022-12-17 06:39:45 +00:00
Michael Lazos	7c524221ba	[reland3][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956 ) …king (#87492)" (#90746)" This reverts commit `ff1bbc2773`. This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this) see https://github.com/pytorch/torchdynamo/issues/1985 for more detail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956 Approved by: https://github.com/desertfire	2022-12-17 06:27:15 +00:00
PyTorch MergeBot	6bc6fb21db	Revert "[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956 )" This reverts commit `8bc38ae4e2`. Reverted https://github.com/pytorch/pytorch/pull/90956 on behalf of https://github.com/desertfire due to Causing TIMM model failures	2022-12-16 19:28:05 +00:00
Michael Lazos	8bc38ae4e2	[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956 ) …king (#87492)" (#90746)" This reverts commit `ff1bbc2773`. This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this) see https://github.com/pytorch/torchdynamo/issues/1985 for more detail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956 Approved by: https://github.com/desertfire	2022-12-16 13:33:38 +00:00
Bin Bao	ff1bbc2773	Revert "[reland][dynamo] use optimizers correctly in benchmarking (#87492 )" (#90746 ) This reverts commit `d91d7a3221`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90746 Approved by: https://github.com/anijain2305	2022-12-13 11:37:16 +00:00
Animesh Jain	d91d7a3221	[reland][dynamo] use optimizers correctly in benchmarking (#87492 ) Reland https://github.com/pytorch/pytorch/pull/87311 mlazos: updated to use SGD to not add a bunch of additional memory allocations (like Adam) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87492 Approved by: https://github.com/desertfire	2022-12-09 20:32:53 +00:00
Animesh Jain	3162a48a77	[dynamo][benchmarks] Call zero grad (#90026 ) Hoping that it might reduce some flakiness Pull Request resolved: https://github.com/pytorch/pytorch/pull/90026 Approved by: https://github.com/williamwen42	2022-12-02 04:05:57 +00:00
Animesh Jain	68805b08d1	[benchmarks][dynamo] Trying CI - Set train() for TIMM models accuracy tests (#89780 ) Moving to train mode for TIMM models and also raising batch size for accuracy testing. Raising batch size seems to remove a lot of noise/instability coming from batch_norm decomposition. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89780 Approved by: https://github.com/ngimel	2022-11-30 12:57:35 +00:00
Xu Zhao	e4d9dbd7d2	Port torchdynamo's torchbench script to userbenchmark (#89239 ) Summary: This Diff ports the torchbench.py script from torchdynamo to torchbench to support the development of internal models. Currently, it only works with the `--only` option, and can only test one model at a time. Note that the noisy logs are from upstream model code, not the benchmark code. In the internal environment, `torch._dynamo.config.base_dir` is not writable, so we add an option to specify the output directory. Test Plan:
```
$ buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only ads_dhen_5x --part over --output-directory /tmp/tb-test/
cuda eval ads_dhen_5x 1/ 1 +0 frames 2s 1 graphs 1 graph calls 412/ 411 = 100% ops 100% time
```
```
$ buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only cmf_10x --part over --output-directory /tmp/tb-test/
cuda eval cmf_10x 1/ 1 +0 frames 1s 1 graphs 1 graph calls 306/ 305 = 100% ops 100% time
```
Reviewed By: jansel Differential Revision: D41294311 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89239 Approved by: https://github.com/jansel	2022-11-21 17:25:28 +00:00
Animesh Jain	cad5772c2c	[dashboard][huggingface] skip accuracy checks for really large models… (#89273 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89273 Approved by: https://github.com/desertfire	2022-11-19 00:22:45 +00:00
Edward Z. Yang	d596b048e5	Also skip large models for normal --accuracy runs (#88086 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88086 Approved by: https://github.com/albanD	2022-11-01 00:59:09 +00:00
Will Constable	ee231671c0	Make torchbench setup a function (#87469 ) cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87469 Approved by: https://github.com/anijain2305	2022-10-21 19:58:38 +00:00
PyTorch MergeBot	f38a88c4dd	Revert "[dynamo] use optimizers correctly in benchmarking (#87311 )" This reverts commit `703c19008d`. Reverted https://github.com/pytorch/pytorch/pull/87311 on behalf of https://github.com/anijain2305 due to Bin (desertfire) is trying to get torchbench models in CI, and this PR prevents that. I will bring this back after models are in CI.	2022-10-20 22:01:51 +00:00
Animesh Jain	703c19008d	[dynamo] use optimizers correctly in benchmarking (#87311 ) We were not setting optimizers correctly * This hid the issue that we see here - https://github.com/pytorch/torchdynamo/issues/1687 * This has also revealed that we are activating profilers for every dynamo optimized model call. This could affect speedup cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87311 Approved by: https://github.com/mlazos, https://github.com/yanboliang	2022-10-20 05:46:25 +00:00
Animesh Jain	c30cfb07ab	[dynamo][dashboard] Run 2 iterations for the correctness runs (#87104 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87104 Approved by: https://github.com/soumith	2022-10-18 15:53:40 +00:00
Jason Ansel	054a2fd6c2	Sync changes from `pytorch/torchdynamo` (#87013 ) This updates to: `6380959be2` Generated with: https://github.com/pytorch/torchdynamo/blob/main/copy_to_core.sh Pull Request resolved: https://github.com/pytorch/pytorch/pull/87013 Approved by: https://github.com/voznesenskym	2022-10-15 21:00:57 +00:00
Jason Ansel	8f71e8de7e	Sync changes from pytorch/torchdynamo, enable tests (#86950 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86950 Approved by: https://github.com/Chillee	2022-10-14 23:08:58 +00:00
Jason Ansel	c7c09722ad	Move TorchDynamo into PyTorch core (#86461 ) Context: https://github.com/pytorch/torchdynamo/issues/1588 This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core. - `torchdynamo` becomes `torch._dynamo` - `torchinductor` becomes `torch._inductor` This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461 Approved by: https://github.com/voznesenskym	2022-10-13 23:18:06 +00:00

133 Commits