The batch size for this model was previously 64. It was later changed to 256, which causes an OOM in the cudagraphs setting. This PR tunes the batch size down to 128.
Sharing more logs from my local run:
```
cuda,res2net101_26w_4s,128,1.603578,110.273572,335.263494,1.042566,11.469964,11.001666,807,2,7,6,0,0
cuda,res2net101_26w_4s,256,1.714980,207.986155,344.013071,1.058278,22.260176,21.034332,807,2,7,6,0,0
```
The log shows that torch.compile uses about 11GB at batch size 128 and 21GB at batch size 256. I suspect the benchmark script has extra overhead that causes the model to OOM at batch size 256 in the dashboard run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122977
Approved by: https://github.com/Chillee
Summary:
Replacing `torch._export.aot_compile` callsites with
```
ep = torch.export._trace._export(.., pre_dispatch=True) # Traces the given program into pre-dispatch IR
so_path = torch._inductor.aot_compile_ep(ep, ...) # Takes an exported program and compiles it into a .so
```
This allows us to explicitly split up the export step from AOTInductor. We can later modify tests to do `export + serialize + deserialize + inductor` to mimic internal production use cases better.
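As a rough sketch of that later test flow, the round trip could look like the following. This reuses the entry points named in the snippet above (`aot_compile_ep` is taken from this summary, not a confirmed public API) together with `torch.export.save`/`torch.export.load`; the module and inputs are placeholders, and exact signatures may differ across versions.
```
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

mod, args = Tiny(), (torch.randn(4, 4),)

# Export into pre-dispatch IR, round-trip through serialization, then AOT-compile.
# `aot_compile_ep` is the name used in the summary above (assumed, not a public API).
ep = torch.export._trace._export(mod, args, pre_dispatch=True)
torch.export.save(ep, "tiny.pt2")              # serialize the ExportedProgram
ep_loaded = torch.export.load("tiny.pt2")      # deserialize it back
so_path = torch._inductor.aot_compile_ep(ep_loaded, args)  # compile to a .so
```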
Test Plan: CI
Differential Revision: D54808612
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122225
Approved by: https://github.com/SherlockNoMad, https://github.com/khabinov
Summary: It looks like this target has stopped working; let's fix it.
Test Plan:
```
buck2 run mode/opt //caffe2/benchmarks/dynamo/:test
```
now works
Differential Revision: D55389546
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122735
Approved by: https://github.com/xmfan
torchrec_dlrm training fails the accuracy check when max-autotune is enabled.
I found there is no real issue in PT2; we simply fail to get fp64 reference results for the accuracy check. In max-autotune mode the numerics may change a bit and cause the cosine similarity check to fail. Using an fp64 baseline is more reliable and makes the test pass.
The reason we were not using an fp64 baseline earlier is that torchrec uses a dataclass [Batch](99e6e669b5/torchrec/datasets/utils.py (L28)) to represent the input. We use pytree to cast the model and inputs to fp64, but pytree cannot look into a dataclass. My fix is to convert the dataclass to a namedtuple so it is more pytree friendly.
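A minimal sketch of the underlying issue (using a hypothetical one-field `Batch` rather than torchrec's real dataclass): pytree treats a dataclass as an opaque leaf, so an fp64 cast via `tree_map` never reaches its tensors, while a namedtuple is traversed.
```
import collections
import dataclasses
import torch
import torch.utils._pytree as pytree

@dataclasses.dataclass
class BatchDataclass:  # stand-in for torchrec's Batch
    dense: torch.Tensor

BatchTuple = collections.namedtuple("BatchTuple", ["dense"])

def to_fp64(x):
    return x.to(torch.float64) if isinstance(x, torch.Tensor) else x

dc = BatchDataclass(dense=torch.randn(2, 2))
nt = BatchTuple(dense=torch.randn(2, 2))

# The dataclass is an opaque pytree leaf: its tensor is never cast.
print(pytree.tree_map(to_fp64, dc).dense.dtype)  # torch.float32
# The namedtuple is a pytree node: its tensor gets cast to fp64.
print(pytree.tree_map(to_fp64, nt).dense.dtype)  # torch.float64
```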
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122012
Approved by: https://github.com/jansel, https://github.com/eellison
This removes the duplicate handling of comparison ops between symbolic_convert and builtin and refactors the handling to use the binop infrastructure. This change regresses overheads a bit, but that is fixed in the next PR.
The new test skips are variants of `type(e) is np.ndarray` that previously fell back to eager.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122043
Approved by: https://github.com/anijain2305
ghstack dependencies: #122039
`python benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
- Before: `symbolic_convert_overhead_stress_test: 10.7s`
- After: `symbolic_convert_overhead_stress_test: 8.6s`
`tx.step()` is a small part of that benchmark, so the speedup in that isolated function is likely larger than the top-line number.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121790
Approved by: https://github.com/oulgen
After this PR, the sam_fast benchmark can be run in the pytorch repo:
```
SEGMENT_ANYTHING_FAST_USE_FLASH_4=0 benchmarks/dynamo/torchbench.py --inference --amp --performance --backend=inductor --explain --only sam_fast
```
sam_fast is designed for inference only, with CUDA and AMP enabled. This PR adds these restrictions to the benchmark.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121420
Approved by: https://github.com/oulgen, https://github.com/msaroufim
As reported in https://github.com/pytorch/pytorch/issues/119434, `detectron2_fcos_r_50_fpn` failed with dynamic shape testing; we propose to skip the dynamic batch size testing of this model in this PR.
* Error message:
```
File "/home/jiayisun/pytorch/benchmarks/dynamo/common.py", line 3877, in run
assert marked, f"nothing in example_inputs had a dim with {batch_size}"
AssertionError: nothing in example_inputs had a dim with 4
```
* Root cause:
The benchmark code only annotates an input dim as dynamic when its size equals the batch size (c617e7b407/benchmarks/dynamo/common.py (L3867-L3871)). If no dim equals the batch size, the above error is thrown (a simplified sketch of this logic appears below, after the example inputs).
However, the inputs of `detectron2_fcos_r_50_fpn` are as follows:
```
([{'file_name': '/home/jiayisun/benchmark/torchbenchmark/data/.data/coco2017-minimal/coco/val2017/000000001268.jpg', 'height': 427, 'width': 640, 'image_id': 1268, 'image': tensor([[[147., 124., 82., ..., 3., 4., 5.],
[125., 104., 65., ..., 3., 3., 4.],
[ 87., 68., 34., ..., 2., 2., 2.],
...,
[ 47., 45., 41., ..., 45., 45., 45.],
[ 46., 44., 40., ..., 44., 45., 46.],
[ 46., 44., 40., ..., 43., 45., 46.]],
[[154., 129., 84., ..., 3., 4., 5.],
[133., 110., 69., ..., 3., 3., 4.],
[ 95., 76., 43., ..., 2., 2., 2.],
...,
[ 44., 42., 38., ..., 34., 37., 39.],
[ 43., 41., 37., ..., 35., 39., 41.],
[ 43., 41., 37., ..., 35., 40., 43.]],
[[171., 140., 85., ..., 3., 4., 5.],
[147., 120., 71., ..., 3., 3., 4.],
[103., 83., 47., ..., 2., 2., 2.],
...,
[ 46., 44., 40., ..., 16., 20., 22.],
[ 45., 43., 39., ..., 17., 22., 26.],
[ 45., 43., 39., ..., 18., 24., 28.]]])}, ... ],)
```
None of the input dims equals the batch size, so I think we need to skip dynamic batch size testing for this model.
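For reference, here is a simplified, paraphrased sketch of the dynamic-dim marking logic referenced above (not the exact code from `benchmarks/dynamo/common.py`):
```
import torch

def mark_batch_dim_dynamic(example_inputs, batch_size):
    # Mark every dim whose size equals batch_size as dynamic; if no such dim
    # exists (as for detectron2_fcos_r_50_fpn's dict-of-metadata inputs), the
    # assertion below fires with the error shown earlier.
    marked = False
    for t in example_inputs:
        if not isinstance(t, torch.Tensor):
            continue
        for dim, size in enumerate(t.shape):
            if size == batch_size:
                torch._dynamo.mark_dynamic(t, dim)
                marked = True
    assert marked, f"nothing in example_inputs had a dim with {batch_size}"
```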
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120697
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/desertfire
Fix https://github.com/pytorch/pytorch/issues/120545. These models fail the accuracy test with freezing because of conv-batchnorm fusion, which causes relatively large numerical churn.
For the failed TIMM models, raising the tolerance to `8 * 1e-2` makes the test pass.
For the failed TB models, the numerical difference is too large. After a discussion with @eellison, we decided to skip them with freezing for now.
On the other hand, we should probably dig into why the conv-bn fusion causes such a large numerical difference.
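As a rough standalone illustration of where the churn can come from (using `torch.nn.utils.fuse_conv_bn_eval`, not the inductor freezing pass itself): folding batchnorm statistics into the conv weights changes the order of floating-point operations, so the fused and unfused paths drift apart slightly even in fp32.
```
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, 3, bias=False).eval()
bn = nn.BatchNorm2d(8).eval()
# Use non-trivial running stats so the folding actually rescales the weights.
bn.running_mean.uniform_(-1.0, 1.0)
bn.running_var.uniform_(0.5, 2.0)

fused = torch.nn.utils.fuse_conv_bn_eval(conv, bn)

x = torch.randn(1, 3, 32, 32)
ref = bn(conv(x))   # unfused: conv followed by batchnorm
out = fused(x)      # fused: batchnorm folded into the conv weights
print((ref - out).abs().max())  # small but nonzero fp32 difference
```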
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121054
Approved by: https://github.com/eellison
As reported in https://github.com/pytorch/pytorch/issues/119434, `pyhpc_isoneutral_mixing`, `pyhpc_equation_of_state`, and `pyhpc_turbulent_kinetic_energy` failed with dynamic shape testing; we propose to skip the dynamic batch size testing of these 3 models in this PR.
* Error message:
```
File "/localdisk/leslie/torch_inductor_community/pytorch/benchmarks/dynamo/common.py", line 3879, in run
assert marked, f"nothing in example_inputs had a dim with {batch_size}"
AssertionError: nothing in example_inputs had a dim with 1048576
```
* Root cause:
* The benchmark code only annotates an input dim as dynamic when its size equals the batch size (c617e7b407/benchmarks/dynamo/common.py (L3867-L3871)). If no dim equals the batch size, the above error is thrown.
* However, for these 3 models, none of the input dims equals the batch size because of the [relationship of dim sizes](26b85eadde/torchbenchmark/models/pyhpc_equation_of_state/__init__.py (L12-L16)) (see the quick shape check after this list):
```
shape = (
math.ceil(2 * size ** (1/3)),
math.ceil(2 * size ** (1/3)),
math.ceil(0.25 * size ** (1/3)),
)
```
* Another thing: `pyhpc_isoneutral_mixing` and `pyhpc_equation_of_state` can pass the dynamic batch size accuracy testing because the batch size is set to 4 in accuracy testing (c617e7b407/benchmarks/dynamo/common.py (L3456)) and `math.ceil(2 * size ** (1/3))` happens to equal 4.
* Since the input dim sizes have the above relationship, running these models with dynamic shapes would require annotating `dim[0](s0) = dim[2](s1) * 8`; per the discussion in https://github.com/pytorch/pytorch/issues/117477#issuecomment-1897108756 with @avikchaudhuri, this constraint does not look expressible today. So I think we need to skip the dynamic batch size testing for these 3 models.
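A quick standalone check of that shape formula, showing why the accuracy run (batch size 4) happens to work while the performance batch size does not (expected outputs shown in the comments):
```
import math

def pyhpc_shape(size):
    # Same formula as the torchbench snippet above.
    return (
        math.ceil(2 * size ** (1 / 3)),
        math.ceil(2 * size ** (1 / 3)),
        math.ceil(0.25 * size ** (1 / 3)),
    )

print(pyhpc_shape(4))        # (4, 4, 1): dim 0 happens to equal the accuracy batch size 4
print(pyhpc_shape(1048576))  # (204, 204, 26): no dim equals 1048576, so marking fails
```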
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120599
Approved by: https://github.com/jgong5, https://github.com/desertfire
We need a higher tolerance for GPT2ForSequenceClassification, since if I change `--bfloat16` in
```
time python benchmarks/dynamo/huggingface.py --accuracy --inference --bfloat16 --backend inductor --disable-cudagraphs --only GPT2ForSequenceClassification
```
to `--float16` or `--float32`, it passes the accuracy check.
Adding `--freezing` also makes the test pass for this model. I think that may be due to a different fusion output being generated (depending on whether constant propagation happens, which is controlled by freezing), causing a small numerical difference.
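For intuition (a generic illustration, not the benchmark's actual accuracy checker): bfloat16 keeps only 8 mantissa bits, so the same computation drifts noticeably further from an fp64 reference than float16 or float32 does, which is why the bf16 run needs a looser tolerance.
```
import torch

torch.manual_seed(0)
x = torch.randn(100_000)
ref = x.double().sum()

# Compare a simple reduction against the fp64 reference in each dtype.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    err = (x.to(dtype).sum().double() - ref).abs().item()
    print(dtype, err)
```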
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120537
Approved by: https://github.com/jansel
This PR moves other aspects of torchbench's model configuration (e.g. batch size,
tolerance requirements, etc.) into a new YAML file: `torchbench.yaml`. It also merges the
recently added `torchbench_skip_models.yaml` file inside the `skip` key.
This is an effort so that external consumers are able to easily replicate the performance
results and coverage results from the PyTorch HUD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120299
Approved by: https://github.com/jansel
This PR updates the list of benchmarks that should (not) be skipped. Here's a summary of
the changes:
- `detectron2_maskrcnn`: #120115
- `fambench_xlmr`: moved to canary models
- `hf_Bert` and `hf_Bert_large`: pass
- `maml`: pass
- `clip`: renamed to `hf_clip`
- `gat`, `gcn`, and `sage`: moved to canary models
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120117
Approved by: https://github.com/ezyang, https://github.com/lezcano
Prior to ONNX export, the model is deepcopied to avoid modifications that may affect later performance profiling. However, this increases the memory requirement on the device.
This PR modifies the script to deepcopy and export the model on another device when possible.
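A minimal sketch of the idea (illustrative names, not the benchmark script's actual code): deepcopy the model onto a different device, CPU here, so the export-time copy does not add to the GPU memory used by the copy being profiled.
```
import copy
import torch
import torch.nn as nn

def export_onnx_off_device(model: nn.Module, example_input: torch.Tensor,
                           path: str = "model.onnx") -> None:
    # Copy the model to CPU for export, leaving the original (e.g. CUDA) copy untouched.
    export_model = copy.deepcopy(model).to("cpu").eval()
    export_input = example_input.detach().to("cpu")
    torch.onnx.export(export_model, (export_input,), path)
```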
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118710
Approved by: https://github.com/thiagocrepaldi
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.
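A generic before/after of the pattern the rule rewrites (not taken from the PR diff):
```
keys = ["conv", "bn", "relu"]

# Before: a comprehension where every key maps to the same static value.
before = {k: None for k in keys}

# After: dict.fromkeys is clearer and faster; note the default value is shared,
# so avoid mutable defaults like [] unless sharing is actually intended.
after = dict.fromkeys(keys)        # values default to None
with_value = dict.fromkeys(keys, 0)

assert before == after
```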
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
1. I'd like to remove the patching that avoids the profiler hook, but that adds an additional graph break due to nested wrappers (#117767). If interested, see the (internal only) pastes for [before](P996529232) and [after](P997507449) this PR.
```
I've locally run perf benchmarks for yolov3: before, the speedup is 4.183x; after, it is 4.208x.
I've also run them for resnet50: before, the speedup is 3.706x; now it is 3.924x.
```
2. @mlazos I now unwrap twice in the dynamo and inductor tests. This feels like we're testing deficiently; should we add tests checking that tracing through the profiler hook and the use_grad hook works as expected (I know there's at least one graph break in one)?
3. There was a strange memory issue going on; it has been resolved by @voznesenskym's [change](https://github.com/pytorch/pytorch/pull/116169) (for details see below).
<details>
Without that change, this PR would fail the test_static_address_finalizer test due to something mysterious (I don't know what, but maybe the dynamo cache or a frame _expecting_ the patching to have been done).
There is no Python refcycle: the backref graph for `p_ref()` shows 5 backrefs, none of them Python (backref and ref graphs were attached as screenshots).
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115772
Approved by: https://github.com/jansel, https://github.com/mlazos