Commit Graph

5 Commits

Aaron Gokaslan
3555ebb63d [BE]: Update ruff to 0.11.8 (#153249)
Fixes a ton of false negatives throughout the codebase. Ruff also properly validates `# noqa` comments now; most of the changes fix typos in those comments or remove file-wide flake8 suppressions that were also silencing ruff issues.
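As a hedged illustration of the noqa-typo fixes described above (hypothetical code, not taken from the PR):

```python
# Hypothetical before/after. Older ruff accepted a misspelled rule code,
# so this suppression silently matched nothing:
from typing import List  # noqa: F4011

# ruff 0.11.8 validates the code, so the fix is the intended rule
# (F401 = unused import):
from typing import List  # noqa: F401
```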

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere
2025-05-12 18:30:52 +00:00
Oguz Ulgen
dc55704b48 Rename cache limit to recompile limit in configs (#143709)
This PR renames every cache_limit to recompile_limit via sed.

Old config options are maintained via Config(alias='xyz').
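A minimal usage sketch of the aliasing, assuming the post-rename torch._dynamo.config names (where the pre-rename knob was cache_size_limit); this is illustrative, not the PR's implementation:

```python
import torch

# New option name introduced by the rename.
torch._dynamo.config.recompile_limit = 16

# The old name is kept as an alias, so existing code keeps working;
# both names refer to the same underlying config value.
torch._dynamo.config.cache_size_limit = 16
```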

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709
Approved by: https://github.com/jansel
2024-12-22 10:03:57 +00:00
HDCharles
374747818d Run performance test non-alternately (#131935)
Summary:
By default, performance tests (speedup experiments) will run the baseline and test backend alternately.

However, this does not work for the torchao backend, which changes the model in place: the baseline run would then also use the torchao backend, since the model has already been quantized.

Add a new experiment "latency_experiment" to run performance tests non-alternately (first run baseline for a few iterations, then run the test backend).
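A minimal sketch of the non-alternating flow (all names here are hypothetical): the baseline is timed in full before the backend is allowed to mutate the model.

```python
import time

def latency_experiment(model, example_inputs, apply_backend, iters=10):
    """Time the baseline fully, then apply an in-place backend and time again."""
    def timed(fn):
        start = time.perf_counter()
        for _ in range(iters):
            fn(*example_inputs)
        return (time.perf_counter() - start) / iters

    baseline = timed(model)   # model is still unquantized at this point
    apply_backend(model)      # e.g. torchao quantization, mutates in place
    test = timed(model)
    return baseline / test    # speedup; ~1.0x expected for "noquant"
```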

Other changes:

* Add torch.compiler.cudagraph_mark_step_begin() to avoid the slowdown from the CUDAGraphs warning "Unable to hit fast path of CUDAGraphs because of pending, uninvoked backwards" (see the usage sketch after this list).
* Update the torchao APIs to the current versions.
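A usage sketch for the CUDAGraphs fix; torch.compiler.cudagraph_mark_step_begin() is the documented API for marking iteration boundaries, while the model, inputs, and loop below are assumed for illustration:

```python
import torch

compiled = torch.compile(model, mode="max-autotune")  # assumes `model` exists

for _ in range(num_iters):  # assumes `num_iters` and `inp` are defined
    # Mark the start of a new inference step so CUDAGraphs can take its
    # fast path instead of warning about "pending, uninvoked backwards".
    torch.compiler.cudagraph_mark_step_begin()
    out = compiled(inp)
```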

X-link: https://github.com/pytorch/benchmark/pull/2394

Test Plan:
```
python run_benchmark.py torchao --only AlbertForMaskedLM --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune
python run_benchmark.py torchao --only BartForCausalLM --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune
python run_benchmark.py torchao --only timm_efficientnet --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune
```

(should all be ~1.0x)
0.997x
1.006x
0.994x

Reviewed By: xuzhao9

Differential Revision: D60252821

Pulled By: HDCharles

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131935
Approved by: https://github.com/xuzhao9
2024-08-08 00:23:20 +00:00
Xu Zhao
cc9b005bf2 Enable torchao nightly workflow (#129779)
Summary:
Make the following improvements:
* Schedule the torchao benchmark nightly
* Enable torchbench, timm, and huggingface models
* Refactor the benchmarking script to better arrange the benchmarking groups (a sketch of the grouping idea follows this list)
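A sketch of how the benchmarking groups might be arranged; the grouping and model lists are illustrative placeholders, reusing the run_benchmark.py invocation from the test plans elsewhere in this log:

```python
import subprocess

# Illustrative suite -> model grouping for the nightly torchao run.
GROUPS = {
    "huggingface": ["AlbertForMaskedLM", "BartForCausalLM"],
    "torchbench": ["timm_efficientnet"],
}

for suite, models in GROUPS.items():
    # The real script presumably dispatches per suite; here we simply
    # iterate and shell out to the benchmark runner for each model.
    for name in models:
        subprocess.run(
            ["python", "run_benchmark.py", "torchao", "--only", name,
             "--quantization", "noquant", "--performance", "--inference",
             "--bfloat16"],
            check=True,
        )
```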

Test workflow: https://github.com/pytorch/benchmark/actions/runs/9705589352

X-link: https://github.com/pytorch/benchmark/pull/2336

Differential Revision: D59074571

Pulled By: xuzhao9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129779
Approved by: https://github.com/jerryzh168
2024-07-01 14:28:38 +00:00
Xu Zhao
1e818db547 [torchbench] Fix torchao benchmarking script (#126736)
As the title says.

Test Plan:

```
python benchmarks/dynamo/torchbench.py --only BERT_pytorch --bfloat16 --quantization int8dynamic --performance --inference --print-memory

cuda eval  BERT_pytorch
[XZ Debug] Torch grad status: False
memory: eager: 0.82 GB, dynamo: 0.92 GB, ratio: 0.89
running benchmark: 100%
1.001x
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126736
Approved by: https://github.com/jerryzh168, https://github.com/huydhn
2024-05-21 23:15:12 +00:00