Summary:
By default, performance tests (speedup experiments) will run the baseline and test backend alternately.
However, this does not work for the torchao backend, which modifies the model in place: once the model has been quantized, the "baseline" run also executes with the torchao backend.
Add a new experiment, "latency_experiment", to run performance tests non-alternately: first run the baseline for a few iterations, then run the test backend.
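For context, a minimal sketch of what a non-alternating latency experiment looks like (the function name, the `timed` helper, and the `apply_backend` hook below are illustrative stand-ins, not the actual benchmark harness code):

```python
import time
import torch

def timed(fn, iters):
    # Average wall-clock time per iteration, with GPU sync at the boundaries.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

def latency_experiment(model, example_inputs, apply_backend, iters=20):
    # Run all baseline iterations first, while the model is still unmodified...
    baseline = timed(lambda: model(*example_inputs), iters)
    # ...then let the backend quantize/compile the model in place and measure it.
    model = apply_backend(model)
    test = timed(lambda: model(*example_inputs), iters)
    return baseline / test  # speedup; ~1.0x expected for the noquant case
```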
Other changes:
Add torch.compiler.cudagraph_mark_step_begin() to avoid the slowdown from "Unable to hit fast path of CUDAGraphs because of pending, uninvoked backwards".
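As a concrete illustration of where the call goes (a toy model here, assuming the model is compiled with mode="reduce-overhead" so CUDAGraphs are active):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")
compiled = torch.compile(model, mode="reduce-overhead")  # CUDAGraphs path

for _ in range(10):
    # Mark the start of each iteration so CUDAGraphs can reclaim the previous
    # step's outputs and stay on the fast replay path instead of falling back.
    torch.compiler.cudagraph_mark_step_begin()
    out = compiled(x)
```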
Also update the torchao APIs to the current versions.
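For reference, the updated entry point looks roughly like this (a sketch assuming torchao's quantize_ / int8_weight_only API; exact import paths may vary across torchao versions):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(16, 16)).cuda().to(torch.bfloat16)
# quantize_ mutates the model in place -- the same in-place behavior that
# motivated the non-alternating latency_experiment above.
quantize_(model, int8_weight_only())
```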
X-link: https://github.com/pytorch/benchmark/pull/2394
Test Plan:
python run_benchmark.py torchao --only AlbertForMaskedLM --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune
python run_benchmark.py torchao --only BartForCausalLM --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune
python run_benchmark.py torchao --only timm_efficientnet --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune
(should all be ~1.0x):
0.997x
1.006x
0.994x
Reviewed By: xuzhao9
Differential Revision: D60252821
Pulled By: HDCharles
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131935
Approved by: https://github.com/xuzhao9