pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Edward Z. Yang	dd3a77bc96	Apply UFMT to all files in benchmarks/ (#105928) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928 Approved by: https://github.com/albanD	2023-07-26 01:18:48 +00:00
Elias Ellison	6694fdaccd	Clean up profiling mode and profiling executor strategy (#73875) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875 Previously we had a few settings: - getExecutor - which toggled between Profiling Executor and Legacy - getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations) and then... - getProfilingMode - which would set PE to 0 specializations. The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93. The tests here are failing but get fixed with the PR above it, so I'll squash for landing. Test Plan: Imported from OSS Reviewed By: cpuhrsch Differential Revision: D34938130 Pulled By: eellison fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b (cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)	2022-03-29 18:38:51 +00:00
Nikolay Korovaiko	195ab5e864	remove non-default settings in fuser.py (#48862) Summary: I've noticed that we sometimes set `_jit_set_num_profiled_runs` to 2 (which isn't our default) and sometimes we don't. We are also setting `_jit_set_bailout_depth` to 20, which is our default. I suggest we remove this logic altogether. I did a quick run to see if there's any impact and thankfully the numbers seem to be consistent, but we should try to avoid testing configurations that aren't default or aren't being considered to become default.
numactl -C 3 python -m fastrnns.bench --fuser=te --executor=profiling
non-defaults:
```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
name             avg_fwd  std_fwd  info_fwd  avg_bwd  std_bwd  info_bwd
cudnn            5.057    0.06287  None      7.322    0.07404  None
aten             5.602    0.06303  None      13.64    0.4078   None
jit              7.019    0.07995  None      13.77    0.554    None
jit_premul       5.324    0.06203  None      12.01    0.2996   None
jit_premul_bias  5.148    0.08061  None      11.62    0.4104   None
jit_simple       6.69     0.2317   None      13.37    0.3791   None
jit_multilayer   7.006    0.251    None      13.67    0.2239   None
py               19.05    0.1119   None      28.28    0.6346   None
Benchmarking ResNets...
name             avg_fwd  std_fwd  info_fwd  avg_bwd  std_bwd  info_bwd
resnet18         8.712    0.01628  None      19.93    0.03512  None
resnet18_jit     8.688    0.01374  None      19.79    0.07518  None
resnet50         31.04    0.08049  None      66.44    0.08187  None
resnet50_jit     31.11    0.07171  None      66.45    0.09157  None
```
defaults:
```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
name             avg_fwd  std_fwd  info_fwd  avg_bwd  std_bwd  info_bwd
cudnn            5.086    0.115    None      7.394    0.1743   None
aten             5.611    0.2559   None      13.54    0.387    None
jit              7.062    0.3358   None      13.24    0.3688   None
jit_premul       5.379    0.2086   None      11.57    0.3987   None
jit_premul_bias  5.202    0.2127   None      11.13    0.06748  None
jit_simple       6.648    0.05794  None      12.84    0.3047   None
jit_multilayer   6.964    0.1104   None      13.24    0.3283   None
py               19.14    0.09959  None      28.17    0.4946   None
Benchmarking ResNets...
name             avg_fwd  std_fwd  info_fwd  avg_bwd  std_bwd  info_bwd
resnet18         8.713    0.01563  None      19.93    0.02759  None
resnet18_jit     8.697    0.01792  None      19.78    0.06916  None
resnet50         31.14    0.07431  None      66.57    0.07418  None
resnet50_jit     31.21    0.0677   None      66.56    0.08655  None
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48862 Reviewed By: bertmaher Differential Revision: D25342097 Pulled By: Krovatkin fbshipit-source-id: 8d2f72c2770793ec8cecee9dfab9aaaf2e1ad2b1	2020-12-05 20:58:39 -08:00
Mikhail Zolotukhin	bc5710f2f7	Benchmarks: tweak PE config settings. (#45349) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45349 Test Plan: Imported from OSS Reviewed By: Krovatkin Differential Revision: D23935518 Pulled By: ZolotukhinM fbshipit-source-id: 5a7c508c6fc84eafbc23399f095d732b903510dc	2020-09-26 23:13:29 -07:00
Mikhail Zolotukhin	8cef7326f4	Benchmarks: add 'default' options for fuser and executor. (#45347) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45347 Test Plan: Imported from OSS Reviewed By: Krovatkin Differential Revision: D23935519 Pulled By: ZolotukhinM fbshipit-source-id: 8323fafe7828683c4d29c12a1e5722adb6f945ff	2020-09-26 23:09:02 -07:00
Mikhail Zolotukhin	d11603de38	[TensorExpr] Benchmarks: set number of profiling runs to 2 for PE. (#44112) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44112 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D23500904 Pulled By: ZolotukhinM fbshipit-source-id: d0dd54752b7ea5ae11f33e865c96d2d61e98d573	2020-09-03 11:29:35 -07:00
Bert Maher	33d51a9b32	Respect canFuseOn{CPU,GPU} in TE fuser (#43967) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967 Test Plan: Imported from OSS Reviewed By: asuhan Differential Revision: D23469048 Pulled By: bertmaher fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb	2020-09-02 18:00:25 -07:00

7 Commits