mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-06 12:20:52 +01:00)
Support fused_sgd_kernel support for CPU.

## Bench result:

32 core/sockets ICX

Test Scripts:
https://gist.github.com/zhuhaozhe/79e842e0a6e25d6d7fa1e4598807272c
https://gist.github.com/zhuhaozhe/b4c6998a509dcea1796dd05b3005c969

```
Tensor Size: 262144, Num Tensor 4, Num Threads: 1
_single_tensor_adagrad time: 0.2500 seconds
_fused_adagrad time: 0.0933 seconds
Tensor Size: 4194304, Num Tensor 32, Num Threads: 32
_single_tensor_adagrad time: 2.8819 seconds
_fused_adagrad time: 1.7591 seconds
```

## Test Plan:

```
python test_optim.py -k test_fused_matches_forloop
python test_optim.py -k test_fused_large_tensor
python test_optim.py -k test_can_load_older_state_dict
python test_optim.py -k test_grad_scaling_autocast_fused_optimizers
python test_torch.py -k test_grad_scaling_autocast_fused
python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step
```

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124905
Approved by: https://github.com/jgong5, https://github.com/janeyx99

| | | |
|---|---|---|
| _multi_tensor | ||
| __init__.py | ||
| __init__.pyi | ||
| _functional.py | ||
| adadelta.py | ||
| adagrad.py | ||
| adam.py | ||
| adam.pyi | ||
| adamax.py | ||
| adamax.pyi | ||
| adamw.py | ||
| adamw.pyi | ||
| asgd.py | ||
| asgd.pyi | ||
| lbfgs.py | ||
| lbfgs.pyi | ||
| lr_scheduler.py | ||
| nadam.py | ||
| nadam.pyi | ||
| optimizer.py | ||
| radam.py | ||
| radam.pyi | ||
| rmsprop.py | ||
| rmsprop.pyi | ||
| rprop.py | ||
| rprop.pyi | ||
| sgd.py | ||
| sgd.pyi | ||
| sparse_adam.py | ||
| sparse_adam.pyi | ||
| swa_utils.py | ||
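
The benchmark timings quoted in the commit message above can be read as speedup ratios. The following is a minimal sketch of that arithmetic; the `BENCH` layout and the `speedup` helper are illustrative names, not part of PyTorch or the linked test scripts.

```python
# Timings (seconds) copied from the benchmark output in the commit message.
# The tuple layout and the helper name `speedup` are illustrative assumptions.
BENCH = [
    # (tensor_size, num_tensors, num_threads, single_tensor_s, fused_s)
    (262144, 4, 1, 0.2500, 0.0933),
    (4194304, 32, 32, 2.8819, 1.7591),
]


def speedup(baseline_s: float, fused_s: float) -> float:
    """Return how many times faster the fused run is than the baseline."""
    return baseline_s / fused_s


for size, n_tensors, n_threads, single_s, fused_s in BENCH:
    print(
        f"size={size} tensors={n_tensors} threads={n_threads}: "
        f"{speedup(single_s, fused_s):.2f}x"
    )
```

On these figures the fused path is roughly 2.7x faster in the single-thread case and about 1.6x faster with 32 threads.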