pytorch/torch/distributed/optim
haozhe.zhu 1c3fe84033 [optim] add fused_adagrad support for CPU device (#124905)
Add fused Adagrad kernel (`fused_adagrad`) support for the CPU device.
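A minimal usage sketch (illustrative, not taken from the PR): assuming a PyTorch build that includes this change, the fused CPU path is reached through the `fused=True` argument of `torch.optim.Adagrad`.

```python
# Illustrative sketch, assuming a build that includes this PR:
# the fused CPU kernel is selected via Adagrad's `fused=True` flag.
import torch

model = torch.nn.Linear(1024, 1024)  # plain CPU model
opt = torch.optim.Adagrad(model.parameters(), lr=0.01, fused=True)

loss = model(torch.randn(32, 1024)).sum()
loss.backward()
opt.step()        # should dispatch to the fused Adagrad CPU kernel
opt.zero_grad()
```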

## Benchmark results
Hardware: ICX (Ice Lake), 32 cores/socket.
Test scripts:
- https://gist.github.com/zhuhaozhe/79e842e0a6e25d6d7fa1e4598807272c
- https://gist.github.com/zhuhaozhe/b4c6998a509dcea1796dd05b3005c969
```
Tensor Size: 262144, Num Tensor 4, Num Threads: 1
_single_tensor_adagrad time: 0.2500 seconds
_fused_adagrad time: 0.0933 seconds
Tensor Size: 4194304, Num Tensor 32, Num Threads: 32
_single_tensor_adagrad time: 2.8819 seconds
_fused_adagrad time: 1.7591 seconds
```
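For reference, a hedged sketch of how numbers like the ones above could be reproduced; the authoritative scripts are the gists linked above. This sketch times the public `torch.optim.Adagrad` API rather than the private `_single_tensor_adagrad` / `_fused_adagrad` helpers, and the tensor sizes, tensor counts, and thread counts are taken from the table.

```python
# Hedged reproduction sketch; the real benchmark scripts are in the gists above.
import time
import torch

def bench(fused: bool, tensor_size: int, num_tensors: int,
          num_threads: int, steps: int = 100) -> float:
    torch.set_num_threads(num_threads)
    params = [torch.randn(tensor_size, requires_grad=True)
              for _ in range(num_tensors)]
    for p in params:
        p.grad = torch.randn_like(p)
    if fused:
        opt = torch.optim.Adagrad(params, lr=0.01, fused=True)
    else:
        # foreach=False keeps the plain single-tensor implementation.
        opt = torch.optim.Adagrad(params, lr=0.01, foreach=False)
    start = time.perf_counter()
    for _ in range(steps):
        opt.step()
    return time.perf_counter() - start

for fused in (False, True):
    label = "_fused_adagrad" if fused else "_single_tensor_adagrad"
    t = bench(fused, tensor_size=262144, num_tensors=4, num_threads=1)
    print(f"{label} time: {t:.4f} seconds")
```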
## Test Plan:
```
python test_optim.py -k test_fused_matches_forloop
python test_optim.py -k test_fused_large_tensor
python test_optim.py -k test_can_load_older_state_dict
python test_optim.py -k test_grad_scaling_autocast_fused_optimizers
python test_torch.py -k test_grad_scaling_autocast_fused
python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step
```

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124905
Approved by: https://github.com/jgong5, https://github.com/janeyx99
2024-05-13 01:16:20 +00:00
## Files

| File | Latest commit | Date |
|------|---------------|------|
| `__init__.py` | Disable dynamo on functional optims if capturable=False (#123619) | 2024-05-07 22:17:01 +00:00 |
| `apply_optimizer_in_backward.py` | | |
| `functional_adadelta.py` | | |
| `functional_adagrad.py` | [optim] add fused_adagrad support for CPU device (#124905) | 2024-05-13 01:16:20 +00:00 |
| `functional_adam.py` | | |
| `functional_adamax.py` | | |
| `functional_adamw.py` | | |
| `functional_rmsprop.py` | Add tensor step and capturable support to rmsprop (#122264) | 2024-03-28 03:39:28 +00:00 |
| `functional_rprop.py` | Add tensor step and capturable support to rprop (#122261) | 2024-03-28 23:31:18 +00:00 |
| `functional_sgd.py` | | |
| `named_optimizer.py` | | |
| `optimizer.py` | | |
| `post_localSGD_optimizer.py` | | |
| `utils.py` | | |
| `zero_redundancy_optimizer.py` | | |
| `zero_redundancy_optimizer.pyi` | | |