mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-07 12:21:27 +01:00
Summary:
### 🚀 The feature, motivation and pitch
Following the discussion in https://github.com/pytorch/pytorch/issues/65813, I added the QR factorization to powerSGD_hook.py
Gram-Schmidt orthogonalization can't be fully replaced because _torch.linalg.qr_ doesn't work with half-precision. Moreover, in my tests, it works faster with a rank lesser than 3.
This is one sample experiment timing powerSGD_hook on ResNext101 with the two different methods:

### Alternatives
Use _torch.orgqr(*torch.geqrf(matrix))_. From my tests it performances are similar to _torch.linalg.qr_.
### Additional context
_No response_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72043
Reviewed By: albanD
Differential Revision: D34042781
Pulled By: cbalioglu
fbshipit-source-id: e331179d3b7ac40d445b651fc473b16ae4ead462
(cherry picked from commit
|
||
|---|---|---|
| .. | ||
| _checkpoint | ||
| _optimizer_overlap | ||
| ddp_comm_hooks | ||
| model_averaging | ||
| quantization | ||
| __init__.py | ||
| join.py | ||