As titled: given that our DTensorSpec is immutable, we can always reuse the spec when the input/output have the same tensor metadata. This helps twofold:

1. We don't need to recalculate the hash every time we produce a DTensorSpec, which reduces per-operator runtime overhead.
2. It reduces DTensor construction overhead.

A local benchmark of clip_grad_norm over 800 parameters shows that for foreach_norm the CPU overhead drops from 11 ms to 7.8 ms (around a 30% improvement).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128112
Approved by: https://github.com/awgu
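
Below is a minimal Python sketch of the idea, not the actual DTensorSpec implementation: because the spec is immutable, its hash can be computed once at construction and cached, and the spec object itself can be reused whenever the output tensor metadata matches the input's. The names `TensorMeta` here, `_CachedSpec`, and `_maybe_reuse_spec` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Minimal stand-in for tensor metadata (shape/stride/dtype)."""
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class _CachedSpec:
    """Immutable spec: the hash is computed once and then reused."""
    mesh_id: int
    placements: Tuple[str, ...]
    tensor_meta: TensorMeta
    # Computed once in __post_init__; excluded from __eq__/__repr__.
    _hash: int = field(init=False, repr=False, compare=False, default=0)

    def __post_init__(self) -> None:
        # Frozen dataclass, so bypass the immutability check once.
        object.__setattr__(
            self, "_hash", hash((self.mesh_id, self.placements, self.tensor_meta))
        )

    def __hash__(self) -> int:
        # No per-call recomputation: this is the first half of the win.
        return self._hash


def _maybe_reuse_spec(input_spec: _CachedSpec, output_meta: TensorMeta) -> _CachedSpec:
    """Reuse the input spec when the output metadata is identical,
    avoiding a new spec object (and a fresh hash) entirely."""
    if input_spec.tensor_meta == output_meta:
        return input_spec
    return _CachedSpec(input_spec.mesh_id, input_spec.placements, output_meta)


if __name__ == "__main__":
    meta = TensorMeta(shape=(4, 4), stride=(4, 1), dtype="float32")
    spec = _CachedSpec(mesh_id=0, placements=("Shard(0)",), tensor_meta=meta)
    # Same metadata -> the very same spec object comes back.
    assert _maybe_reuse_spec(spec, meta) is spec
    # Different metadata -> a new spec is constructed.
    assert _maybe_reuse_spec(spec, TensorMeta((8, 4), (4, 1), "float32")) is not spec
```

Under these assumptions, reusing the same object also keeps dict/set lookups on the spec cheap, since the cached hash is computed at most once per distinct spec rather than on every operator call.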
| File |
|---|
| __init__.py |
| _data_parallel_utils.py |
| _utils.py |
| api.py |
| ddp.py |
| fsdp.py |
| input_reshard.py |
| loss.py |
| style.py |