pytorch/torch/distributed/tensor/parallel
Wanchao Liang 4f87f47ea1 [dtensor] reuse DTensorSpec as much as possible (#128112)
As titled: given that our DTensorSpec is immutable, we can always reuse
the spec when the input/output have the same tensor metadata. This helps in two ways:
1. We don't need to re-calculate the hash every time we produce a
   DTensorSpec, which reduces per-operator runtime overhead.
2. It reduces the DTensor construction overhead.

Some local benchmarking on an 800-parameter clip_grad_norm shows that for
foreach_norm the CPU overhead drops from 11 ms to 7.8 ms (around a 30% improvement).
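A rough sketch of the idea, using hypothetical `SpecSketch` / `TensorMeta` / `propagate` names rather than the real DTensorSpec internals: because the spec is treated as immutable, its hash can be computed once and cached, and an op whose output keeps the input's tensor metadata can return the same spec object instead of constructing a new one.

```python
# Minimal, illustrative sketch of spec reuse (not PyTorch's actual
# DTensorSpec implementation); all names below are assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TensorMeta:
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]
    dtype: str


class SpecSketch:
    """Spec treated as immutable, so its hash is computed once and cached."""

    def __init__(self, placements: Tuple[str, ...], tensor_meta: TensorMeta):
        self.placements = placements
        self.tensor_meta = tensor_meta
        self._cached_hash: Optional[int] = None

    def __hash__(self) -> int:
        # Immutability makes it safe to cache the hash on first use.
        if self._cached_hash is None:
            self._cached_hash = hash((self.placements, self.tensor_meta))
        return self._cached_hash


def propagate(input_spec: SpecSketch, output_meta: TensorMeta) -> SpecSketch:
    # If the output carries the same tensor metadata as the input, reuse the
    # input spec instead of constructing (and later re-hashing) a new one.
    if output_meta == input_spec.tensor_meta:
        return input_spec
    return SpecSketch(input_spec.placements, output_meta)
```

Under this scheme, ops whose outputs keep the input's metadata (as on the foreach_norm path benchmarked above) hand back the same spec object, so both the hash re-computation and a fresh spec allocation are skipped.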

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128112
Approved by: https://github.com/awgu
2024-06-06 16:55:50 +00:00
__init__.py [TP] Introduce Sequence Parallel Style for Laynorm/RMSNorm/Dropout (#121295) 2024-03-07 02:04:59 +00:00
_data_parallel_utils.py [reland] pass shape/stride during tensor unflatten (#117340) 2024-01-13 19:33:47 +00:00
_utils.py [BE] enable ruff rule Q from flake8-quotes (#127713) 2024-06-02 23:25:26 +00:00
api.py [TP] Add wildcard support (#122968) 2024-04-02 21:23:39 +00:00
ddp.py
fsdp.py [FSDP1][2D] Fix FSDP1 2D state_dict to use run_check=False (#123802) 2024-04-24 01:25:11 +00:00
input_reshard.py
loss.py [dtensor] reuse DTensorSpec as much as possible (#128112) 2024-06-06 16:55:50 +00:00
style.py [tp] add kwargs support to prepare_module_input (#124114) 2024-04-22 21:46:31 +00:00