This is a follow-up to flash attention. On CUDA, flash attention is supported only for fp16/bf16, whereas memory-efficient attention is supported for fp32 (but not fp64). With this PR, one can run SDPA, and more generally a Transformer, entirely in DTensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122996
Approved by: https://github.com/XilunWu, https://github.com/wanchaol
ghstack dependencies: #122995
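For context, a minimal sketch of what this enables: calling `scaled_dot_product_attention` on fp32 DTensor inputs, which dispatches to the memory-efficient attention path on CUDA. This is not taken from the PR's test suite; the mesh setup, tensor shapes, and head-dimension sharding are illustrative assumptions, and the script assumes it is launched with `torchrun` with one GPU per rank.

```python
# Hypothetical sketch: launch with `torchrun --nproc-per-node=<ngpus> sdpa_dtensor.py`
import os

import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed._tensor import distribute_tensor, Shard

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
mesh = init_device_mesh("cuda", (int(os.environ["WORLD_SIZE"]),))

# fp32 inputs exercise the memory-efficient backend; fp16/bf16 would also allow flash attention.
torch.manual_seed(0)  # same global tensors on every rank before sharding
shape = (8, 16, 128, 64)  # (batch, heads, seq_len, head_dim) -- arbitrary example sizes
q, k, v = (torch.randn(shape, device="cuda", dtype=torch.float32) for _ in range(3))

# Shard the head dimension (dim 1) across the mesh; SDPA then runs on each local shard.
dq, dk, dv = (distribute_tensor(t, mesh, [Shard(1)]) for t in (q, k, v))

out = F.scaled_dot_product_attention(dq, dk, dv)  # output is a DTensor as well
print(out.to_local().shape)  # local shard holds this rank's subset of heads
```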
| File |
|---|
| __init__.py |
| basic_strategy.py |
| common_rules.py |
| conv_ops.py |
| embedding_ops.py |
| experimental_ops.py |
| math_ops.py |
| matrix_ops.py |
| pointwise_ops.py |
| random_ops.py |
| tensor_ops.py |
| utils.py |
| view_ops.py |