pytorch/torch/distributed/_tensor/ops
Tianyu Liu efece3f142 [dtensor] add op support for memory efficient attention (#122996)
This is a follow-up to the flash attention support. On CUDA, flash attention is supported only for fp16/bf16, whereas memory efficient attention is supported for fp32 (but not fp64). With this PR, one can run SDPA, and in general a Transformer, completely in DTensor.

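A minimal sketch of what this enables, assuming a process group is already initialized (e.g. the script is launched via torchrun); the mesh, tensor shapes, and Shard(1) placement below are illustrative and not taken from the PR's diff:

```python
import torch
import torch.nn.functional as F
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

# Illustrative 1-D mesh over the local GPUs.
mesh = DeviceMesh("cuda", list(range(torch.cuda.device_count())))

# fp32 inputs: on CUDA, SDPA dispatches to the memory efficient attention
# kernel, since flash attention only covers fp16/bf16.
# Shapes are (batch, heads, seq, head_dim); sharding the head dim is arbitrary here.
q = distribute_tensor(torch.randn(8, 16, 128, 64), mesh, [Shard(1)])
k = distribute_tensor(torch.randn(8, 16, 128, 64), mesh, [Shard(1)])
v = distribute_tensor(torch.randn(8, 16, 128, 64), mesh, [Shard(1)])

out = F.scaled_dot_product_attention(q, k, v)  # out is a DTensor

```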
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122996
Approved by: https://github.com/XilunWu, https://github.com/wanchaol
ghstack dependencies: #122995
2024-05-08 17:08:27 +00:00
__init__.py [dtensor] support convolution ops (#113123) 2023-11-20 21:01:28 +00:00
basic_strategy.py [BE]: FURB142 - Remove set mutations. Use set update (#124551) 2024-04-21 14:12:33 +00:00
common_rules.py [dtensor] refactor schema suggestions in output sharding (#122929) 2024-04-01 17:39:39 +00:00
conv_ops.py [dtensor] support convolution ops (#113123) 2023-11-20 21:01:28 +00:00
embedding_ops.py [dtensor] implement shard dim change with alltoall (#124872) 2024-04-30 18:30:34 +00:00
experimental_ops.py Remove hard numpy dependency from experimental_ops.py (#119520) 2024-02-27 02:46:13 +00:00
math_ops.py [dtensor] use str for reduce_op (#125172) 2024-04-29 23:30:24 +00:00
matrix_ops.py [dtensor] add op support for memory efficient attention (#122996) 2024-05-08 17:08:27 +00:00
pointwise_ops.py DTensor Fused ADAM (#125369) 2024-05-07 00:08:09 +00:00
random_ops.py [DTensor][BE] rename PlacementStrategy.output_spec to output_specs since now we support a tuple of DTensorSpec as output (#116437) 2024-01-24 03:33:58 +00:00
tensor_ops.py [dtensor] improve new factory strategy (#122995) 2024-05-08 17:05:07 +00:00
utils.py DTensor: use memory_format in the hash for all aten ops that use that arg (e.g. aten.clone) (#118667) 2024-02-20 15:23:48 +00:00
view_ops.py [dtensor] refactor schema suggestions in output sharding (#122929) 2024-04-01 17:39:39 +00:00