pytorch/test/distributed/_tensor
Tristan Rice ddd0ed1b43 distributed: templated ring attention (#124215)
This adds a templated version of the ring attention forward function and tests it with memory-efficient attention. It does not add support for memory-efficient attention in DTensor; that will come in a follow-up PR.

This templating also serves as a proof of concept for supporting other attention ops, such as jagged/nested tensors, and for implementing striped attention in a scalable way.
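For context, the core idea behind the templated forward looks roughly like the single-process sketch below. The ring is simulated here with a list of K/V shards (the real implementation rotates shards between ranks with send/recv), and the pluggable `attention_op` is the template hook, so flash, memory-efficient, or future jagged/nested block ops can be slotted in. All names are illustrative rather than the PR's actual internals.

```python
import torch
import torch.nn.functional as F


def block_attention(q, k, v):
    # Reference block op: returns (partial_out, logsumexp over this K block).
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    lse = torch.logsumexp(scores, dim=-1, keepdim=True)
    return torch.exp(scores - lse) @ v, lse


def ring_attention_forward(q, k_shards, v_shards, attention_op=block_attention):
    out, lse = None, None
    for k_blk, v_blk in zip(k_shards, v_shards):
        # Distributed version: each rank computes against its current shard,
        # then receives the next K/V shard from its ring neighbor.
        blk_out, blk_lse = attention_op(q, k_blk, v_blk)
        if out is None:
            out, lse = blk_out, blk_lse
        else:
            # Numerically stable merge of two partial softmax results.
            new_lse = torch.logaddexp(lse, blk_lse)
            out = (torch.exp(lse - new_lse) * out
                   + torch.exp(blk_lse - new_lse) * blk_out)
            lse = new_lse
    return out


# Sanity check against full attention on one device.
q = torch.randn(2, 8, 16)   # (batch, seq_q, head_dim)
k = torch.randn(2, 32, 16)
v = torch.randn(2, 32, 16)
ref = F.scaled_dot_product_attention(q, k, v)
out = ring_attention_forward(q, k.chunk(4, dim=1), v.chunk(4, dim=1))
assert torch.allclose(out, ref, atol=1e-5)
```

Because the logsumexp merge is order-independent, partial block outputs can be combined in any order, which is also what a striped attention schedule would rely on.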

Misc changes:

* Fixes the all_to_all_single autograd implementation on CUDA and adds an NCCL test (see the sketch after this list)
* Adds torch.compile support to the ring attention implementations (this required some tweaks to process groups)
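On the first point: the backward of all_to_all_single is itself an all_to_all_single on the same process group with the input and output split sizes swapped. The following is a hedged sketch of that autograd structure, assuming an initialized process group; it is not the PR's actual code.

```python
import torch
import torch.distributed as dist


class _AllToAllSingle(torch.autograd.Function):
    # Illustrative only; assumes dist.init_process_group() has been called.
    @staticmethod
    def forward(ctx, x, output_split_sizes, input_split_sizes, group):
        ctx.group = group
        # The splits swap roles in backward: data scattered forward must
        # be gathered back along the same routes.
        ctx.bwd_output_splits = input_split_sizes
        ctx.bwd_input_splits = output_split_sizes
        out = x.new_empty(sum(output_split_sizes), *x.shape[1:])
        dist.all_to_all_single(
            out,
            x.contiguous(),  # NCCL requires contiguous buffers
            output_split_sizes=output_split_sizes,
            input_split_sizes=input_split_sizes,
            group=group,
        )
        return out

    @staticmethod
    def backward(ctx, grad_out):
        grad_in = grad_out.new_empty(
            sum(ctx.bwd_output_splits), *grad_out.shape[1:]
        )
        dist.all_to_all_single(
            grad_in,
            grad_out.contiguous(),
            output_split_sizes=ctx.bwd_output_splits,
            input_split_sizes=ctx.bwd_input_splits,
            group=ctx.group,
        )
        return grad_in, None, None, None
```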

Test plan:

```
pytest test/distributed/_tensor/test_attention.py
pytest test/distributed/test_functional_api.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124215
Approved by: https://github.com/wanchaol
2024-04-19 00:57:08 +00:00
| File | Latest commit | Date |
| --- | --- | --- |
| debug | get CommsDebugMode to work with DTensor (#118769) | 2024-02-29 01:11:05 +00:00 |
| experimental | [functional collective] change the Python APIs to only use the native funcol ops (#123777) | 2024-04-13 03:08:36 +00:00 |
| __init__.py | | |
| README.md | | |
| test_api.py | nn.Module: use swap_tensors for Tensor subclasses (#122755) | 2024-03-28 02:03:09 +00:00 |
| test_attention.py | distributed: templated ring attention (#124215) | 2024-04-19 00:57:08 +00:00 |
| test_common_rules.py | [dtensor] refactor schema suggestions in output sharding (#122929) | 2024-04-01 17:39:39 +00:00 |
| test_convolution_ops.py | [dtensor] support convolution ops (#113123) | 2023-11-20 21:01:28 +00:00 |
| test_dtensor_compile.py | Revert "make sure dynamo doesn't inline DTensor __new__ or __torch_dispatch__ (#123347)" | 2024-04-16 22:08:24 +00:00 |
| test_dtensor_ops.py | Fix index_reduce sampler filter when op_info.variant_test_name is specified (#123375) | 2024-04-17 15:31:28 +00:00 |
| test_dtensor.py | [functional collective] change the Python APIs to only use the native funcol ops (#123777) | 2024-04-13 03:08:36 +00:00 |
| test_embedding_ops.py | [dtensor] implement dim-0 (row) embedding sharding with MaskPartial (#118080) | 2024-01-26 19:01:24 +00:00 |
| test_experimental_ops.py | [dtensor] support convolution ops (#113123) | 2023-11-20 21:01:28 +00:00 |
| test_init.py | [DeviceMesh] Reuse sub_group pg if exists (#115716) | 2024-01-25 18:07:16 +00:00 |
| test_math_ops.py | [dtensor][TP] check funcol calls and improve doc for loss parallel (#121366) | 2024-03-08 01:41:31 +00:00 |
| test_matrix_ops.py | [functional collective] change the Python APIs to only use the native funcol ops (#123777) | 2024-04-13 03:08:36 +00:00 |
| test_op_strategy.py | [dtensor] refactor sharding cost model to count for latency (#119897) | 2024-02-15 00:35:56 +00:00 |
| test_optimizers.py | [DTensor] Enable ASGD foreach optimizer and add the associated unit test (#121942) | 2024-03-15 20:21:27 +00:00 |
| test_pointwise_ops.py | [nit][DTensor][Test] Update test name to reflect the actual test (#118960) | 2024-02-18 08:23:06 +00:00 |
| test_random_ops.py | [DTensor] Add rand_like, randn_like, randint_like ops to shard propagation (#112576) | 2023-11-02 18:45:43 +00:00 |
| test_redistribute.py | [dtensor] move early return check into redistribute autograd function (#121653) | 2024-03-12 17:37:30 +00:00 |
| test_tensor_ops.py | [dtensor] refactor and generalize stack strategy (#121869) | 2024-03-15 00:34:25 +00:00 |
| test_utils.py | [DTensor][Test] Add unit tests to keep track of DTensor sharding for 2D (#123687) | 2024-04-18 03:29:16 +00:00 |
| test_view_ops.py | [dtensor] add op support for view_as_complex and view_as_real (#122569) | 2024-03-26 03:32:04 +00:00 |
| test_xla_integration.py | [DTensor][XLA] support XLA backend in distirbute_module API (#121355) | 2024-03-08 15:47:33 +00:00 |

Run distributed tensor tests:

From the repo root, run (on either CPU or GPU):

```
pytest test/distributed/_tensor/
```

Run a single test file:

```
pytest test/distributed/_tensor/test_dtensor.py
```

Run a specific test case and print its stdout/stderr:

```
pytest test/distributed/_tensor/test_dtensor.py -s -k test_from_local
```