pytorch/test/distributed/_tensor
Wanchao Liang a26480a4d1 [dtensor] move early return check into redistribute autograd function (#121653)
This PR fixes a redistribute bug by moving the early-return check into the
redistribute autograd function: even when we redistribute to the same
placements, the grad_placements from the `to_local` call might be
different, so the redistribute backward still needs to happen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121653
Approved by: https://github.com/awgu
2024-03-12 17:37:30 +00:00
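
A minimal sketch of the scenario this fix covers, assuming a 2-rank mesh launched via `torchrun --nproc-per-node=2`; the shapes and placements are illustrative, not taken from the PR:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed._tensor import distribute_tensor, Shard, Replicate

# Assumes launch via `torchrun --nproc-per-node=2`; shapes are illustrative.
mesh = init_device_mesh("cpu", (2,))
x = distribute_tensor(torch.randn(8, 8, requires_grad=True), mesh, [Shard(0)])

# Forward: redistributing to the *same* placements moves no data, but the
# autograd function must still run so the op is recorded for backward.
y = x.redistribute(mesh, [Shard(0)])

# Backward: the incoming gradient carries the placements hinted by
# grad_placements (Replicate here), so it still has to be converted back to
# Shard(0). The old early return skipped exactly this step.
local = y.to_local(grad_placements=[Replicate()])
local.sum().backward()
```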
| Name | Last commit | Last commit date |
|------|-------------|------------------|
| debug | get CommsDebugMode to work with DTensor (#118769) | 2024-02-29 01:11:05 +00:00 |
| experimental | [export] kill deprecated constraints API (#120860) | 2024-02-29 16:15:50 +00:00 |
| __init__.py | | |
| README.md | | |
| test_api.py | [dtensor] change distribute_module input/output_fn to accept module (#120895) | 2024-03-04 07:22:32 +00:00 |
| test_common_rules.py | [dtensor][7/n] remove reduction rule (#109144) | 2023-09-26 22:24:50 +00:00 |
| test_convolution_ops.py | [dtensor] support convolution ops (#113123) | 2023-11-20 21:01:28 +00:00 |
| test_dtensor_compile.py | Test parametrization utils for native funcol migration (#119950) | 2024-02-19 02:46:03 +00:00 |
| test_dtensor_ops.py | Revert "Batch Norm Consolidation (#116092)" | 2024-03-11 22:22:41 +00:00 |
| test_dtensor.py | [dtensor] add async_op option to redistribute and some refactor (#121477) | 2024-03-09 06:17:23 +00:00 |
| test_embedding_ops.py | [dtensor] implement dim-0 (row) embedding sharding with MaskPartial (#118080) | 2024-01-26 19:01:24 +00:00 |
| test_experimental_ops.py | [dtensor] support convolution ops (#113123) | 2023-11-20 21:01:28 +00:00 |
| test_init.py | [DeviceMesh] Reuse sub_group pg if exists (#115716) | 2024-01-25 18:07:16 +00:00 |
| test_math_ops.py | [dtensor][TP] check funcol calls and improve doc for loss parallel (#121366) | 2024-03-08 01:41:31 +00:00 |
| test_matrix_ops.py | Change the .clone() in native funcol's all_reduce to use at::MemoryFormat::Contiguous (#120042) | 2024-02-22 20:24:15 +00:00 |
| test_op_strategy.py | [dtensor] refactor sharding cost model to count for latency (#119897) | 2024-02-15 00:35:56 +00:00 |
| test_optimizers.py | [dtensor] change distribute_module input/output_fn to accept module (#120895) | 2024-03-04 07:22:32 +00:00 |
| test_pointwise_ops.py | [nit][DTensor][Test] Update test name to reflect the actual test (#118960) | 2024-02-18 08:23:06 +00:00 |
| test_random_ops.py | [DTensor] Add rand_like, randn_like, randint_like ops to shard propagation (#112576) | 2023-11-02 18:45:43 +00:00 |
| test_redistribute.py | [dtensor] move early return check into redistribute autograd function (#121653) | 2024-03-12 17:37:30 +00:00 |
| test_tensor_ops.py | [dtensor] add op support for aten.gather.default (#118513) | 2024-02-02 01:48:21 +00:00 |
| test_utils.py | [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099) (#115193) | 2023-12-08 08:44:32 +00:00 |
| test_view_ops.py | [dtensor] refactor some existing test util to use comm mode (#114404) | 2023-11-27 06:43:09 +00:00 |
| test_xla_integration.py | [DTensor][XLA] support XLA backend in distirbute_module API (#121355) | 2024-03-08 15:47:33 +00:00 |

Run distributed tensor tests:

From the repo root, run (works on either CPU or GPU):

pytest test/distributed/_tensor/test_dtensor.py

pytest test/distributed/_tensor/test_redistribute.py

To run a specific test case and print its stdout/stderr:

pytest test/distributed/_tensor/test_dtensor.py -s -k test_from_local