pytorch/torch
Tianyu Liu 777eca9f16 [DTensor][FSDP2] necessary changes to FSDP and TP to unblock EP (#157216)
This is to unblock "dp2ep" Expert Parallel + TP integration in torchtitan https://github.com/pytorch/torchtitan/pull/1324.

It does two things:
1. Slightly modifies the glue code for FSDP/HSDP + TP to work with FSDP/HSDP + EP and FSDP/HSDP + EP + TP. I kept the name `FSDPParam._tp_spec` to make the change minimal. We can consider renaming it in the future if it confuses people, but I heard @wanchaol has a plan to rewrite DTensor strided sharding entirely.
2. Lifts the check of `_validate_tp_mesh_dim` for `torch.distributed.tensor.parallel.parallelize_module`, as in EP or EP+TP this check is too strict. In particular, it assumes a DeviceMesh must have `mesh_dim_names`, which is not always true. I'm also removing the file it belongs to, `torch/distributed/tensor/parallel/_utils.py`, entirely, as the other check in it, `_deprecate_warnings`, added two years ago, is no longer used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157216
Approved by: https://github.com/wanchaol, https://github.com/weifengpy
2025-07-08 15:57:37 +00:00
_awaits
_C Add device_id to XPU device properties (#156481) 2025-07-03 01:22:11 +00:00
_C_flatbuffer
_custom_op pyfmt lint torch/_custom_op/* (#155782) 2025-06-12 23:04:11 +00:00
_decomp Revert "Fix full_like decomposition to preserve strides (#144765)" 2025-07-02 13:56:03 +00:00
_dispatch Improve torch.ops typing (#154555) 2025-06-22 15:52:27 +00:00
_dynamo [BE] Do not add . after troubleshooting_url (#157753) 2025-07-08 15:38:24 +00:00
_export Remove is_jit_trace option (#157387) 2025-07-03 09:20:27 +00:00
_functorch Automatically load and save dynamo entries via caching_precompile (#155913) 2025-07-07 23:57:17 +00:00
_higher_order_ops [BE][12/16] fix typos in torch/ (#156602) 2025-07-02 22:55:29 +00:00
_inductor [PT2][memory] mutation size correctness (#157562) 2025-07-08 14:02:20 +00:00
_lazy
_library [BE][PYFMT] migrate PYFMT for torch/_[a-h]*/ to ruff format (#144551) 2025-06-25 06:16:06 +00:00
_logging [BE][PYFMT] migrate PYFMT for torch/_[a-h]*/ to ruff format (#144551) 2025-06-25 06:16:06 +00:00
_numpy [BE][PYFMT] migrate PYFMT for torch/_[a-h]*/ to ruff format (#144551) 2025-06-25 06:16:06 +00:00
_prims [remove untyped defs] batch 1 (#157011) 2025-06-30 23:54:40 +00:00
_prims_common python definitely_contiguous-> is_contiguous_or_false (#156515) 2025-06-30 17:31:51 +00:00
_refs _broadcast_shapes gso generalizations (#157008) 2025-07-04 05:56:42 +00:00
_strobelight [BE][PYFMT] migrate PYFMT for torch/_[a-h]*/ to ruff format (#144551) 2025-06-25 06:16:06 +00:00
_subclasses [fake tensor] fix issue of no attribute tags (#156689) 2025-07-03 01:16:01 +00:00
_vendor
accelerator Revert "Add unified memory APIs for torch.accelerator (#152932)" 2025-06-25 00:11:35 +00:00
amp [BE][PYFMT] migrate PYFMT for torch/[a-c]*/ to ruff format (#144554) 2025-07-03 18:56:07 +00:00
ao remove allow-untyped-defs from torch/ao/nn/quantized/modules/rnn.py (#157234) 2025-07-08 00:11:52 +00:00
autograd [BE][PYFMT] migrate PYFMT for torch/[a-c]*/ to ruff format (#144554) 2025-07-03 18:56:07 +00:00
backends remove allow-untyped-defs from torch/backends/cusparselt/__init__.py (#157232) 2025-07-08 00:11:52 +00:00
compiler Automatically load and save dynamo entries via caching_precompile (#155913) 2025-07-07 23:57:17 +00:00
contrib
cpu
csrc fix storage use_count (#157694) 2025-07-08 05:53:12 +00:00
cuda [BE][PYFMT] migrate PYFMT for torch/[a-c]*/ to ruff format (#144554) 2025-07-03 18:56:07 +00:00
distributed [DTensor][FSDP2] necessary changes to FSDP and TP to unblock EP (#157216) 2025-07-08 15:57:37 +00:00
distributions Fix non-bitwise type annotations for Tensor operators (see #145838) (#146845) 2025-06-24 15:41:34 +00:00
export Remove is_jit_trace option (#157387) 2025-07-03 09:20:27 +00:00
fft [BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553) 2025-06-17 08:18:47 +00:00
func
futures Simplify the base classes of _PyFutureMeta (#157757) 2025-07-08 15:39:56 +00:00
fx Add flag to fx.passes.split_module to normalize input names (#157733) 2025-07-08 13:47:24 +00:00
headeronly Reapply D77381084 / #156964: Rename torch::standalone to headeronly (#157251) 2025-06-30 23:25:30 +00:00
jit [BE][12/16] fix typos in torch/ (#156602) 2025-07-02 22:55:29 +00:00
legacy
lib [2/N] Fix cppcoreguidelines-init-variables suppression (#146237) 2025-06-19 23:26:42 +00:00
linalg Fix for ambiguity in linalg.norm()'s ord argument of +2 & -2 (#155148) 2025-06-04 21:15:20 +00:00
masked [BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553) 2025-06-17 08:18:47 +00:00
monitor
mps [BE][12/16] fix typos in torch/ (#156602) 2025-07-02 22:55:29 +00:00
mtia [BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553) 2025-06-17 08:18:47 +00:00
multiprocessing [BE][12/16] fix typos in torch/ (#156602) 2025-07-02 22:55:29 +00:00
nativert [nativert] Move Executor to PyTorch core (#157514) 2025-07-03 23:31:54 +00:00
nested [BE][12/16] fix typos in torch/ (#156602) 2025-07-02 22:55:29 +00:00
nn XCCL changes for DDP (#155497) 2025-07-03 05:18:08 +00:00
onnx [BE][12/16] fix typos in torch/ (#156602) 2025-07-02 22:55:29 +00:00
optim swa avoid stream sync (#157705) 2025-07-07 20:47:35 +00:00
package [BE][6/16] fix typos in torch/ (#156316) 2025-06-23 02:57:34 +00:00
profiler [profiler] add more CUDA API for kernel launcher (#156016) 2025-07-03 15:26:42 +00:00
quantization [BE][6/16] fix typos in torch/ (#156316) 2025-06-23 02:57:34 +00:00
signal [BE][6/16] fix typos in torch/ (#156316) 2025-06-23 02:57:34 +00:00
sparse [BE][6/16] fix typos in torch/ (#156316) 2025-06-23 02:57:34 +00:00
special [BE][6/16] fix typos in torch/ (#156316) 2025-06-23 02:57:34 +00:00
testing Add max_pool3d backward pass for MPS (#157498) 2025-07-07 19:46:44 +00:00
utils Deprecate DataLoader pin_memory_device param (#146821) 2025-07-08 09:24:53 +00:00
xpu [BE][6/16] fix typos in torch/ (#156316) 2025-06-23 02:57:34 +00:00
__config__.py
__future__.py
__init__.py correctly import torch.version (#157584) 2025-07-07 21:43:35 +00:00
_appdirs.py
_classes.py remove allow-untyped-defs from torch/_classes.py (#157231) 2025-07-08 00:11:52 +00:00
_compile.py [precompile] Ensure @disable()-ed function won't trigger recompile from precompile bytecode. (#155363) 2025-06-10 16:13:38 +00:00
_custom_ops.py
_deploy.py
_environment.py
_guards.py Revert "[dynamo][fsdp] Consistent behavior of int attributes (#157262)" 2025-07-02 08:30:39 +00:00
_jit_internal.py BE: Type previously untyped decorators (#154515) 2025-05-29 00:36:34 +00:00
_linalg_utils.py
_lobpcg.py Improve documentation for torch.lobpcg (#156139) 2025-06-25 00:39:34 +00:00
_lowrank.py
_meta_registrations.py python definitely_contiguous-> is_contiguous_or_false (#156515) 2025-06-30 17:31:51 +00:00
_namedtensor_internals.py
_ops.py Improve torch.ops typing (#154555) 2025-06-22 15:52:27 +00:00
_python_dispatcher.py Typo fixes for "overridden" in comments and function names (#155944) 2025-06-14 03:37:38 +00:00
_size_docs.py
_sources.py
_storage_docs.py Fix docstring for torch.UntypedStorage.from_file (#155067) 2025-06-05 14:30:49 +00:00
_streambase.py
_tensor_docs.py [docs] Add docstring indicating UB for converting inf to int (#154781) 2025-06-10 14:04:50 +00:00
_tensor_str.py fix tensor print behavior for MAIA (#155609) 2025-06-14 01:04:12 +00:00
_tensor.py Upgrade to DLPack 1.0. (#145000) 2025-06-30 16:58:06 +00:00
_thread_safe_fork.py
_torch_docs.py Documentation update torch.clone #156644 (#157007) 2025-06-27 21:10:09 +00:00
_utils_internal.py Inductor logging + analysis of torch.profile (#149697) 2025-07-07 22:13:34 +00:00
_utils.py Disable pinning check when loading sparse tensors (#154638) 2025-06-18 14:33:36 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py
CMakeLists.txt [1/N] Don't use CUDA.cmake module (#157188) 2025-07-08 03:05:35 +00:00
custom_class_detail.h
custom_class.h
extension.h
functional.py
header_only_apis.txt Reapply D77381084 / #156964: Rename torch::standalone to headeronly (#157251) 2025-06-30 23:25:30 +00:00
hub.py
library.h
library.py
overrides.py Revert "Fused RMSNorm implementation (#153666)" 2025-07-01 18:46:45 +00:00
py.typed
quasirandom.py
random.py
return_types.py
script.h
serialization.py [BE] Add missing type for storage dict (#156831) 2025-06-26 02:48:55 +00:00
storage.py mypy 1.16.0 (#155821) 2025-06-14 18:18:43 +00:00
torch_version.py
types.py
version.py.tpl