pytorch/docs/source
Wanchao Liang 28925902fa [TP] fully rewrite Tensor Parallel APIs (#114732)
This PR rewrites the Tensor Parallel implementation. The Tensor Parallel APIs
are supposed to be a very thin wrapper around the DTensor APIs, but the current
implementation has become too messy and buggy, which makes it really hard to
debug what went wrong when using it. It is crucially important for advanced
users and developers to be able to understand the API and its implementation
without digging through many different functions and utils, so that they can
trust what happens under the hood.

In particular this PR:

* Make ParallelStyle a real contract API for parallelize_module to take:
  each concrete ParallelStyle only needs to implement `apply` to apply its
  sharding to an nn.Module, and all unnecessary fields are removed. This also
  enables easier ParallelStyle authoring going forward (see the custom-style
  sketch after this list).
* Keep the ColwiseParallel and RowwiseParallel public interfaces, but
  refactor them so that parameter sharding and input/output handling live
  within the style itself, making it easy to understand how Linear/Embedding
  layers are sharded and how the input/output transformations are performed
  (see the usage sketch after this list).
* Remove the private _prepare_input/_prepare_output_fn fields from both
  ColwiseParallel and RowwiseParallel. Since we have thrown deprecation
  messages in nightly for a while, TP is still a prototype release, and these
  fields are private, it should be safe to remove them.
* Refactor the recently landed PrepareModuleInput/Output styles: rename
  output_layouts to desired_input/output_layouts, group the functionality
  inside the style itself, and remove default arguments for these two styles
  so that users must specify them and think about the sharding layouts (see
  the sketch after this list). Also fix bugs where the `use_local_output`
  flag was not handled.
* Make default arguments None instead of Placement objects; it is standard
  Python practice not to use a custom object instance as a default argument.
* Remove all dead APIs (i.e. the PairwiseParallel and SequenceParallel
  styles, and all prepare input/output functions), as we have thrown
  deprecation messages for a while; we are in the process of removing them
  from the tests.
* Throw a deprecation warning for `tp_mesh_dim`, as we recommend using
  device mesh slicing/indexing instead of manually specifying the mesh dim
  (as shown in the usage sketch after this list).
* Rewrite the documentation for every ParallelStyle to make it clearer what
  each style is doing.
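
To make the resulting user-facing flow concrete, here is a minimal usage
sketch. The module, layer names, dimensions, and mesh shape are made up for
illustration, and it assumes an 8-GPU job launched via torchrun so that a
2x4 device mesh can be formed:

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# Toy 2-layer feed-forward block; the class and layer names are made up
# purely to show how a parallelize plan maps module paths to styles.
class MLP(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.w1 = nn.Linear(dim, 4 * dim)
        self.w2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))

# Build a 2-D mesh and slice out the "tp" sub-mesh by name, instead of
# passing the deprecated tp_mesh_dim argument.
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))
tp_mesh = mesh_2d["tp"]

# The styles take no required arguments; input/output layout defaults are
# None and resolved inside each style, rather than Placement instances
# being used as default argument values.
tp_model = parallelize_module(
    MLP().cuda(),
    tp_mesh,
    {"w1": ColwiseParallel(), "w2": RowwiseParallel()},
)

out = tp_model(torch.randn(8, 1024, device="cuda"))
```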
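
A sketch of the refactored PrepareModuleInput/Output contract. The keyword
names follow the desired_input/output_layouts renaming described above, and
the Shard/Replicate placements are illustrative, not prescriptive:

```python
from torch.distributed._tensor import Replicate, Shard
from torch.distributed.tensor.parallel import PrepareModuleInput, PrepareModuleOutput

# Both the current and the desired placements must be spelled out; there are
# no layout defaults, so the caller has to think about the sharding layouts.
prepare_input = PrepareModuleInput(
    input_layouts=(Shard(0),),             # how the input actually arrives
    desired_input_layouts=(Replicate(),),  # layout the wrapped module expects
)
prepare_output = PrepareModuleOutput(
    output_layouts=(Replicate(),),         # layout the module produces
    desired_output_layouts=(Shard(0),),    # layout downstream code expects
    use_local_output=True,                 # hand back a plain torch.Tensor
)
```

These styles can then be used as values in a parallelize_module plan, or as a
standalone plan for a whole submodule, just like ColwiseParallel and
RowwiseParallel.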
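
And a sketch of what authoring a custom style could look like under the new
ParallelStyle contract. The private hook name `_apply` and the use of
distribute_tensor here are assumptions for illustration only; the contract
described in this PR is simply "implement the apply hook that shards an
nn.Module on a device mesh":

```python
import torch.nn as nn
from torch.distributed._tensor import DeviceMesh, Replicate, distribute_tensor
from torch.distributed.tensor.parallel.style import ParallelStyle

# Toy style that simply replicates every direct parameter of a module as a
# DTensor, to show how little a concrete style needs to implement.
class ReplicateParallel(ParallelStyle):
    def _apply(self, module: nn.Module, device_mesh: DeviceMesh) -> nn.Module:
        for name, param in list(module.named_parameters(recurse=False)):
            dtensor_param = nn.Parameter(
                distribute_tensor(param, device_mesh, [Replicate()])
            )
            module.register_parameter(name, dtensor_param)
        return module
```

Such a style can then be passed in a parallelize_module plan like any of the
built-in styles.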

TODOs:
* Rewrite the TP tests to adjust for the changes in this PR
* Add more tests to guard the bug fixes

Differential Revision: [D51761183](https://our.internmc.facebook.com/intern/diff/D51761183)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114732
Approved by: https://github.com/wz337, https://github.com/fduwjj
2023-12-02 08:18:12 +00:00
_static Refactor torch.onnx documentation (#108379) 2023-09-08 18:23:48 +00:00
_templates Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
community Add thiagocrepaldi as person of interest for onnx exporter (#113402) 2023-11-10 15:19:58 +00:00
elastic [BE] Prefer dash over underscore in command-line options (#94505) 2023-02-09 20:16:49 +00:00
notes [DDP][Compile] Test to Ensure torch.compile works w/static_graph=True (#114621) 2023-12-01 22:18:45 +00:00
rpc
scripts [export oncall] add some examples during oncall (#112445) 2023-10-31 18:33:03 +00:00
amp.rst Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377) 2023-11-01 16:10:09 +00:00
autograd.rst Allow specifiying inputs as GradientEdge in autograd APIs (#110867) 2023-10-12 04:08:44 +00:00
backends.rst expose sdpa helpers to python (#110496) 2023-11-15 07:34:34 +00:00
benchmark_utils.rst
bottleneck.rst
checkpoint.rst Add set_checkpoint_debug_enabled that overrides local setting (#110728) 2023-10-11 02:12:31 +00:00
complex_numbers.rst Update mentions of deprecated functions if complex_numbers.rst (#113391) 2023-11-09 22:32:26 +00:00
cond.rst [HigherOrderOp] expose torch.cond (#110293) 2023-10-07 20:39:52 +00:00
conf.py Canonicalize runtime asserts (#114509) 2023-11-28 01:38:47 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cpu.rst Add current_device() to torch.cpu (#110987) 2023-10-11 05:13:10 +00:00
cuda._sanitizer.rst
cuda.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst Revert "reseed all Generators in Dataloader's _worker_loop() -- via GC (#107131)" 2023-08-23 17:08:07 +00:00
ddp_comm_hooks.rst [DOCS][DDP]Fix the simple of saving and reloading PowerSGD state and hook. (#102721) 2023-06-10 00:15:00 +00:00
deploy.rst
deterministic.rst Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377) 2023-11-01 16:10:09 +00:00
distributed.algorithms.join.rst
distributed.checkpoint.rst Stateful Checkpointing for Distributed [1/N] (#113867) 2023-12-01 19:21:03 +00:00
distributed.elastic.rst
distributed.optim.rst
distributed.rst [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991) 2023-12-02 04:39:41 +00:00
distributed.tensor.parallel.rst [TP] fully rewrite Tensor Parallel APIs (#114732) 2023-12-02 08:18:12 +00:00
distributions.rst Add inverse gamma distribution and fix sign bug in PowerTransform. (#104501) 2023-11-01 02:26:25 +00:00
dlpack.rst
docutils.conf
export.ir_spec.rst Revert "direct runtime assertions (#111262)" 2023-10-17 08:04:36 +00:00
export.rst [export] Clean up verifier [1/n]. (#112505) 2023-11-02 19:36:06 +00:00
fft.rst
fsdp.rst [FSDP][state_dict] Expose optimizer state_dict config (#105949) 2023-08-21 07:29:49 +00:00
func.api.rst [functorch] linearize (#94173) 2023-02-09 15:45:08 +00:00
func.batch_norm.rst Fix typo under docs directory (#97202) 2023-03-21 01:24:10 +00:00
func.migrating.rst [torch.func] Add migration guide from functorch (#91811) 2023-01-17 22:14:42 +00:00
func.rst Fix typo under docs directory (#92762) 2023-01-23 18:07:22 +00:00
func.ux_limitations.rst [torch.func] Add docs (#91319) 2022-12-30 02:51:18 +00:00
func.whirlwind_tour.rst [torch.func] Add docs (#91319) 2022-12-30 02:51:18 +00:00
futures.rst
fx.rst Split SymNode into its own file (#112037) 2023-10-26 23:32:27 +00:00
hub.rst Fix typo under docs directory (#92762) 2023-01-23 18:07:22 +00:00
index.rst Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377) 2023-11-01 16:10:09 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst Fix typo under docs directory (#97202) 2023-03-21 01:24:10 +00:00
jit_language_reference.rst [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587) 2023-02-11 18:19:48 +00:00
jit_python_reference.rst
jit_unsupported.rst Add support for torch.Generator type in TorchScript (#110413) 2023-11-21 23:07:21 +00:00
jit_utils.rst
jit.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
library.rst Rewrite torch.library's documentation (#111310) 2023-10-23 23:02:41 +00:00
linalg.rst
logging.rst [Easy] log graphs in compiled_autograd if TORCH_LOGS=compiled_autograd (#108991) 2023-09-12 00:15:02 +00:00
masked.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
math-quantizer-equation.png
mobile_optimizer.rst [Reland] Clean Up MobileOptimizerType Rewrite Flags Public API and Documentation (#92081) 2023-01-14 17:06:00 +00:00
model_zoo.rst
monitor.rst
mps.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
multiprocessing.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
name_inference.rst [docs] Properly link register_post_accumulate_grad_hook docs (#108157) 2023-08-29 22:13:33 +00:00
named_tensor.rst fixing named tensor unflatten example (#106921) 2023-08-22 18:00:10 +00:00
nested.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
nn.functional.rst [SDPA] update type hint for scaled_dot_product_attention and documentation (#94008) 2023-02-10 18:02:43 +00:00
nn.init.rst
nn.rst [doc] Add nn.parametrizations.weight_norm (#113783) 2023-11-16 17:42:48 +00:00
onnx_dynamo_onnxruntime_backend.rst Follow-up #108379 (#108905) 2023-09-09 01:38:36 +00:00
onnx_dynamo.rst [ez][doc] Fix sample code in onnx_dynamo.rst (#114770) 2023-11-29 19:27:52 +00:00
onnx_torchscript_supported_aten_ops.rst Refactor torch.onnx documentation (#108379) 2023-09-08 18:23:48 +00:00
onnx_torchscript.rst Follow-up #108379 (#108905) 2023-09-09 01:38:36 +00:00
onnx.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
optim.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
package.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
pipeline.rst docs: Linking ResNeXt PyTorch Hub Pipeline (#98689) 2023-04-11 02:20:26 +00:00
profiler.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst update quantization doc: add x86 backend as default backend of server inference (#86794) 2022-12-02 02:10:25 +00:00
quantization-support.rst [quant][pt2e] Add generate_numeric_debug_handle pass (#114315) 2023-12-01 03:38:17 +00:00
quantization.rst [ao] updating embedding_bag support for fx and eager (#107623) 2023-11-21 03:54:00 +00:00
random.rst
rpc.rst [BE] RPC is missing RRef docs (#106902) 2023-08-10 16:26:27 +00:00
signal.rst Nuttall window (#90103) 2022-12-16 09:05:53 +00:00
sparse.rst Fix typo under docs directory (#110359) 2023-10-03 16:36:05 +00:00
special.rst
storage.rst
tensor_attributes.rst Add a warning about performance cost of set_default_device (#92703) 2023-01-21 02:23:13 +00:00
tensor_view.rst
tensorboard.rst
tensors.rst [docs] Properly link register_post_accumulate_grad_hook docs (#108157) 2023-08-29 22:13:33 +00:00
testing.rst document torch.testing.assert_allclose (#89526) 2022-12-01 11:22:50 +00:00
torch_cuda_memory.rst Fix typo under docs directory (#110359) 2023-10-03 16:36:05 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.compiler_aot_inductor.rst [AOTInductor] Rename model_runner to model_container_runner (#111324) 2023-11-16 19:14:22 +00:00
torch.compiler_api.rst Add cudagraph_mark_step_begin in torch.compiler, reference in error message (#111722) 2023-10-25 21:53:21 +00:00
torch.compiler_best_practices_for_backends.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_cudagraph_trees.rst [doc] fix typo on graph 3 that is recorded (#114666) 2023-11-28 20:40:13 +00:00
torch.compiler_custom_backends.rst [dynamo, docs] update dynamo backend registration docs (#114820) 2023-11-30 21:41:05 +00:00
torch.compiler_deepdive.rst [Dynamo]Expose bytecode hooks and add example usage for decompilation in docs (#110714) 2023-10-13 12:36:00 +00:00
torch.compiler_dynamic_shapes.rst Update dynamic shapes documentation (#109764) 2023-09-21 13:53:43 +00:00
torch.compiler_fake_tensor.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_faq.rst Add docs for torch.compile(numpy) (#109710) 2023-09-21 03:05:21 +00:00
torch.compiler_fine_grain_apis.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_get_started.rst [Reland2] [inductor][BE] split triton_meta and inductor_meta (#112351) 2023-11-02 00:40:12 +00:00
torch.compiler_guards_overview.rst Do not use a specific LOC in link (#108957) 2023-09-13 19:21:45 +00:00
torch.compiler_inductor_profiling.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_ir.rst [export] torch.export landing page (#108783) 2023-09-10 01:40:42 +00:00
torch.compiler_nn_module.rst Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)" + Forward fixes + test (#110964) 2023-10-11 05:16:47 +00:00
torch.compiler_performance_dashboard.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_profiling_torch_compile.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_transformations.rst Fix typo under docs directory (#110359) 2023-10-03 16:36:05 +00:00
torch.compiler_troubleshooting.rst Update torch.compiler_troubleshooting.rst (#114530) 2023-11-25 23:15:47 +00:00
torch.compiler.rst [docs] Fix torch.compile "tensorrt" backend docs (#113711) 2023-11-15 08:42:53 +00:00
torch.overrides.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
torch.rst Document torch.from_file and fix UntypedStorage.from_file docs (#111688) 2023-10-25 19:28:11 +00:00
type_info.rst
utils.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00