pytorch/torch
Ke Wen daed3bf8f9 Implement coalesced all_gather_into_tensor (#101157)
This PR adds support for the following use cases:
- Sync style:
```
with dist._coalescing_manager():
    for i in range(num_coll):
        dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])
```
- Async style:
```
with dist._coalescing_manager(async_ops=True) as cm:
    for i in range(num_coll):
        dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])

# do a bunch of other things
cm.wait()
# do things that depend on the all-gathers
```
Each `all_gather_into_tensor` call is independent in terms of its data and buffer location, but supported backends (such as NCCL) can execute the coalesced calls in parallel.
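For illustration, here is a minimal runnable sketch of the async style. It assumes a single-node CUDA setup launched with `torchrun --nproc_per_node=<N>` and the NCCL backend; `num_coll`, `elems`, and the tensor shapes are illustrative choices, not part of this PR:
```
# Hypothetical end-to-end sketch of the async usage above.
# Assumes: launched via torchrun, one CUDA device per rank, NCCL backend.
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    num_coll = 4
    elems = 8
    # Each input holds this rank's shard; each output must hold
    # world_size * elems elements to receive every rank's shard.
    input_tensors = [torch.full((elems,), float(rank), device="cuda")
                     for _ in range(num_coll)]
    output_tensors = [torch.empty(world_size * elems, device="cuda")
                      for _ in range(num_coll)]

    with dist._coalescing_manager(async_ops=True) as cm:
        for i in range(num_coll):
            dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])
    # ... unrelated work can overlap with the coalesced all-gathers here ...
    cm.wait()  # block until the whole coalesced group has completed

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
On exit from the context manager, the queued calls are flushed to the backend as one coalesced group, which is what allows a backend like NCCL to run them in parallel per the description above.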
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101157
Approved by: https://github.com/kumpera, https://github.com/wanchaol
2023-05-11 20:58:47 +00:00
_awaits
_C Revert "[MPS] Add support for Custom Kernels (#100661)" 2023-05-09 17:02:04 +00:00
_C_flatbuffer
_decomp [decomp] Bad accuracy for elu_backward (#100284) 2023-04-29 04:21:20 +00:00
_dispatch
_dynamo [pt2] Skip if curr_size is None (#101170) 2023-05-11 17:20:38 +00:00
_export [export] More robust view->view_copy pass (#100908) 2023-05-10 14:25:17 +00:00
_functorch Turn on anomaly detection for AOTAutograd backward tracing (#101047) 2023-05-11 03:38:20 +00:00
_higher_order_ops [Reland] Initial version of Dynamo capture for HigherOrderOperator (#100544) 2023-05-03 20:49:05 +00:00
_inductor Remove obsolete upsample_bilinear2d lowerings (#101111) 2023-05-11 20:41:57 +00:00
_lazy
_logging Expose function to retrieve list of registered loggers (#100776) 2023-05-06 04:22:28 +00:00
_prims [pt2] enable svd in fake_tensor (#100130) 2023-05-05 06:27:59 +00:00
_prims_common [pt2] add meta function for solve_triangular (#100829) 2023-05-08 13:48:15 +00:00
_refs [opinfo] item (#100313) 2023-05-10 11:32:45 +00:00
_subclasses Reduce fake_tensor create_mode logging (#101074) 2023-05-11 13:26:38 +00:00
amp refactor macro with AMP (#99285) 2023-04-19 01:00:00 +00:00
ao [Quant][PT2E]Fix pt2e quantization maxpool input observer issue (#100961) 2023-05-11 06:14:34 +00:00
autograd fix(docs): torch.autograd.graph.Node.register_hook can override grad_inputs, not grad_outputs (#100272) 2023-04-29 00:10:12 +00:00
backends Publicly exposing torch.backends.cpu.get_cpu_capability() (#100164) 2023-05-03 19:02:07 +00:00
contrib
cpu
csrc Implement coalesced all_gather_into_tensor (#101157) 2023-05-11 20:58:47 +00:00
cuda [BE] Enable C419 rule for any all shortcircuiting (#99890) 2023-04-25 15:02:13 +00:00
distributed Implement coalesced all_gather_into_tensor (#101157) 2023-05-11 20:58:47 +00:00
distributions Remove in-place operations in NegativeBinomial (#96748) 2023-04-26 14:45:08 +00:00
fft
func
futures
fx Apply static policy correctly to unspec (#98983) 2023-05-10 05:59:12 +00:00
jit Register get_cpu_capability for jit (#100723) 2023-05-09 09:52:29 +00:00
legacy
lib
linalg
masked [BE] Enable flake8-comprehension rule C417 (#97880) 2023-03-30 14:34:24 +00:00
monitor
mps Revert "[MPS] Add support for Custom Kernels (#100661)" 2023-05-09 17:02:04 +00:00
multiprocessing Reduce overhead in CUDAGraph Trees (#98529) 2023-04-07 05:46:08 +00:00
nested [BE] Enable C419 rule for any all shortcircuiting (#99890) 2023-04-25 15:02:13 +00:00
nn [BE] Fix flake8 B027 errors - missing abstractmethod decorator (#100715) 2023-05-09 17:28:48 +00:00
onnx [ONNX] Refactor Input/Output Adapter (#100490) 2023-05-06 16:01:49 +00:00
optim [adam] Use the right params in weight_decay, rename for clarity, fixes #100707 (#100973) 2023-05-09 17:00:27 +00:00
package Convert logging f-strings to use % format, part five (#98765) 2023-04-11 13:17:59 +00:00
profiler [profiler] provide torch.profiler._utils._init_for_cuda_graphs() as a workaround (#100441) 2023-05-05 19:25:37 +00:00
quantization
signal Fix flake8 lint errors reported by ruff - take 2 (#99798) 2023-04-23 23:09:51 +00:00
sparse bsr_dense_bmm(): enable more precise float32 support with float64 accumulators (#100882) 2023-05-11 11:22:55 +00:00
special
testing [BE] Testing docs: clarify test instantiation function usage (#100905) 2023-05-11 20:48:03 +00:00
utils [DataPipe] Add generated docstring to functional form DataPipe (#100503) 2023-05-10 14:06:46 +00:00
__config__.py
__future__.py
__init__.py Publicly exposing torch.backends.cpu.get_cpu_capability() (#100164) 2023-05-03 19:02:07 +00:00
_appdirs.py
_classes.py
_custom_op.py Add load_storage (#100519) 2023-05-05 05:25:03 +00:00
_deploy.py
_guards.py Move tracked nn_modules from OutputGraph to TracingContext (#100457) 2023-05-03 02:00:11 +00:00
_jit_internal.py [JIT] Allow tuple and list generics (#98703) 2023-04-09 22:58:58 +00:00
_linalg_utils.py
_lobpcg.py
_lowrank.py
_meta_registrations.py simplify sdpa backward meta registration (#101128) 2023-05-11 03:30:07 +00:00
_namedtensor_internals.py
_ops.py Revert "Initial version of Dynamo capture for HigherOrderOperator (#99988)" 2023-05-03 14:02:40 +00:00
_python_dispatcher.py
_sources.py
_storage_docs.py
_tensor_docs.py Fix Tensor.uniform_ documentation to mention generator argument (#99510) 2023-04-19 19:23:12 +00:00
_tensor_str.py Fix FakeTensor printing (#99205) 2023-04-18 13:26:27 +00:00
_tensor.py Change 'w.r.t.' to 'wrt' in function docstrings to fix doc rendering (#100028) 2023-04-25 23:53:26 +00:00
_torch_docs.py Modify repeat_interleave docs to highlight potential overloading (#99650) 2023-05-01 17:53:03 +00:00
_utils_internal.py Log PT2 compile to Scuba (#98790) 2023-04-11 20:10:35 +00:00
_utils.py add get_device_index for custom device (#98804) 2023-04-12 23:58:31 +00:00
_VF.py
_vmap_internals.py [BE] Enable C419 rule for any all shortcircuiting (#99890) 2023-04-25 15:02:13 +00:00
_weights_only_unpickler.py
abi-check.cpp
CMakeLists.txt
custom_class_detail.h
custom_class.h
extension.h
functional.py STFT: correct stft definition and better document tensor shapes (#100427) 2023-05-10 01:42:01 +00:00
hub.py Add --offload-to-disk support to minifier (#100546) 2023-05-05 05:25:03 +00:00
library.h
library.py torch.library.Library.impl: add missing param in docstring example (#98619) 2023-04-11 06:09:46 +00:00
overrides.py Persist torch.assert in aten graph (#100101) 2023-04-28 07:31:43 +00:00
py.typed
quasirandom.py
random.py add rng_state support for custom device (#98069) 2023-04-10 22:36:55 +00:00
README.txt
return_types.py
script.h
serialization.py fix _privateuse1_tag problem (#100632) 2023-05-10 09:53:19 +00:00
storage.py Fix loading data on different encoding (#94503) 2023-04-25 21:05:20 +00:00
torch_version.py
types.py

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.