pytorch/torch
Ke Wen daed3bf8f9 Implement coalesced all_gather_into_tensor (#101157)
This PR adds support for the following use cases:
- Sync style:
```
with dist._coalescing_manager():
    for i in range(num_coll):
        dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])
```
- Async style:
```
with dist._coalescing_manager(async_ops=True) as cm:
    for i in range(num_coll):
        dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])

# do a bunch of other things
cm.wait()
# do things that depend on the all-gathers
```
Each `all_gather_into_tensor` call is independent in terms of its data and buffer location, but supported backends (such as NCCL) can execute the coalesced calls in parallel.
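For illustration, here is a minimal runnable sketch of the async style. It assumes a single-node CUDA setup launched with `torchrun --nproc_per_node=<N>` and the NCCL backend; `num_coll`, `elems`, and the tensor shapes are illustrative choices, not part of this PR:
```
# Hypothetical end-to-end sketch of the async usage above.
# Assumes: launched via torchrun, one CUDA device per rank, NCCL backend.
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    num_coll = 4
    elems = 8
    # Each input holds this rank's shard; each output must hold
    # world_size * elems elements to receive every rank's shard.
    input_tensors = [torch.full((elems,), float(rank), device="cuda")
                     for _ in range(num_coll)]
    output_tensors = [torch.empty(world_size * elems, device="cuda")
                      for _ in range(num_coll)]

    with dist._coalescing_manager(async_ops=True) as cm:
        for i in range(num_coll):
            dist.all_gather_into_tensor(output_tensors[i], input_tensors[i])
    # ... unrelated work can overlap with the coalesced all-gathers here ...
    cm.wait()  # block until the whole coalesced group has completed

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
On exit from the context manager, the queued calls are flushed to the backend as one coalesced group, which is what allows a backend like NCCL to run them in parallel per the description above.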
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101157
Approved by: https://github.com/kumpera, https://github.com/wanchaol
2023-05-11 20:58:47 +00:00
_awaits
_C Revert "[MPS] Add support for Custom Kernels (#100661)" 2023-05-09 17:02:04 +00:00
_C_flatbuffer
_decomp [decomp] Bad accuracy for elu_backward (#100284) 2023-04-29 04:21:20 +00:00
_dispatch
_dynamo [pt2] Skip if curr_size is None (#101170) 2023-05-11 17:20:38 +00:00
_export [export] More robust view->view_copy pass (#100908) 2023-05-10 14:25:17 +00:00
_functorch Turn on anomaly detection for AOTAutograd backward tracing (#101047) 2023-05-11 03:38:20 +00:00
_higher_order_ops [Reland] Initial version of Dynamo capture for HigherOrderOperator (#100544) 2023-05-03 20:49:05 +00:00
_inductor Remove obsolete upsample_bilinear2d lowerings (#101111) 2023-05-11 20:41:57 +00:00
_lazy
_logging Expose function to retrieve list of registered loggers (#100776) 2023-05-06 04:22:28 +00:00
_prims [pt2] enable svd in fake_tensor (#100130) 2023-05-05 06:27:59 +00:00
_prims_common [pt2] add meta function for solve_triangular (#100829) 2023-05-08 13:48:15 +00:00
_refs [opinfo] item (#100313) 2023-05-10 11:32:45 +00:00
_subclasses Reduce fake_tensor create_mode logging (#101074) 2023-05-11 13:26:38 +00:00
amp refactor macro with AMP (#99285) 2023-04-19 01:00:00 +00:00
ao [Quant][PT2E]Fix pt2e quantization maxpool input observer issue (#100961) 2023-05-11 06:14:34 +00:00
autograd fix(docs): torch.autograd.graph.Node.register_hook can override grad_inputs, not grad_outputs (#100272) 2023-04-29 00:10:12 +00:00
backends Publicly exposing torch.backends.cpu.get_cpu_capability() (#100164) 2023-05-03 19:02:07 +00:00
contrib
cpu
csrc Implement coalesced all_gather_into_tensor (#101157) 2023-05-11 20:58:47 +00:00
cuda [BE] Enable C419 rule for any all shortcircuiting (#99890) 2023-04-25 15:02:13 +00:00
distributed Implement coalesced all_gather_into_tensor (#101157) 2023-05-11 20:58:47 +00:00
distributions Remove in-place operations in NegativeBinomial (#96748) 2023-04-26 14:45:08 +00:00
fft
func
futures
fx Apply static policy correctly to unspec (#98983) 2023-05-10 05:59:12 +00:00
jit Register get_cpu_capability for jit (#100723) 2023-05-09 09:52:29 +00:00
legacy
lib
linalg
masked [BE] Enable flake8-comprehension rule C417 (#97880) 2023-03-30 14:34:24 +00:00
monitor
mps Revert "[MPS] Add support for Custom Kernels (#100661)" 2023-05-09 17:02:04 +00:00
multiprocessing Reduce overhead in CUDAGraph Trees (#98529) 2023-04-07 05:46:08 +00:00
nested [BE] Enable C419 rule for any all shortcircuiting (#99890) 2023-04-25 15:02:13 +00:00
nn [BE] Fix flake8 B027 errors - missing abstractmethod decorator (#100715) 2023-05-09 17:28:48 +00:00
onnx [ONNX] Refactor Input/Output Adapter (#100490) 2023-05-06 16:01:49 +00:00
optim [adam] Use the right params in weight_decay, rename for clarity, fixes #100707 (#100973) 2023-05-09 17:00:27 +00:00
package Convert logging f-strings to use % format, part five (#98765) 2023-04-11 13:17:59 +00:00
profiler [profiler] provide torch.profiler._utils._init_for_cuda_graphs() as a workaround (#100441) 2023-05-05 19:25:37 +00:00
quantization
signal Fix flake8 lint errors reported by ruff - take 2 (#99798) 2023-04-23 23:09:51 +00:00
sparse bsr_dense_bmm(): enable more precise float32 support with float64 accumulators (#100882) 2023-05-11 11:22:55 +00:00
special
testing [BE] Testing docs: clarify test instantiation function usage (#100905) 2023-05-11 20:48:03 +00:00
utils [DataPipe] Add generated docstring to functional form DataPipe (#100503) 2023-05-10 14:06:46 +00:00
__config__.py
__future__.py
__init__.py Publicly exposing torch.backends.cpu.get_cpu_capability() (#100164) 2023-05-03 19:02:07 +00:00
_appdirs.py
_classes.py
_custom_op.py Add load_storage (#100519) 2023-05-05 05:25:03 +00:00
_deploy.py
_guards.py Move tracked nn_modules from OutputGraph to TracingContext (#100457) 2023-05-03 02:00:11 +00:00
_jit_internal.py [JIT] Allow tuple and list generics (#98703) 2023-04-09 22:58:58 +00:00
_linalg_utils.py
_lobpcg.py
_lowrank.py
_meta_registrations.py simplify sdpa backward meta registration (#101128) 2023-05-11 03:30:07 +00:00
_namedtensor_internals.py
_ops.py Revert "Initial version of Dynamo capture for HigherOrderOperator (#99988)" 2023-05-03 14:02:40 +00:00
_python_dispatcher.py
_sources.py
_storage_docs.py
_tensor_docs.py Fix Tensor.uniform_ documentation to mention generator argument (#99510) 2023-04-19 19:23:12 +00:00
_tensor_str.py Fix FakeTensor printing (#99205) 2023-04-18 13:26:27 +00:00
_tensor.py Change 'w.r.t.' to 'wrt' in function docstrings to fix doc rendering (#100028) 2023-04-25 23:53:26 +00:00
_torch_docs.py Modify repeat_interleave docs to highlight potential overloading (#99650) 2023-05-01 17:53:03 +00:00
_utils_internal.py Log PT2 compile to Scuba (#98790) 2023-04-11 20:10:35 +00:00
_utils.py add get_device_index for custom device (#98804) 2023-04-12 23:58:31 +00:00
_VF.py
_vmap_internals.py [BE] Enable C419 rule for any all shortcircuiting (#99890) 2023-04-25 15:02:13 +00:00
_weights_only_unpickler.py
abi-check.cpp
CMakeLists.txt
custom_class_detail.h
custom_class.h
extension.h
functional.py STFT: correct stft definition and better document tensor shapes (#100427) 2023-05-10 01:42:01 +00:00
hub.py Add --offload-to-disk support to minifier (#100546) 2023-05-05 05:25:03 +00:00
library.h
library.py torch.library.Library.impl: add missing param in docstring example (#98619) 2023-04-11 06:09:46 +00:00
overrides.py Persist torch.assert in aten graph (#100101) 2023-04-28 07:31:43 +00:00
py.typed
quasirandom.py
random.py add rng_state support for custom device (#98069) 2023-04-10 22:36:55 +00:00
README.txt
return_types.py
script.h
serialization.py fix _privateuse1_tag problem (#100632) 2023-05-10 09:53:19 +00:00
storage.py Fix loading data on different encoding (#94503) 2023-04-25 21:05:20 +00:00
torch_version.py
types.py

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.