pytorch/docs/source
Wanchao Liang d560ee732e Implement gather primitive for ProcessGroupNCCL (#66745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66745

This PR implements NCCL gather and adds gather to ProcessGroupNCCL using the NCCL send/recv API.

NCCL doesn’t provide a gather primitive directly, so gather has to be implemented on top of NCCL’s send/recv API.
1. In ProcessGroupNCCL.cpp, the outputTensors are first flattened; then inputTensors and outputFlattened are passed by the collective class to the gather() function in nccl.cpp.
2. In nccl.cpp, gather is implemented with ncclSend/ncclRecv: every rank sends its inputTensor to the root rank, and the root rank receives these inputTensors in a for loop.
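The two steps above can be sketched in plain Python. This is a single-process simulation under stated assumptions, not the NCCL or ProcessGroupNCCL code: threads stand in for ranks, and the hypothetical `Channels` class with its `send`/`recv` methods only mimics the point-to-point semantics of ncclSend/ncclRecv. A single flat list stands in for outputFlattened, with rank r's data landing at offset r * n.

```python
import queue
import threading
from typing import List


class Channels:
    """Hypothetical communicator: one FIFO queue per (src, dst) rank pair.

    Mimics blocking point-to-point send/recv; this is NOT the NCCL API.
    """

    def __init__(self, world_size: int) -> None:
        self._q = {(s, d): queue.Queue()
                   for s in range(world_size) for d in range(world_size)}

    def send(self, data: List[float], src: int, dst: int) -> None:
        self._q[(src, dst)].put(list(data))

    def recv(self, src: int, dst: int) -> List[float]:
        return self._q[(src, dst)].get()


def gather(chan: Channels, rank: int, world_size: int, root: int,
           inp: List[float], out_flat: List[float]) -> None:
    """Gather `inp` from every rank into the flat buffer `out_flat` on `root`."""
    n = len(inp)
    if rank == root:
        # The root loops over all ranks and fills one slot per rank.
        for r in range(world_size):
            if r == root:
                # Assumption: the root copies its own input locally
                # instead of sending to itself.
                out_flat[r * n:(r + 1) * n] = inp
            else:
                out_flat[r * n:(r + 1) * n] = chan.recv(src=r, dst=root)
    else:
        # Every non-root rank just sends its input to the root.
        chan.send(inp, src=rank, dst=root)


def run(world_size: int = 4, root: int = 0, n: int = 2) -> List[float]:
    """Launch one thread per simulated rank; return the root's flat buffer."""
    chan = Channels(world_size)
    out_flat = [0.0] * (world_size * n)
    threads = []
    for rank in range(world_size):
        inp = [float(rank * 10 + i) for i in range(n)]
        # Only the root writes to the output buffer; others get a dummy.
        buf = out_flat if rank == root else []
        t = threading.Thread(target=gather,
                             args=(chan, rank, world_size, root, inp, buf))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return out_flat


flat = run()
# Split the flat buffer back into one output per rank.
per_rank = [flat[r * 2:(r + 1) * 2] for r in range(4)]
print(per_rank)
```

Copying the root's own contribution locally is a choice made for this sketch. The description above says every rank sends to the root, and a self send/recv pair would produce the same buffer contents.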
ghstack-source-id: 147754838

Test Plan:
test_gather_ops
test_gather_checks
test_gather_stress

Reviewed By: pritamdamania87

Differential Revision: D29616361

fbshipit-source-id: b500d9b8e67113194c5cc6575fb0e5d806dc7782
2022-01-27 11:35:01 -08:00
_static clarify the documentation of torch.meshgrid (#62977) 2021-08-18 04:01:22 -07:00
_templates DOC: Merge extraheader block from theme instead of override (#70187) 2022-01-05 06:42:38 -08:00
community Update contribution_guide.rst (#64142) 2021-08-30 19:26:59 -07:00
elastic (torchelastic) make --max_restarts explicit in the quickstart and runner docs (#65838) 2021-09-29 19:29:01 -07:00
notes Fixes jiterator cache macro include + updates CUDA note with cache variables (#71452) 2022-01-18 19:42:11 -08:00
rpc Support Union in TorchScript (#64234) 2021-09-03 06:12:24 -07:00
scripts [docs] Add images to some activation functions (#65415) 2021-09-22 11:05:29 -07:00
__config__.rst
amp.rst rebase for autocast updates to include device_type and dtype flags (#61002) 2021-08-10 20:03:12 -07:00
autograd.rst Update extending doc to cover forward mode AD (#66962) 2021-10-27 14:18:38 -07:00
backends.rst [Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU (#67980) 2021-12-03 19:06:30 -08:00
benchmark_utils.rst
bottleneck.rst
checkpoint.rst
complex_numbers.rst Grammatical update of tech docs (#61547) 2021-07-14 14:01:59 -07:00
conf.py [quant][graphmode] Rename backend_config_dict folder to backend (#69882) 2021-12-16 21:13:04 -08:00
cpp_extension.rst
cpp_index.rst
cuda.rst Document torch.cuda.ExternalStream, torch.cuda.caching_allocator_alloc and torch.cuda.caching_allocator_delete (#70126) 2022-01-12 15:44:40 -08:00
cudnn_persistent_rnn.rst Remove orphan from cuDNN persistent note (#65160) 2021-09-21 11:09:47 -07:00
cudnn_rnn_determinism.rst
data.rst [DateLoader] more clearly expose 'default_collate' and 'default_convert' to users (#69862) 2021-12-14 11:18:26 -08:00
ddp_comm_hooks.rst [DDP Comm Hook] Add debugging communication hooks to ddp_comm_hooks.rst (#64352) 2021-09-01 17:37:19 -07:00
deploy.rst [deploy] docs (#69251) 2021-12-01 21:55:18 -08:00
distributed.algorithms.join.rst Add tutorial link (#62785) 2021-08-05 17:28:02 -07:00
distributed.elastic.rst
distributed.optim.rst [distributed][docs] Delete distributed optimimzer section from RPC and add reference to namespace docs page (#68068) 2021-11-09 15:01:54 -08:00
distributed.rst Implement gather primitive for ProcessGroupNCCL (#66745) 2022-01-27 11:35:01 -08:00
distributions.rst [Reinstate] Wishart distribution (#70377) 2021-12-30 11:41:46 -08:00
dlpack.rst
docutils.conf
fft.rst C++ API and docs for hfftn (#66127) 2021-10-07 12:48:36 -07:00
futures.rst
fx.rst Fix for retracing documentation which would break for n-ary operators (#71599) 2022-01-24 12:04:25 -08:00
hub.rst Add more details to the known limitations section of torchhub docs (#69970) 2021-12-16 02:43:48 -08:00
index.rst torch/monitor: add pybind (#69567) 2022-01-12 13:35:11 -08:00
jit_builtin_functions.rst
jit_language_reference_v2.rst Add Union type to TorchScript Language Ref (#69514) 2021-12-07 12:53:54 -08:00
jit_language_reference.rst fix typos in jit_language_reference.rst (#68706) 2021-11-22 19:09:06 -08:00
jit_python_reference.rst
jit_unsupported.rst
jit.rst Back out "D30740897 Add fusion enabled apis" (#64500) 2021-09-04 20:55:58 -07:00
linalg.rst [Array API] Add linalg.diagonal (#70599) 2022-01-26 00:05:37 -08:00
math-quantizer-equation.png
mobile_optimizer.rst
model_zoo.rst
monitor.rst torch/monitor: TensorboardEventHandler (#71658) 2022-01-27 00:32:33 -08:00
multiprocessing.rst
name_inference.rst
named_tensor.rst
nn.functional.rst Add mish activation function (#58648) 2021-05-25 10:36:21 -07:00
nn.init.rst
nn.rst Implements the orthogonal parametrization (#62089) 2021-08-30 13:12:07 -07:00
onnx.rst Revert "[ONNX] Minor doc update (#69501)" (#71615) 2022-01-21 11:20:01 -08:00
optim.rst To add SequentialLR to PyTorch Core Schedulers (#64037) 2021-09-09 09:36:32 -07:00
package.rst Minor changes in documentation (#68557) 2021-11-18 17:57:16 -08:00
pipeline.rst Minor changes in documentation (#68557) 2021-11-18 17:57:16 -08:00
profiler.rst Add low level torch.profiler.kineto_profile base class (#63302) 2021-12-14 14:47:43 -08:00
quantization-support.rst [quant][bc-breaking] Remove QConfigDynamic from quantization api (#69875) 2021-12-17 23:10:06 -08:00
quantization.rst [quant][docs] quantized model save/load instructions (#69789) 2021-12-13 20:23:59 -08:00
random.rst
rpc.rst [distributed][docs] Delete distributed optimimzer section from RPC and add reference to namespace docs page (#68068) 2021-11-09 15:01:54 -08:00
sparse.rst Update sparse.rst to warn about _values() (#71088) 2022-01-10 12:43:46 -08:00
special.rst [special] special alias for softmax (#62251) 2021-10-01 03:55:32 -07:00
storage.rst
tensor_attributes.rst
tensor_view.rst Minor changes in documentation (#68557) 2021-11-18 17:57:16 -08:00
tensorboard.rst
tensors.rst ammend tensors.rst and torch.rst for doc generation (#69030) 2021-11-30 12:04:13 -08:00
testing.rst move torch.testing from prototype to beta (#69668) 2021-12-17 09:52:47 -08:00
torch.ao.ns._numeric_suite_fx.rst Quantization docs: add pages for Numeric Suite (Eager and FX) (#66380) 2021-10-11 18:47:58 -07:00
torch.ao.ns._numeric_suite.rst Quantization docs: add pages for Numeric Suite (Eager and FX) (#66380) 2021-10-11 18:47:58 -07:00
torch.overrides.rst
torch.rst Porting index_add to structured kernels, add an out variant (#65993) 2021-12-14 11:57:13 -08:00
type_info.rst [Docs] Mention torch.bfloat16 in torch.finfo (#68496) 2021-11-18 17:52:41 -08:00