56 Commits

Author SHA1 Message Date
Jane Xu
0a48f56318 Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"
Test Plan: revert-hammer

Differential Revision:
D31299350 (f1f3bd8c36)

Original commit changeset: 9ad5c8fa17f7

fbshipit-source-id: d63d889922f507a4a0e2e042e451b95b9591c317
2021-10-08 17:55:28 -07:00
Rohan Varma
f1f3bd8c36 Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" (#65883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65883

Original commit changeset: d8e962b8aab6
ghstack-source-id: 139836954

Test Plan: ci

Reviewed By: zhaojuanmao

Differential Revision: D31299350

fbshipit-source-id: 9ad5c8fa17f7038ba579cb1eda6d9271ac07a130
2021-10-08 16:04:20 -07:00
Mike Ruberry
91f8755b0e Revert D31005792: [NCCL] Init dummy NCCL comms in constructor
Test Plan: revert-hammer

Differential Revision:
D31005792 (2b22a5dde2)

Original commit changeset: c2c582dee25a

fbshipit-source-id: d8e962b8aab6fda8a6c013e8577492dff9568c27
2021-09-29 20:46:38 -07:00
Rohan Varma
2b22a5dde2 [NCCL] Init dummy NCCL comms in constructor (#65173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173

Initializes dummy NCCL communicators in the constructor as a basic health
check that communicators can be initialized before the first collective is
launched.

After a successful init, we immediately call `ncclCommAbort` to destroy these
communicators so that they don't interfere with regular communicator creation
during collectives.
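
The init-then-abort pattern described above can be sketched in Python as follows. This is an illustration only, not the code from this commit: `FakeComm`, `ProcessGroupSketch`, `HealthCheckError`, and `get_comm` are hypothetical names, and `FakeComm.abort` merely stands in for the role `ncclCommAbort` plays in the description.

```python
class HealthCheckError(RuntimeError):
    """Raised when a throwaway communicator cannot be created."""


class FakeComm:
    """Hypothetical stand-in for an NCCL communicator (not a real API)."""

    def __init__(self, device, fail=False):
        if fail:
            raise RuntimeError(f"cannot initialize communicator on device {device}")
        self.device = device
        self.aborted = False

    def abort(self):
        # Plays the role that `ncclCommAbort` plays in the commit description.
        self.aborted = True


class ProcessGroupSketch:
    """Hypothetical process group that health-checks itself on construction."""

    def __init__(self, devices, comm_factory=FakeComm):
        self.comm_factory = comm_factory
        self.comms = {}  # real communicators, created lazily per device
        self._run_health_check(devices)

    def _run_health_check(self, devices):
        for device in devices:
            try:
                dummy = self.comm_factory(device)
            except RuntimeError as exc:
                # Fail in the constructor instead of at the first collective.
                raise HealthCheckError(str(exc)) from exc
            # Destroy the dummy right away so it cannot interfere with the
            # communicators created later for real collectives.
            dummy.abort()

    def get_comm(self, device):
        # Regular, lazy communicator creation, independent of the dummies.
        if device not in self.comms:
            self.comms[device] = self.comm_factory(device)
        return self.comms[device]
```

The point of the sketch is that no dummy communicator is retained after construction, so a later `get_comm` call always starts from a clean state.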

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31005792

fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b
2021-09-29 15:36:54 -07:00
Kimish Patel
54f2eb6e7e [Pytorch Profiler] Add support for adding module hierarchy to KinetoEvent (#61792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792

This PR adds module hierarchy information to events.
What is the module hierarchy information attached to an event?
While profiling a TorchScript module, each time an event is added we ask
the JIT for the module hierarchy associated with the node being
executed. When that node executes, the interpreter stack may hold
multiple frames. For each frame, we find the corresponding node and
query its module hierarchy.
The module hierarchy of a node is stored with the node's
InlinedCallStack, which tracks the path along which the node was
inlined. During inlining, we therefore annotate the module information
corresponding to the CallMethod nodes being inlined.

With this PR, the Chrome trace contains an additional metadata field,
"Module Hierarchy". An example value:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
Each entry in the call stack gives the module instance name, the module
type name, and the method name.
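
As a rough illustration of how a consumer of the trace might read this string, the sketch below splits a "Module Hierarchy" value into its frames. It assumes each frame has the form `instance(Type)::method` with simple, dot-free names, as in the example above. `Frame` and `parse_module_hierarchy` are hypothetical helpers, not part of the profiler API.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    instance: str   # module instance name, e.g. "layer1" or "SELF"
    type_name: str  # module type name, e.g. "Sequential"
    method: str     # method name, e.g. "forward"


# Assumed frame shape: instance(Type)::method, with no dots inside any part.
_FRAME_RE = re.compile(r"(\w+)\((\w+)\)::(\w+)")


def parse_module_hierarchy(value: str) -> List[Frame]:
    """Split a "Module Hierarchy" string into frames, outermost first."""
    return [Frame(*parts) for parts in _FRAME_RE.findall(value)]


EXAMPLE = (
    "TOP(ResNet)::forward.SELF(ResNet)::_forward_impl."
    "layer1(Sequential)::forward.0(BasicBlock)::forward."
    "conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward"
)

frames = parse_module_hierarchy(EXAMPLE)
for frame in frames:
    print(frame.instance, frame.type_name, frame.method)
```

Fully qualified type names that contain dots would need a different pattern; the regular expression here only covers the shape shown in the example.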

Test Plan:
test_profiler

Imported from OSS

Reviewed By: raziel, ilia-cher

Differential Revision: D29745442

fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
2021-08-13 21:39:10 -07:00
Luca Wehrstedt
a016150163 Move torch/lib/c10d to torch/csrc/distributed/c10d (#60543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543

Now that c10d is part of libtorch, it would be nice if its sources all lived in one place.
ghstack-source-id: 132306292

Test Plan: It builds

Reviewed By: cbalioglu

Differential Revision: D29062002

fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
2021-06-24 12:38:51 -07:00