Commit Graph

2 Commits

Author SHA1 Message Date
Pritam Damania
5344c3ea9e Remove join_workers from Pipeline destructor. (#53433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53433

As described in https://github.com/pytorch/pytorch/issues/53413, the
pipeline destructor sometimes hangs. The reason is that Pipe uses daemon
threads, and as a result these threads can be destroyed before the Pipe
destructor is done. The Pipe destructor then calls `join_workers`, which
waits on signals from the worker threads; if those threads are already dead,
the signals never arrive and the main thread blocks forever.
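The hang can be illustrated with a minimal sketch, assuming a queue-based shutdown handshake. The names `worker`, `join_workers_sketch`, and `start_worker` and the `None` stop sentinel are hypothetical; this is not the actual `torch.distributed.pipeline` code. A timeout is added only so the sketch terminates; with `timeout=None` the wait on a dead worker never returns.

```python
import queue
import threading


# Hypothetical worker loop: doubles tasks until it receives the None sentinel,
# then acknowledges shutdown on out_q before returning.
def worker(in_q: queue.Queue, out_q: queue.Queue) -> None:
    while True:
        task = in_q.get()
        if task is None:
            out_q.put("stopped")  # shutdown acknowledgement
            return
        out_q.put(task * 2)


# Hypothetical join step in the spirit of the removed `join_workers`: ask each
# worker to stop, then wait for its acknowledgement. With timeout=None this
# blocks forever if the worker thread is already gone.
def join_workers_sketch(in_qs, out_qs, timeout=None) -> bool:
    for in_q in in_qs:
        in_q.put(None)
    for out_q in out_qs:
        try:
            out_q.get(timeout=timeout)
        except queue.Empty:
            return False  # no acknowledgement: the worker never answered
    return True


def start_worker():
    in_q, out_q = queue.Queue(), queue.Queue()
    # daemon=True: the interpreter may exit without waiting for this thread
    t = threading.Thread(target=worker, args=(in_q, out_q), daemon=True)
    t.start()
    return t, in_q, out_q


# Live worker: the handshake completes.
t, in_q, out_q = start_worker()
in_q.put(21)
print(out_q.get())  # 42
print(join_workers_sketch([in_q], [out_q], timeout=1.0))  # True
t.join(timeout=1.0)

# "Dead" worker: simulated by queues that no thread consumes, standing in for
# a daemon thread that was torn down first. The acknowledgement never arrives.
dead_in, dead_out = queue.Queue(), queue.Queue()
print(join_workers_sketch([dead_in], [dead_out], timeout=0.2))  # False
```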

To resolve this issue, this PR removes `join_workers` completely, since it
is not necessary to wait for daemon threads.
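The shape of the fix can be sketched as an object that owns daemon workers and simply has no join step at teardown. `PipelineSketch`, `_work`, and `run` are hypothetical names for illustration, not the real `Pipe` implementation; the point is only that destruction never waits on the workers, so it cannot block on one that is already gone.

```python
import queue
import threading


class PipelineSketch:
    """Hypothetical stand-in for a pipeline that owns daemon worker threads.

    There is deliberately no join_workers-style step and no __del__ that
    waits on the workers.
    """

    def __init__(self, num_workers: int = 2) -> None:
        self.in_queues = [queue.Queue() for _ in range(num_workers)]
        self.out_queues = [queue.Queue() for _ in range(num_workers)]
        self.threads = []
        for in_q, out_q in zip(self.in_queues, self.out_queues):
            # daemon=True: these threads never keep the interpreter alive,
            # so there is nothing the destructor needs to wait for.
            t = threading.Thread(target=self._work, args=(in_q, out_q), daemon=True)
            t.start()
            self.threads.append(t)

    @staticmethod
    def _work(in_q: queue.Queue, out_q: queue.Queue) -> None:
        # Runs until the process exits; there is no shutdown handshake.
        while True:
            out_q.put(in_q.get() * 2)

    def run(self, x: int) -> int:
        # Send one task to the first worker and wait for its result.
        self.in_queues[0].put(x)
        return self.out_queues[0].get()


pipe = PipelineSketch()
print(pipe.run(21))  # 42
threads = pipe.threads
del pipe  # returns immediately; the daemon workers stay blocked in in_q.get()
print(all(t.daemon and t.is_alive() for t in threads))  # True
```

The trade-off is that the workers are never explicitly stopped; they are simply abandoned when the process exits, which is what daemon threads are for.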

#Closes: https://github.com/pytorch/pytorch/issues/53413
ghstack-source-id: 123641509

Test Plan:
1) Tested with repro in
https://github.com/pytorch/pytorch/issues/53413.
2) It is hard to add a unit test for this, since the bug depends on the order
in which objects are destroyed.

Reviewed By: rohan-varma

Differential Revision: D26863321

fbshipit-source-id: 18fff072cabacfb10390e971eac789859d3dcc81
2021-03-11 17:05:22 -08:00
Pritam Damania
9d91360b5d Cleanup APIs for pipeline parallelism. (#48630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48630

1) Make torch.distributed.pipeline package public.
2) Make several helper methods private.
ghstack-source-id: 118820803

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25235688

fbshipit-source-id: c32833ebf090ddbd4eaf06fcb5e3f9d421623a60
2020-12-18 15:17:13 -08:00