Commit Graph

9 Commits

Author SHA1 Message Date
Wanchao Liang
3876f94c3d [2/n] Thread PG: add test for broadcast (#89440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89440
Approved by: https://github.com/XilunWu, https://github.com/yhcharles, https://github.com/fduwjj
2022-11-21 22:36:42 +00:00
Wanchao Liang
deae450899 [1/n] Thread PG: add test for allgather (#89439)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89439
Approved by: https://github.com/XilunWu, https://github.com/yhcharles, https://github.com/fduwjj
2022-11-21 22:36:41 +00:00
Charlie Yan
ee05f47bdd Rebase and re-land thread PG (#88795)
The previous PR (https://github.com/pytorch/pytorch/pull/88627) was reverted due to a failed check. After rebasing and rerunning, all checks passed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88795
Approved by: https://github.com/huydhn, https://github.com/wanchaol
2022-11-15 21:58:58 +00:00
PyTorch MergeBot
c7fc710459 Revert "[3/n] Thread PG: add threaded PG implementation (#88627)"
This reverts commit 6dd081846e.

Reverted https://github.com/pytorch/pytorch/pull/88627 on behalf of https://github.com/huydhn due to: This breaks one macOS M1 test (6dd081846e) in trunk. The PR also fails with the same issue, so I think the trymerge code has a bug that let this one get merged.
2022-11-09 22:38:41 +00:00
Charlie Yan
6dd081846e [3/n] Thread PG: add threaded PG implementation (#88627)
Summary: After the previous 2 diffs, finally we can add the threaded ProcessGroup implementation.

Test Plan: TBD

Reviewed By: XilunWu

Differential Revision: D40992593

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88627
Approved by: https://github.com/XilunWu, https://github.com/H-Huang
2022-11-09 20:51:11 +00:00
PyTorch MergeBot
f451e824f3 Revert "C10D extension to enable per-thread PG (#86348)"
This reverts commit 97abc21f2b.

Reverted https://github.com/pytorch/pytorch/pull/86348 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks macos tests 97abc21f2b
2022-10-14 01:26:46 +00:00
Rodrigo Kumpera
97abc21f2b C10D extension to enable per-thread PG (#86348)
Move a bunch of globals to instance methods and replace all uses of them.

We move all PG related globals under World and use a singleton instance under _world.

This creates an undocumented extension point to inject full control of how c10d
state behaves.

One simple hack is to change _world to an implementation that uses a threadlocal
and enable per-thread PGs.

It almost gets DDP working; the PG is still missing an implementation of all_reduce.

This enables notebook usage of PTD, which is a big deal for learning it:
https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68

This change ensures BC by keeping the global variables around and having the default _World wrap them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86348
Approved by: https://github.com/rohan-varma
2022-10-13 22:23:28 +00:00
PyTorch MergeBot
6fae62b35f Revert "C10D extension to enable per-thread PG (#84153)"
This reverts commit 5cbffbbac9.

Reverted https://github.com/pytorch/pytorch/pull/84153 on behalf of https://github.com/kumpera due to broke internal stuff
2022-09-29 13:51:05 +00:00
Rodrigo Kumpera
5cbffbbac9 C10D extension to enable per-thread PG (#84153)
Move a bunch of globals to instance methods and replace all uses of them.

We move all PG related globals under World and use a singleton instance under _world.

This creates an undocumented extension point to inject full control of how c10d
state behaves.

One simple hack is to change _world to an implementation that uses a threadlocal
and enable per-thread PGs.

It almost gets DDP working; the PG is still missing an implementation of all_reduce.

This enables notebook usage of PTD, which is a big deal for learning it:
https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84153
Approved by: https://github.com/rohan-varma
2022-09-27 21:42:31 +00:00