Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40312

As part of https://github.com/pytorch/pytorch/issues/40255, we realized that GPU support for distributed autograd was broken by our multithreaded autograd change. To fix this in the short term for 1.6, this PR includes the following changes:

1) A long-lived CPU thread in DistEngine to execute GPU->CPU continuations in the autograd graph.
2) The long-lived CPU thread has its own ready_queue, and this queue is used for all GraphTasks created by DistEngine.
3) Because of the long-lived CPU thread added in 1), thread_main() can no longer exit as soon as its GraphTask is done processing.
4) To resolve this, thread_main() now takes a parameter `device_thread` instead of `reentrant_thread`. When device_thread is true, we expect this to be a long-lived device thread that does not exit.
5) When device_thread is false, thread_main() is expected to run a GraphTask and return once it is done.

ghstack-source-id: 106391329

Test Plan: waitforbuildbot

Differential Revision: D22146183

fbshipit-source-id: dd146b7a95f55db75f6767889b7255e9d62d5825
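To make the threading contract in 3)–5) concrete, here is a minimal, self-contained C++ sketch. It is not PyTorch's actual implementation: `ReadyQueue`, `NodeTask`, `GraphTask`, `execute_node`, and the shutdown sentinel are simplified hypothetical stand-ins for the engine's internal types and mechanisms.

```cpp
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>

// Simplified stand-ins for the engine's types (hypothetical, for illustration).
struct GraphTask {
  bool completed = false;  // set elsewhere once all nodes in the task have run
};

struct NodeTask {
  std::shared_ptr<GraphTask> graph_task;
  bool is_shutdown = false;  // sentinel used to stop a long-lived thread
};

// A blocking FIFO of NodeTasks; the long-lived CPU thread owns one of these.
class ReadyQueue {
 public:
  void push(NodeTask task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
  }
  NodeTask pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    NodeTask task = std::move(queue_.front());
    queue_.pop_front();
    return task;
  }
 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<NodeTask> queue_;
};

void execute_node(const NodeTask& /*task*/) { /* run one autograd node */ }

// Sketch of the thread_main() contract described above:
//  - device_thread == true: a long-lived thread that services its ready
//    queue indefinitely; it must NOT exit when one GraphTask finishes,
//    because later GPU->CPU continuations may still be routed to it.
//  - device_thread == false: run until the caller's GraphTask completes,
//    then return.
void thread_main(std::shared_ptr<GraphTask> graph_task,
                 ReadyQueue& ready_queue,
                 bool device_thread) {
  while (device_thread || !graph_task->completed) {
    NodeTask task = ready_queue.pop();
    if (task.is_shutdown) break;  // the only way a device thread stops
    execute_node(task);
  }
}

int main() {
  ReadyQueue cpu_queue;
  // DistEngine-style long-lived CPU thread: no GraphTask of its own.
  std::thread cpu_thread(thread_main, std::shared_ptr<GraphTask>{},
                         std::ref(cpu_queue), true);
  // ... enqueue NodeTasks for GraphTasks created by the engine ...
  cpu_queue.push(NodeTask{nullptr, /*is_shutdown=*/true});
  cpu_thread.join();
}
```

The key point is the loop condition: a device thread only stops on an explicit shutdown sentinel, while a non-device call into thread_main() returns as soon as its own GraphTask completes.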
| Name |
|---|
| faulty_agent |
| jit |
| tensorpipe |
| test_dist_autograd_spawn.py |
| test_dist_optimizer_spawn.py |
| test_rpc_spawn.py |