* add end to end test for DistributedDataParallel
* address comments
* skip subgroup tests when less than 3 processes
* set process number based on available gpus
* add single gpu; clean up WORLD_SIZE
* fix comments
* Implemented NCCL Distributed Backend for PyTorch with new dist APIs
* Let FindNCCL determine the NCCL version
* Let NCCL2 Backend use ATen instead of the deprecated THPP
* Let distributed parallel model use a single reduction thread for NCCL backend
* Cached the sockets, fixed bugs, refactored, and addressed Adam's comments
* Make BcastNcclID take a single param and fix a bug in all_gather
* Removed the barrier function, added a warning for users, and stopped exposing experimental functions to users
* Use the simplest working single-bucket solution for the distributed data parallel model, with rebase
* Cleanup, fixes and further addressed Adam's comments
* Used PySequence_Fast in distributed csrc
* Removed the limitation that each group is only bound to a given device sequence
* Used THPObjectPtr for PySequence_Fast
* Add ability to specify init_method for test_distributed.
* Move init_method specification to test run line.
* Run for gloo tests as well.
* Better status message for gloo test.
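
For reference, the sketch below shows roughly how the NCCL backend and DistributedDataParallel added by the commits above might be exercised end to end. It is an illustrative sketch only, not code from this PR: the TCP address and port are placeholders, and reading `RANK`/`WORLD_SIZE` from environment variables is an assumption about how the processes are launched.

```python
# Minimal sketch of one process per GPU using the NCCL backend.
# Placeholder assumptions: RANK/WORLD_SIZE env vars and the tcp:// address below.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

# init_method can also be supplied on the test run line, as in the test commits above.
dist.init_process_group(
    backend="nccl",
    init_method="tcp://127.0.0.1:23456",  # placeholder address/port
    world_size=world_size,
    rank=rank,
)

# Bind this process to a single GPU and wrap the model for gradient reduction.
torch.cuda.set_device(rank % torch.cuda.device_count())
model = torch.nn.Linear(10, 10).cuda()
ddp_model = DistributedDataParallel(model, device_ids=[torch.cuda.current_device()])
```

The same `init_method` string is what the test commits move onto the test run line, so both the gloo and NCCL variants of test_distributed can be pointed at a rendezvous of the caller's choosing.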