pytorch/caffe2/python/modeling
Devesh Agrawal 16549ed92b Scaled training and fetching from the PS
Summary:
Today, each PS stores the entire embedding table rather than just its own
shard of it. This was simply an oversight on the part of the original author,
and this diff fixes that.

1. The sparse params are sharded across the PSes, and each PS stores only its
section of the embedding. The trainer requests ids as-is from the PS, but the
PS divides each id by num_of_shards before looking it up in its embedding
table blob. This happens on both the forward and the backward pass. During
model download, however, the PS multiplies the ids back by num_of_shards
before returning the embeddings to the trainer. The upshot is that the trainer
does not know anything about how the embeddings are scaled on the PS; the PS
adds the extra divide and multiply steps to achieve that.
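
A minimal sketch (plain Python, not the actual Caffe2 operators used in this
diff) of the divide/multiply id scaling described above; the helper names and
the per-shard offset handling are assumptions for illustration only:

def ps_local_row(global_id, num_of_shards):
    # Forward/backward pass: the PS divides the incoming id by num_of_shards
    # before looking it up in its (smaller) embedding table blob.
    return global_id // num_of_shards

def ps_global_id(local_row, num_of_shards, shard_id=0):
    # Model download: the PS multiplies back by num_of_shards before returning
    # embeddings to the trainer, so the trainer never sees the scaling.
    # Recovering the original id by adding shard_id is an assumption here.
    return local_row * num_of_shards + shard_id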

2. At estimation time, we allocate just one PS. So, in order to make all of
the embeddings fit on that single PS, we additionally scale the hash table
sizes (proportionally and equally for all the sparse params) until they fit.
This scaling is handled analogously to (1).
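
For illustration only, one common proportional scaling of all sparse params'
hash table sizes could look like the following; the helper name and the
capacity argument are assumptions, not values from this diff:

def scale_hash_sizes(hash_sizes, ps_capacity_rows):
    # Shrink every sparse param's hash table by the same factor so that the
    # combined size fits on the single estimation PS.
    total = sum(hash_sizes)
    if total <= ps_capacity_rows:
        return hash_sizes
    factor = ps_capacity_rows / float(total)
    return [max(1, int(size * factor)) for size in hash_sizes]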

Reviewed By: boryiingsu

Differential Revision: D5664093

fbshipit-source-id: 92f501f61566f939c41ce0b614a1b499669f978a
2017-08-23 18:16:03 -07:00
initializers_test.py Skip fp16 initializer test for CPU-only builds 2017-06-19 12:21:25 -07:00
initializers.py Create ParameterSharing abstraction for Caffe2. 2017-06-05 11:49:54 -07:00
parameter_info.py Scaled training and fetching from the PS 2017-08-23 18:16:03 -07:00
parameter_sharing_test.py Create ParameterSharing abstraction for Caffe2. 2017-06-05 11:49:54 -07:00
parameter_sharing.py Create ParameterSharing abstraction for Caffe2. 2017-06-05 11:49:54 -07:00