pytorch/caffe2/python/modeling
Devesh Agrawal 16549ed92b Scaled training and fetching from the PS
Summary:
Today, each PS stores the entire embedding table rather than just its own
shard of it. This was simply an oversight on the part of the original author,
and this diff fixes that.

1. The sparse params are sharded across the PSes, and each PS stores only its
section of the embedding. The trainer requests ids as-is from the PS, but the
PS divides each id by num_of_shards before looking it up in its embedding
table blob. This happens on both the forward and the backward pass. During
model download, however, the PS multiplies the ids back by num_of_shards
before returning the embeddings to the trainer. The upshot is that the trainer
does not know anything about how the embeddings are scaled on the PS; the PS
adds the extra divide and multiply steps to achieve that.
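
A minimal sketch (plain Python, not the actual Caffe2 operators used in this
diff) of the divide/multiply id scaling described above; the helper names and
the per-shard offset handling are assumptions for illustration only:

def ps_local_row(global_id, num_of_shards):
    # Forward/backward pass: the PS divides the incoming id by num_of_shards
    # before looking it up in its (smaller) embedding table blob.
    return global_id // num_of_shards

def ps_global_id(local_row, num_of_shards, shard_id=0):
    # Model download: the PS multiplies back by num_of_shards before returning
    # embeddings to the trainer, so the trainer never sees the scaling.
    # Recovering the original id by adding shard_id is an assumption here.
    return local_row * num_of_shards + shard_id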

2. At estimation time, we allocate just one PS. So, in order to make all of
the embeddings fit on that single PS, we additionally scale the hash table
sizes (proportionally and equally for all the sparse params) until they fit.
This scaling is handled analogously to (1).
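
For illustration only, one common proportional scaling of all sparse params'
hash table sizes could look like the following; the helper name and the
capacity argument are assumptions, not values from this diff:

def scale_hash_sizes(hash_sizes, ps_capacity_rows):
    # Shrink every sparse param's hash table by the same factor so that the
    # combined size fits on the single estimation PS.
    total = sum(hash_sizes)
    if total <= ps_capacity_rows:
        return hash_sizes
    factor = ps_capacity_rows / float(total)
    return [max(1, int(size * factor)) for size in hash_sizes]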

Reviewed By: boryiingsu

Differential Revision: D5664093

fbshipit-source-id: 92f501f61566f939c41ce0b614a1b499669f978a
2017-08-23 18:16:03 -07:00
initializers_test.py Skip fp16 initializer test for CPU-only builds 2017-06-19 12:21:25 -07:00
initializers.py Create ParameterSharing abstraction for Caffe2. 2017-06-05 11:49:54 -07:00
parameter_info.py Scaled training and fetching from the PS 2017-08-23 18:16:03 -07:00
parameter_sharing_test.py Create ParameterSharing abstraction for Caffe2. 2017-06-05 11:49:54 -07:00
parameter_sharing.py Create ParameterSharing abstraction for Caffe2. 2017-06-05 11:49:54 -07:00