Summary: Allows to collect samples over multiple batches. The method uses a circular array and so there is no guarantee about the order of the samples. The goal is to get a view of the data accross multiple batches
Reviewed By: salexspb
Differential Revision: D4216181
fbshipit-source-id: bb9e1fa84ac7e04006dcddb53c9347a42ec83dc8
Summary: Used in the NNPreProc layers. It fails the online training when there is empty batch.
Reviewed By: dzhulgakov
Differential Revision: D4235498
fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc
Summary: Each sparse feature is a ID list. And usually the position of the id in the id list is meaningful. The earlier the id appears in the list, the more important. In this diff, we multiple each embedding with a weight, where the weight corresponds to the position. With this change, same ID appears on different position would have different norm/length/importance after aggregation. The firstX transformation in sigrid is a special case of this model where the weights before n are 1, and 0 after n, where n is the argument of firstX.
Reviewed By: xianjiec
Differential Revision: D4181251
fbshipit-source-id: 2a6f8b7240af445b6bd2052fd24c2d99f39ee7ff