Summary:
This adds Caffe2 support for MKL operators that work directly with MKLMemory.
It includes a Relu layer that shows how to use it.
Reviewed By: salexspb
Differential Revision: D4322144
fbshipit-source-id: 8b3392c4fd024ab1a7ba7135c349ebd3e1976799
Summary:
The float64 test breaks things on the CUDA side. I am deleting it for now; if
we add it back, let's make sure we run the test on a GPU machine first :)
Reviewed By: azzolini
Differential Revision: D4324427
fbshipit-source-id: 0246fe9dd28a286422ca94c90f5b0fc33a162e74
Summary: Allows collecting samples over multiple batches. The method uses a circular array, so there is no guarantee about the order of the samples. The goal is to get a view of the data across multiple batches.
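The circular-array idea can be sketched roughly as follows. This is an illustration only, not the operator added in this diff: the class name `CircularSampleBuffer`, its methods, and the choice to overwrite the oldest slot first are all assumptions made for the example.

```python
class CircularSampleBuffer:
    """Hypothetical fixed-capacity buffer that collects samples across batches."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots = [None] * capacity
        self.next_index = 0  # slot that the next sample will overwrite
        self.num_seen = 0    # total number of samples added so far

    def add_batch(self, batch):
        # Write each sample into the next slot, wrapping around at the end.
        for sample in batch:
            self.slots[self.next_index] = sample
            self.next_index = (self.next_index + 1) % self.capacity
            self.num_seen += 1

    def samples(self):
        # Samples come back in slot order, not arrival order, which is why
        # callers should not rely on any particular ordering.
        return self.slots[:min(self.num_seen, self.capacity)]


buf = CircularSampleBuffer(capacity=4)
buf.add_batch([1, 2, 3])
buf.add_batch([4, 5, 6])
collected = buf.samples()
```

After the second batch wraps around, `collected` holds samples from both batches, but in slot order rather than arrival order.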
Reviewed By: salexspb
Differential Revision: D4216181
fbshipit-source-id: bb9e1fa84ac7e04006dcddb53c9347a42ec83dc8
Summary: Used in the NNPreProc layers. It fails the online training when there is an empty batch.
Reviewed By: dzhulgakov
Differential Revision: D4235498
fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc
Summary: Each sparse feature is an ID list, and the position of an ID in the list is usually meaningful: the earlier the ID appears in the list, the more important it is. In this diff, we multiply each embedding by a weight, where the weight corresponds to the position. With this change, the same ID appearing at different positions has a different norm/length/importance after aggregation. The firstX transformation in sigrid is a special case of this model where the weights are 1 before n and 0 after n, where n is the argument of firstX.
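A minimal sketch of the position-weighting idea, in plain Python rather than Caffe2 operators. The function names, the dictionary-based embedding table, and the use of a sum as the aggregation are assumptions for illustration; the diff itself may aggregate differently.

```python
def position_weighted_sum(id_list, embeddings, position_weights):
    """Sum the embeddings of id_list, scaling each by its position's weight.

    embeddings: dict mapping ID -> list of floats (hypothetical lookup table).
    position_weights: one weight per position, at least len(id_list) long.
    """
    dim = len(next(iter(embeddings.values())))
    pooled = [0.0] * dim
    for position, item_id in enumerate(id_list):
        weight = position_weights[position]
        vector = embeddings[item_id]
        for k in range(dim):
            pooled[k] += weight * vector[k]
    return pooled


def first_x_weights(n, max_len):
    # firstX as a special case: weight 1 for the first n positions, 0 afterwards.
    return [1.0 if i < n else 0.0 for i in range(max_len)]


embeddings = {7: [1.0, 2.0], 9: [0.5, 0.5]}
id_list = [7, 9, 7]  # ID 7 appears at positions 0 and 2

weighted = position_weighted_sum(id_list, embeddings, [1.0, 0.5, 0.25])
first_two = position_weighted_sum(id_list, embeddings, first_x_weights(2, 3))
```

Here ID 7 contributes with weight 1.0 at position 0 but only 0.25 at position 2, and `first_x_weights(2, 3)` drops the third position entirely.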
Reviewed By: xianjiec
Differential Revision: D4181251
fbshipit-source-id: 2a6f8b7240af445b6bd2052fd24c2d99f39ee7ff