Commit Graph

3 Commits

Author SHA1 Message Date
Andrey Malevich
ec51f887bf Create only one instance of SigridTransform in DPerExample.
Summary:
DPer example have been creating multiple copies of the transform config in net
defition till this moment, that resulted in the fact that I've hit the limit of
ProtoBuf (64MB) for a certain Task requests (especially visible because of the
ValidationPipeline that I was adding).

After this diff we're going to store SigridTransforms in one instance per
machine for training (or 1 instance per reading).

Difference in sizes of the plans for some simple SparseNN model ~30 MB (even including the fact that second model have validation plan as well).

TODO: Do similar logic for NNPreProc as well (it's also pretty large).

Reviewed By: dzhulgakov

Differential Revision: D4441441

fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047
2017-01-22 19:29:16 -08:00
Xianjie Chen
4b3bd06a7f sparse nn converges better by dedupping sparse gradient by mean
Summary:
this normalizes the sparse gradient, so that the "effective learning rate" of each sparse parameter will NOT be affected by the number of examples in a batch that "use" this sparse parameter.

experiment shows it help convergence (about 0.1% better train NE): https://fburl.com/1230747813683956. It's not conclusive yet, and we still need to do more experiments. But this diff adds it as an option, and does not change the default behavior, so we can get this in first.

Differential Revision: D4367283

fbshipit-source-id: 49ea80dfa9ea776ff4160e220cf6c86593521607
2016-12-27 22:59:29 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00