Summary:
This is a first attempt at completing bootcamp task T24449916. This diff contains 3 major changes:
1) Change LayerModelHelper to allow exposing the output and parameters of any layer to metrics
2) Add a runner that allows metrics to draw arbitrary plots onto a matplotlib axes object
3) Implement a metric that aggregates the distribution of a blob's values over training, and try it out in a notebook (an illustrative sketch follows)
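A purely illustrative sketch of what such a metric could look like; the class and method names here are assumptions made for the example, not the actual runner API added in this diff.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical metric (names are assumptions): accumulate a blob's values across
# training iterations and render the aggregated distribution onto a matplotlib
# axes object handed in by the plotting runner.
class BlobDistributionMetric(object):
    def __init__(self):
        self._values = []

    def update(self, blob_values):
        self._values.append(np.asarray(blob_values).ravel())

    def draw(self, ax):
        ax.hist(np.concatenate(self._values), bins=50)
        ax.set_title("blob value distribution over training")

# Usage: the runner would pass each metric an axes to draw on.
fig, ax = plt.subplots()
metric = BlobDistributionMetric()
metric.update(np.random.randn(1000))
metric.draw(ax)
```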
Reviewed By: kennyhorror
Differential Revision: D6671273
fbshipit-source-id: b8961837395e89c957edbf5c7c862bdb845ccf4b
Summary: Add a test for SparseLookup with PositionWeighted.
Reviewed By: kennyhorror
Differential Revision: D6771612
fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
Summary: Change all use cases of BatchLRLoss to the numerically stable version. This includes uses of the function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and the class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.
Reviewed By: xianjiec
Differential Revision: D6643074
fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
Summary:
As titled.
This will fail with the message: File "/mnt/xarfuse/uid-30088/f8742a88-seed-a26ddfbc-49aa-4c5f-9e08-91909f4775da-ns-4026532692/caffe2/python/layers/concat.py", line 52, in __init__
"Concat expects that limited dimensions of the input tensor"
This is because the output scalar of the pairwise_dot_product layer won't contain shape information if output_dim is 1.
https://fburl.com/1m9r3ayp
This diff fixes it.
Reviewed By: xianjiec
Differential Revision: D6565930
fbshipit-source-id: 181181232065ef3fdfc825aa25d2714affbe6b8d
Summary: Replaced sigmoid + xent loss with SigmoidCrossEntropyWithLogits. The op computes the cross-entropy loss of the sigmoid of its inputs. It is conceptually identical to a sigmoid layer followed by a cross-entropy loss layer, but provides a more numerically stable gradient.
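For reference, a minimal NumPy sketch of the standard numerically stable formulation of sigmoid cross-entropy on logits (an illustration of the math, not the Caffe2 kernel):

```python
import numpy as np

# -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)), rewritten so that sigmoid(x) is
# never formed explicitly; a large |x| then neither overflows nor rounds the
# probability to exactly 0 or 1.
def sigmoid_cross_entropy_with_logits(x, z):
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
```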
Reviewed By: xianjiec
Differential Revision: D6305455
fbshipit-source-id: 444c9f651fbdf13c3c52be5142769f8f98ed8770
Summary:
Get higher-order interactions of embeddings, similar to a cross net but applied at the embedding level.
Formula:
e_(l+1,i) = element_wise_mul[e_(0,i), \sum_j(e_(l,j) * w_(l,j))] + e_(l,i) + b
where l means the l-th layer of this higher-order net, and i and j index the embeddings in the list.
Finally, concat all the embeddings in the last layer, or concat the sum of each embedding, and attach the result to the output blob of the dot processor.
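A minimal NumPy sketch of one step of the formula above, for illustration only; the shapes of w and b are assumptions, and this is not the actual DPER layer code:

```python
import numpy as np

def higher_order_step(e0, el, w, b):
    # e0, el: (num_embeddings, dim) -- layer-0 and layer-l embeddings
    # w:      (num_embeddings, 1) or (num_embeddings, dim) -- per-embedding weights
    # b:      (dim,) -- bias
    s = (el * w).sum(axis=0)   # \sum_j e_(l,j) * w_(l,j)
    return e0 * s + el + b     # element-wise mul with e_(0,i), plus residual and bias
```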
Differential Revision: D6244001
fbshipit-source-id: 96292914158347b79fc1299694d65605999b55e8
Summary:
Problem:
When we initialize a model from an existing model, we currently load information for each layer parameter independently (in utils.py), including shape information. This means we have to load the whole model from the db_path every time we initialize one parameter (in layers.py). For example, in f31078253, the model needs to be initialized twice (not sure why); each time there are 152 layer parameters to load, and loading a model takes 10 min - 50 min depending on resource status.
Restriction:
1. _infer_shape_from_initializer in layers.py is called from multiple other places besides the ModelInitDefinition.INIT_MODEL_PATH branch of load_parameters_from_model_init_options in utils.py, which is the root cause of f31078253. So we still need to support the load operator in _infer_shape_from_initializer, and we need to batch the loading of shape blobs outside of LayerParameter.
2. In the ModelInitDefinition.PARAMS branch of load_parameters_from_model_init_options in utils.py, the db_path can differ across parameters, so it is hard to batch them.
Solution:
Batch the loading of shape blobs in the ModelInitDefinition.INIT_MODEL_PATH branch of load_parameters_from_model_init_options in utils.py. We load the model once and generate shape blobs for the layer parameters in the workspace, so that _infer_shape_from_initializer in layers.py can directly return the shape blobs cached in the workspace without reloading the model. At the same time, _infer_shape_from_initializer still supports a separate load operator when shape blobs are not pre-loaded into the workspace (this logic can be used for ways to initialize a model other than from an existing model).
Right now we use 500 layer parameters per batch, and it works fine. So for 152 layer parameters, one model load is enough.
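A minimal sketch of the fast-path / fallback idea described above, assuming a hypothetical shape-blob naming convention and db_type; this is not the actual utils.py / layers.py code:

```python
from caffe2.python import core, workspace

def infer_shape(param_name, db_path):
    shape_blob = param_name + "_shape"  # hypothetical name for the cached shape blob
    if workspace.HasBlob(shape_blob):
        # Fast path: the batched pre-load already put the shape into the workspace.
        return list(workspace.FetchBlob(shape_blob))
    # Fallback: load just this parameter from the db and read its shape directly,
    # as before (used when shape blobs were not pre-loaded).
    load_op = core.CreateOperator(
        "Load", [], [param_name],
        db=db_path, db_type="minidb", absolute_path=True)
    workspace.RunOperatorOnce(load_op)
    return list(workspace.FetchBlob(param_name).shape)
```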
Reviewed By: xianjiec
Differential Revision: D6397607
fbshipit-source-id: 54f6f61d6d8b70c82b74c2d72ac56cd010a710da
Summary: Support regression with output transform in MTML for feed.
Differential Revision: D6403523
fbshipit-source-id: faa0aab1227a27286b617e8e25adfbab3a349d2c
Summary:
So that users can use the 'WeightedSum' pooling method when there is a mix of id list and id score list features.
- It's still intuitive to have "WeightedSum" for id lists, and we do not need to introduce a new "UnWeightedSum" etc.
Reviewed By: chocjy
Differential Revision: D6369270
fbshipit-source-id: 722fa08d1a7986bc6ecf4c7cb02bbae0825bcab4
Summary:
Ability to use the average length of a sparse feature to initialize weights. Based on experiments, this allows a model to converge faster.
More results of the experiment -- https://fb.quip.com/VfraAXNFWhSg
Reviewed By: xianjiec
Differential Revision: D6092437
fbshipit-source-id: d979be7d755719ff297b999f73cba0671e267853
Summary:
The output shape info is incorrect: e.g. if we have 4 embeddings with dim size 32, the actual shape is (4, 32), but the previous implementation in the concat layer gives us (128, 1). This bug doesn't affect the dot product calculation because the actual shape of the blob is still (4, 32) in concat_split_op.
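A tiny NumPy illustration of the shapes involved (for intuition only, not the layer code):

```python
import numpy as np

# Four embeddings of dim 32 stacked together form a (4, 32) blob, which is what
# concat_split_op actually produces; the old shape inference reported (128, 1).
embeddings = [np.zeros(32, dtype=np.float32) for _ in range(4)]
print(np.stack(embeddings).shape)  # (4, 32)
```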
Differential Revision: D6264793
fbshipit-source-id: 82995e83a8c859cbd15617ff7850a35b30b453b6
Summary: Sigmoid + CrossEntropy has a numerical stability issue. The gradient of sigmoid is `dx = dy * y * (1-y)`. When `label=0` and `x` is large, `1-y` can be rounded to (near) 0 and we lose `dx`. Switching to `SigmoidCrossEntropyWithLogits` solves the issue because the gradient does not depend on `y`.
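A small NumPy demonstration of the rounding problem described above (illustration only, not project code):

```python
import numpy as np

x = np.float32(20.0)              # large logit, label = 0
one = np.float32(1.0)
y = one / (one + np.exp(-x))      # sigmoid(x) rounds to exactly 1.0 in float32
print(y * (one - y))              # 0.0 -> gradient dx = dy * y * (1 - y) vanishes
# The logits formulation backpropagates (sigmoid(x) - label) directly, which is
# ~1.0 here instead of vanishing.
print(y - np.float32(0.0))        # 1.0
```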
Reviewed By: chocjy
Differential Revision: D6086950
fbshipit-source-id: f990ae726802aa5c56fa62cf5e23f2e61ee047fa
Summary:
In order to reproduce the StarSpace model using the architecture of the Two Tower model, we need to implement the ranking loss used in StarSpace as well as in the Filament model. In both StarSpace and Filament, all negative samples come from random negative sampling, so the number of negative samples per positive record is fixed (say 64). To calculate the total loss, for each positive record, the hinge distance between the positive score and the negative scores (the 64 scores in the example) is calculated. This diff implements this loss in the Dper framework.
The main idea is to add an option so that negative_sampling.py can output random negative samples as an independent field rather than merging them with the original input_record. In this way, we can calculate the positive score and negative scores separately, which are eventually used when calculating the ranking loss.
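A minimal NumPy sketch of the ranking (hinge) loss described above, assuming a fixed number of random negatives per positive and a hypothetical margin parameter; this is an illustration, not the actual Dper layer:

```python
import numpy as np

def ranking_hinge_loss(pos_scores, neg_scores, margin=1.0):
    # pos_scores: (batch,)                 -- score of each positive record
    # neg_scores: (batch, num_negatives)   -- e.g. 64 random negatives per positive
    per_pair = np.maximum(0.0, margin - pos_scores[:, None] + neg_scores)
    return per_pair.sum(axis=1).mean()
```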
(Note: this ignores all push blocking failures!)
Reviewed By: kittipatv
Differential Revision: D5854486
fbshipit-source-id: f8a5b77be744a6cc8a2b86433282b3b5c7e1ab4a
Summary:
A parameter can be initialized multiple times in init_net if parameter sharing is enabled. With the original implementation, only the first parameter init is replaced by pre-trained parameters; the subsequent inits are unchanged and overwrite the pre-trained initialization.
This diff fixes the issue and also supports model init for the ads-intent project.
Reviewed By: dragonxlwang
Differential Revision: D5991291
fbshipit-source-id: 36173f6239c56bd0d604a77bd94e36072f32faa7
Summary: It failed in the case where `prod_prediction`, which is double rather than float, is used as the teacher label.
Reviewed By: kittipatv
Differential Revision: D6018163
fbshipit-source-id: cd93fd46996e07c7f762eedbeb67331a4665d4c4
Summary: The layer should also apply to evaluation, as it's needed for the feature importance run.
Reviewed By: xianjiec
Differential Revision: D6016125
fbshipit-source-id: e1db1a2eb3d45515e3cdc71b4badaaf738a4afd8
Summary:
This is the first step on the DPER side to use the net transformation step (`parallelize_net`).
So far, it tags the sparse parameters (in init_net and train_net) once the distributed trainer nets are built.
The next step is to merge the part that creates distributed trainer nets (`create_distributed_trainer_nets`) into the part that creates single-trainer, multi-reader nets (`create_distributed_reader_nets`). This step should get rid of parts of `MixtureStrategyModelBuilder`.
Reviewed By: azzolini
Differential Revision: D5902733
fbshipit-source-id: 85fbddbb6c2704badd82b237f1dd2c7c5790e43a
Summary: This diff refactors the parameter initialization logic, moving it from model manipulation to layers.
Reviewed By: azzolini
Differential Revision: D5920225
fbshipit-source-id: 50d230e406bc9ce0b00bdd164802c504cf32ea46
Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data.
Reviewed By: chocjy
Differential Revision: D5858335
fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e
Summary: When num_elements is less than num_samples, a workflow should fail during net construction time. Currently, it fails at run time.
Reviewed By: kittipatv
Differential Revision: D5858085
fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891
Summary: When parameter sharing is used, the model may not own the parameters. Emptying out initializer ensures that the shared model doesn't overwrite initialization.
Reviewed By: chocjy
Differential Revision: D5870362
fbshipit-source-id: f8587b84c3a13f331a3251973e8206563939606a
Summary: The dot_product layer was added before the functional layer existed. Now that we have the functional layer, the dot_product layer is no longer needed; this diff removes it.
Reviewed By: kittipatv
Differential Revision: D5783303
fbshipit-source-id: 5d13f729918148ee57836fb47c48e6f24773654b
Summary: Extend pairwise dot product to support different numbers of embeddings on the x & y dimensions.
Differential Revision: D5663553
fbshipit-source-id: 1743a2c101cb8c0fc1f0f3d89c19530802400ec6
Summary: In case the whole function should be wrapped in a certain context, this makes it less ugly.
Reviewed By: xianjiec
Differential Revision: D5665253
fbshipit-source-id: ecdc6b1a08e91bae6a4352341f97ee37f3aa677a
Summary:
Before this fix, a functional layer name could appear several times in a blob name, causing confusion. This diff fixes the issue.
Reviewed By: kittipatv
Differential Revision: D5641354
fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2