Commit Graph

160 Commits

Xue Feng
0e9b0cf779 add error msg in fc input_record
Summary: as titled

Reviewed By: xianjiec

Differential Revision: D6787879

fbshipit-source-id: 4bbdd11455480b25fa18121fa4527a9f0a03addc
2018-01-23 14:48:15 -08:00
Xianjie Chen
76a141f016 add error msg in get_key
Summary: as title

Differential Revision: D6782896

fbshipit-source-id: bd29f6d085e56f51deb4bf6ad81771787fd85a5a
2018-01-23 11:04:05 -08:00
Dániel Simig
2dd79eb53a Visualize distribution of activation functions
Summary:
This is a first attempt at completing bootcamp task T24449916. This diff contains three major changes:
1) Change LayerModelHelper to allow exposing the output and parameters of any layer to metrics
2) Add a runner that allows metrics to draw arbitrary plots to a matplotlib axes object
3) Implement a metric that aggregates the distribution of values in a blob over training, and try it out in a notebook
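
A generic Python sketch of the idea in (2) and (3); the class and method names are made up for illustration and are not the classes added in this diff. A metric object accumulates a blob's values over training iterations and draws their distribution onto a matplotlib Axes supplied by the runner.

```python
import numpy as np
import matplotlib.pyplot as plt

class BlobDistributionMetric:
    """Hypothetical metric: collect a blob's values and plot their histogram."""

    def __init__(self):
        self.values = []

    def update(self, blob_values):
        # called once per training iteration with the blob's current contents
        self.values.append(np.asarray(blob_values).ravel())

    def draw(self, ax):
        # the runner passes in an arbitrary matplotlib Axes to draw on
        ax.hist(np.concatenate(self.values), bins=50)
        ax.set_xlabel("activation value")
        ax.set_ylabel("count")

metric = BlobDistributionMetric()
for _ in range(10):                       # stand-in for training iterations
    metric.update(np.random.randn(128))
fig, ax = plt.subplots()
metric.draw(ax)
```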

Reviewed By: kennyhorror

Differential Revision: D6671273

fbshipit-source-id: b8961837395e89c957edbf5c7c862bdb845ccf4b
2018-01-23 10:36:40 -08:00
Lin Yang
8e0177255e Test for PositionWeighted
Summary: add a test for SparseLookup with PositionWeighted.

Reviewed By: kennyhorror

Differential Revision: D6771612

fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
2018-01-22 19:20:46 -08:00
Yan Shang
41bb662d96 add dense regularization
Reviewed By: xianjiec

Differential Revision: D5617571

fbshipit-source-id: 875d7c8753bdb3b6847d5e3f47ad8568cdf172f8
2018-01-08 13:03:17 -08:00
Tiangao Gou
bc50510016 use numerically stable version of BatchLRLoss
Summary: change all use cases of BatchLRLoss to the numerically stable version. This includes uses of the function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and the class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.

Reviewed By: xianjiec

Differential Revision: D6643074

fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
2018-01-02 13:18:36 -08:00
Xianjie Chen
7a5200b450 print exception in layers
Summary: as desc

Reviewed By: chocjy

Differential Revision: D6577301

fbshipit-source-id: 3c2d08a05f6fd1d6771019347e6dec4dd711a653
2017-12-15 12:12:28 -08:00
Jiyan Yang
d38a9bb4ec Fix dot processor with only one sparse feature and no dense feature
Summary:
As titled.

This will fail with the message: File "/mnt/xarfuse/uid-30088/f8742a88-seed-a26ddfbc-49aa-4c5f-9e08-91909f4775da-ns-4026532692/caffe2/python/layers/concat.py", line 52, in __init__
    "Concat expects that limited dimensions of the input tensor"

This is because the output scalar of the pairwise_dot_product layer won't contain shape information if output_dim is 1.
https://fburl.com/1m9r3ayp

This diff fixes it.
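
A rough illustration of the shape issue using the standard caffe2.python.schema API (this is not the diff's fix): a Scalar record declared without a shape carries no dimension information for a downstream Concat layer to use.

```python
import numpy as np
from caffe2.python import schema

with_shape = schema.Scalar((np.float32, (8,)))  # output_dim > 1: shape info (8,) is recorded
no_shape = schema.Scalar(np.float32)            # output_dim == 1 case: dtype only, shape ()

print(with_shape.field_types()[0].shape)        # (8,)
print(no_shape.field_types()[0].shape)          # () -- nothing for Concat to check
```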

Reviewed By: xianjiec

Differential Revision: D6565930

fbshipit-source-id: 181181232065ef3fdfc825aa25d2714affbe6b8d
2017-12-14 13:05:17 -08:00
Qichao Que
234591a809 Support regression with output transform in MTML for feed
Summary: changes to metrics and mtml.

Differential Revision: D6457175

fbshipit-source-id: 1a162c519191f290e8e919cc7fe978f502ec2840
2017-12-11 17:20:20 -08:00
Yan Shang
cf07820849 Enable SparseLengthsMean
Differential Revision: D6445834

fbshipit-source-id: 5cbc95e6975b2447dc82dbe293d0ddd9adf6b5a3
2017-11-30 16:04:38 -08:00
Xue Feng
0c588a500b Replace sigmoid + xent loss with SigmoidCrossEntropyWithLogits for better numerical stability
Summary: Replaced sigmoid + xent loss with SigmoidCrossEntropyWithLogits. The op computes the cross-entropy loss of the sigmoid of its inputs; it's conceptually identical to a sigmoid layer followed by a cross-entropy loss layer, but provides a more numerically stable gradient.
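
A NumPy sketch of why the fused logits form is more stable (this is not the Caffe2 kernel; it uses the standard max(x, 0) - x*t + log(1 + exp(-|x|)) rewrite of the same loss):

```python
import numpy as np

def naive_sigmoid_xent(x, t):
    y = 1.0 / (1.0 + np.exp(-x))
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

def logits_sigmoid_xent(x, t):
    # algebraically equal, but never takes the log of a rounded-off sigmoid output
    return np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x)))

x, t = np.float32(40.0), np.float32(0.0)   # large logit, negative label
print(naive_sigmoid_xent(x, t))            # inf: log of exactly 0 in fp32
print(logits_sigmoid_xent(x, t))           # ~40.0, finite
```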

Reviewed By: xianjiec

Differential Revision: D6305455

fbshipit-source-id: 444c9f651fbdf13c3c52be5142769f8f98ed8770
2017-11-30 14:04:36 -08:00
Ellie Wen
fc3f88d8a4 higher order interaction of embeddings
Summary:
Get a higher-order interaction of embeddings, similar to a cross net but applied at the embedding level.
Formula:
  e_(l+1,i) = element_wise_mul[e_(0,i), \sum_i(e_(l,i) * w_(l,i))] + e_(l,i) + b
where l denotes the l-th layer of this higher-order net and i denotes the i-th embedding in the list.

Finally, concatenate all the embeddings in the last layer, or concatenate the sum of each embedding, and attach the result to the output blob of the dot processor.
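
A minimal NumPy sketch of one step of the formula above, under one reading of it (the inner sum taken over all embeddings in the list, and w taken as one scalar per embedding); this is an illustration, not the DPER layer:

```python
import numpy as np

def higher_order_step(e0, el, w, b):
    """e0, el: lists of embeddings, each of shape (dim,); w: one weight per embedding."""
    s = sum(w_i * e_i for w_i, e_i in zip(w, el))        # shared weighted sum over the list
    # e_(l+1, i) = element_wise_mul[e_(0, i), s] + e_(l, i) + b
    return [e0_i * s + el_i + b for e0_i, el_i in zip(e0, el)]

rng = np.random.default_rng(0)
dim, num_emb = 4, 3
e0 = [rng.standard_normal(dim) for _ in range(num_emb)]
w = rng.standard_normal(num_emb)
e1 = higher_order_step(e0, e0, w, b=0.1)   # first layer: e_l is e_0
out = np.concatenate(e1)                   # concat the embeddings of the last layer
print(out.shape)                           # (12,)
```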

Differential Revision: D6244001

fbshipit-source-id: 96292914158347b79fc1299694d65605999b55e8
2017-11-30 08:51:09 -08:00
Bingjun Sun
7e9724142a batched layer parameter loading for model initialization from an existing model
Summary:
Problem:
When we initialize a model from an existing model, we currently load the information for each layer parameter independently (in utils.py), including shape information, so we have to load the whole model from db_path every time we initialize one parameter (in layers.py). For example, in f31078253 the model needs to be initialized twice (not sure why); each time there are 152 layer parameters to load, and loading a model takes 10-50 min depending on resource status.
Restrictions:
1. _infer_shape_from_initializer in layers.py is called from multiple places besides the ModelInitDefinition.INIT_MODEL_PATH branch of load_parameters_from_model_init_options in utils.py, which is the root cause of f31078253. So we still need to support the Load operator in _infer_shape_from_initializer, and we need to batch the shape-blob loading outside of LayerParameter.
2. In the ModelInitDefinition.PARAMS branch of load_parameters_from_model_init_options in utils.py, db_path can differ between parameters, so it is hard to batch them.
Solution:
Batch the shape-blob loading in the ModelInitDefinition.INIT_MODEL_PATH branch of load_parameters_from_model_init_options in utils.py. We load the model once and generate the shape blobs of the layer parameters in the workspace, so that _infer_shape_from_initializer in layers.py can directly return the shape blobs cached in the workspace without reloading the model. At the same time, _infer_shape_from_initializer still supports a separate Load operator when shape blobs are not pre-loaded into the workspace (this path can be used by other ways of initializing a model).
Right now we use 500 layer parameters per batch and it works fine, so for 152 layer parameters one model load is enough.
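
A minimal sketch of the batching idea using the standard Caffe2 Load operator; the helper name and argument choices below are illustrative assumptions, not the DPER code:

```python
from caffe2.python import core, workspace

def preload_param_blobs(db_path, blob_names, db_type="minidb", batch_size=500):
    """Load parameter blobs from an existing model db in batches so that later
    shape inference can read cached blobs instead of re-opening the db per parameter."""
    for start in range(0, len(blob_names), batch_size):
        batch = blob_names[start:start + batch_size]
        load_op = core.CreateOperator(
            "Load", [], batch,
            db=db_path, db_type=db_type,
            allow_incomplete=True,   # tolerate blobs missing from the db
        )
        workspace.RunOperatorOnce(load_op)
```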

Reviewed By: xianjiec

Differential Revision: D6397607

fbshipit-source-id: 54f6f61d6d8b70c82b74c2d72ac56cd010a710da
2017-11-29 22:17:51 -08:00
Andrey Malevich
b766335753 Revert D6403523: [Part 2] Support regression with output transform in MTML for feed.
Summary:
This reverts commit faa0aab1227a27286b617e8e25adfbab3a349d2c

bypass-lint

Differential Revision: D6403523

fbshipit-source-id: eb43f348b09f2abcc52e101f43b0b9cc42a48ffb
2017-11-29 21:47:01 -08:00
Qichao Que
c9e181f50f Support regression with output transform in MTML for feed.
Summary: Support regression with output transform in MTML for feed.

Differential Revision: D6403523

fbshipit-source-id: faa0aab1227a27286b617e8e25adfbab3a349d2c
2017-11-29 15:47:19 -08:00
Xianjie Chen
5250d7fd11 simplify logic for weighted pooling using id score list
Summary:
so that users can use the 'WeightedSum' pooling method when there is a mix of id list and id score list features.

- It's still intuitive to have "WeightedSum" for id lists, and we do not need to introduce a new "UnWeightedSum" etc.

Reviewed By: chocjy

Differential Revision: D6369270

fbshipit-source-id: 722fa08d1a7986bc6ecf4c7cb02bbae0825bcab4
2017-11-22 17:32:04 -08:00
Yan Shang
dcaaf51100 Support /sqrt(n) pooling
Differential Revision: D6378584

fbshipit-source-id: 3c6606c4e71afbd31dbb97ceeac38dfbe7b40090
2017-11-21 09:04:02 -08:00
Xue Feng
f0306c12ff add Mean Pooling distributed support
Reviewed By: dragonxlwang

Differential Revision: D6114111

fbshipit-source-id: bc0a79a4455e490bdfaa1d5d6d77badfacd2375c
2017-11-14 17:30:31 -08:00
Xianjie Chen
ae5673741b add option to do simple modulo
Summary: as desc.

Differential Revision: D6240061

fbshipit-source-id: 814a541a3e7f09ebbe2df63fd9202312e9f4c8d4
2017-11-10 13:49:07 -08:00
Anshul Verma
4761b32f96 make use of the average length of sparse features for init
Summary:
Ability to use the average length of a sparse feature to initialize weights. Based on experiments, this allows the model to converge faster.

More results of the experiment -- https://fb.quip.com/VfraAXNFWhSg
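
A sketch of the general idea as I read it (the exact scaling rule in the diff may differ and is not stated here): if a pooled sparse feature sums roughly avg_length embedding rows, shrinking the per-row init scale by sqrt(avg_length) keeps the pooled output magnitude roughly independent of feature length.

```python
import numpy as np

def init_sparse_embedding(num_rows, dim, avg_length, rng=None):
    rng = rng or np.random.default_rng(0)
    base = 1.0 / np.sqrt(dim)                     # usual fan-in style scale
    scale = base / np.sqrt(max(avg_length, 1.0))  # account for ~avg_length rows being pooled
    return rng.uniform(-scale, scale, size=(num_rows, dim)).astype(np.float32)

w = init_sparse_embedding(num_rows=1000, dim=32, avg_length=8.0)
print(w.std())
```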

Reviewed By: xianjiec

Differential Revision: D6092437

fbshipit-source-id: d979be7d755719ff297b999f73cba0671e267853
2017-11-08 07:31:47 -08:00
Ellie Wen
84b76a0712 fix shape info in concat layer
Summary:
The output shape info is incorrect: e.g., if we have 4 embeddings with dim size 32, the actual shape is (4, 32), but the previous implementation in the concat layer reported (128, 1). This bug doesn't affect the dot product calculation because the actual shape of the blob is still (4, 32) in concat_split_op.
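
A small NumPy illustration of the two shapes mentioned above (not the layer code): 4 embeddings of dim 32 stacked into one blob should be reported as (4, 32), not as 128 flat values.

```python
import numpy as np

embs = [np.zeros(32, dtype=np.float32) for _ in range(4)]
stacked = np.stack(embs)      # (4, 32): the blob's actual shape after the concat
flat = np.concatenate(embs)   # 128 values, which is what a (128, 1) schema would suggest
print(stacked.shape, flat.shape)
```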

Differential Revision: D6264793

fbshipit-source-id: 82995e83a8c859cbd15617ff7850a35b30b453b6
2017-11-07 21:08:39 -08:00
Xianjie Chen
1b5c843a9c cleaner logic on sparse feature hashing
Reviewed By: kennyhorror

Differential Revision: D6195525

fbshipit-source-id: f687ac3d4914c3dbb0d35679e3a3d3a64a71ac53
2017-11-03 07:27:45 -07:00
Jiyan Yang
ee3baa2ed4 Add shape checks and print more info in parameter sharing
Summary: As titled.

Reviewed By: kittipatv

Differential Revision: D6145747

fbshipit-source-id: 39a212bb6bebbbf3164cade2f95db22ddb2d2c87
2017-10-27 01:22:06 -07:00
Kittipat Virochsiri
879e39ea5c Distill loss with SigmoidCrossEntropyWithLogits
Summary: Sigmoid + CrossEntropy has a numerical stability issue. The gradient of sigmoid is `dx = dy * y * (1-y)`. When `label=0` and `x` is large, `1-y` can be rounded to (near) 0 and we lose `dx`. Switching to `SigmoidCrossEntropyWithLogits` solves the issue because the gradient does not depend on `y`.
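
A small numeric illustration of the saturation described above, in plain NumPy (not from the diff):

```python
import numpy as np

x = np.float32(20.0)            # large logit
label = np.float32(0.0)
one = np.float32(1.0)
y = one / (one + np.exp(-x))    # sigmoid output; rounds to exactly 1.0 in fp32

# Backprop through an explicit sigmoid uses dx = dy * y * (1 - y);
# with y rounded to 1.0, (1 - y) is 0 and the gradient vanishes.
print(one - y)                  # 0.0

# The logits formulation computes the gradient as sigmoid(x) - label directly,
# so it stays well-defined even for large |x|.
print(y - label)                # 1.0
```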

Reviewed By: chocjy

Differential Revision: D6086950

fbshipit-source-id: f990ae726802aa5c56fa62cf5e23f2e61ee047fa
2017-10-26 15:18:34 -07:00
Yan Shang
39359afc84 Add rank loss for retrieval models with random negative sample
Summary:
In order to reproduce StarSpace model using the architecture of Two Tower model, we need to implement the ranking loss that is used in StarSpace as well as Filament model. In both StarSpace and Filament model, all negative samples come from random negative sampling, thus the number of negative sampler per positive record is fixed (say 64). To calculate the total loss, for each positive record, the hinge distance between the positive score and negative scores (the 64 scores in the example) are calculated. This diff implement this loss in Dper framework.

The main idea is to add an option so that negative_sampling.py can output random negative samples as an independent field rather than merged with the original input_record. In this way, we can calculate the positive score and negative score separately, which will eventually been used when calculating the ranking loss.
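
A minimal NumPy sketch of the ranking loss as described above (one positive score per record and a fixed set of random-negative scores, with a hinge on the margin); the function name and margin value are illustrative, not from the diff:

```python
import numpy as np

def hinge_rank_loss(pos_scores, neg_scores, margin=1.0):
    """pos_scores: (batch,); neg_scores: (batch, num_negatives)."""
    # hinge distance between each positive score and its random-negative scores
    per_pair = np.maximum(0.0, margin - pos_scores[:, None] + neg_scores)
    return per_pair.sum(axis=1).mean()

pos = np.array([2.0, 0.3])
neg = np.array([[0.5, -1.0, 0.1], [0.6, 0.2, -0.4]])
print(hinge_rank_loss(pos, neg))
```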

(Note: this ignores all push blocking failures!)

Reviewed By: kittipatv

Differential Revision: D5854486

fbshipit-source-id: f8a5b77be744a6cc8a2b86433282b3b5c7e1ab4a
2017-10-25 16:19:41 -07:00
Huazhong Ning
f7ad13694c support model init
Summary:
A parameter can be initialized multiple times in init_net if parameter sharing is enabled. With the original implementation, only the first parameter init is replaced by pre-trained parameters; the later inits remain unchanged and overwrite the initialization from the pre-trained parameters.
This diff fixes this issue and also supports model init for the ads-intent project.

Reviewed By: dragonxlwang

Differential Revision: D5991291

fbshipit-source-id: 36173f6239c56bd0d604a77bd94e36072f32faa7
2017-10-19 15:56:37 -07:00
Bangsheng Tang
7b30436201 remove Alias in SparseFeatureHash
Summary: remove Alias in SparseFeatureHash

Reviewed By: kennyhorror

Differential Revision: D6094663

fbshipit-source-id: f313aeb17bf6cfdacae62b2c1ad6b4175d0882dd
2017-10-19 13:24:20 -07:00
Dmytro Dzhulgakov
2972a6ca02 Revert D6026557: [caffe2][PR] Fix "No handlers could be found for logger"
Summary:
This reverts commit 95c634872ac02be721257169e38c8fead04cd66b

bypass-lint

Differential Revision: D6026557

fbshipit-source-id: 663c28583ce3b01070ff5449115ed7e222f71776
2017-10-12 20:21:52 -07:00
Artem Volkhin
5b10ad255b Use EMBEDDING feature type instead of FLOAT_TENSOR
Summary: create a special type for embeddings

Differential Revision: D5997808

fbshipit-source-id: 9a5ad8ecc019d10536705d3b25f2436ca8a56454
2017-10-11 13:50:03 -07:00
Luke Yeager
75bece6ede Fix "No handlers could be found for logger"
Summary: Closes https://github.com/caffe2/caffe2/pull/1316

Differential Revision: D6026557

Pulled By: Yangqing

fbshipit-source-id: 95c634872ac02be721257169e38c8fead04cd66b
2017-10-10 22:32:13 -07:00
Xianjie Chen
9455eda57b cast distill loss teacher label to float
Summary: It failed when `prod_prediction` is used as the teacher label, since it is a double instead of a float.

Reviewed By: kittipatv

Differential Revision: D6018163

fbshipit-source-id: cd93fd46996e07c7f762eedbeb67331a4665d4c4
2017-10-10 01:16:07 -07:00
Kittipat Virochsiri
d5f60b240d Fix distill loss
Summary: The layer should also apply during evaluation, as it's needed for the feature importance run.

Reviewed By: xianjiec

Differential Revision: D6016125

fbshipit-source-id: e1db1a2eb3d45515e3cdc71b4badaaf738a4afd8
2017-10-09 18:17:31 -07:00
Artem Volkhin
fb8a7679cc preprocs for embeddings
Summary: embeddings

Differential Revision: D5888420

fbshipit-source-id: b293df6444cba49e2feab6ccf8b8346019e5b421
2017-10-04 22:18:21 -07:00
Hassan Eslami
8e309c014c Tagging sparse parameters
Summary:
This is the first step on DPER side to use net transformation step (`parallelize_net`).

So far, it tags the sparse parameters (in init_net and train_net) once distributed trainer nets are built.

Next step is to merge the part that creates distributed trainer nets (`create_distributed_trainer_nets`) into the part that creates single-trainer, multi-reader nets (`create_distributed_reader_nets`). This step should get rid of parts of `MixtureStrategyModelBuilder`.

Reviewed By: azzolini

Differential Revision: D5902733

fbshipit-source-id: 85fbddbb6c2704badd82b237f1dd2c7c5790e43a
2017-10-04 18:46:48 -07:00
Hassan Eslami
7fc7756487 Refactor param initialization from model manipulation to layers logic
Summary: This diff refactors the parameter initialization logic from model manipulation to layers

Reviewed By: azzolini

Differential Revision: D5920225

fbshipit-source-id: 50d230e406bc9ce0b00bdd164802c504cf32ea46
2017-10-02 22:08:40 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Xiaolong Wang
642dea487d update inline comment
Summary: as desc

Reviewed By: kennyhorror

Differential Revision: D5930526

fbshipit-source-id: 510388fd66b487410ff748a9e6f546a8ce27bc1d
2017-09-28 10:17:13 -07:00
Huazhong Ning
808c9e3e70 fix a small typo error in sparse_lookup
Summary: as title

Reviewed By: kittipatv

Differential Revision: D5908455

fbshipit-source-id: e7c66e84a27273156d66dfd043e9cfd9b0ab9a98
2017-09-25 21:46:56 -07:00
Kittipat Virochsiri
5aac6a2e06 Make LastNWindowCollector thread-safe
Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data.

Reviewed By: chocjy

Differential Revision: D5858335

fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e
2017-09-22 09:48:30 -07:00
Anshul Verma
a340d141de Check num_elements > num_samples in UniformSampling
Summary: When num_elements is less than num_samples, a workflow should fail during net construction time. Currently, it fails at run time.

Reviewed By: kittipatv

Differential Revision: D5858085

fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891
2017-09-21 16:37:20 -07:00
Kittipat Virochsiri
1b059f4c98 Add option to ignore parameter initialization
Summary: When parameter sharing is used, the model may not own the parameters. Emptying out the initializer ensures that the shared model doesn't overwrite the initialization.

Reviewed By: chocjy

Differential Revision: D5870362

fbshipit-source-id: f8587b84c3a13f331a3251973e8206563939606a
2017-09-20 12:03:22 -07:00
Yan Shang
6a883d1bc0 Remove dot_product layer
Summary: The dot_product layer was added before the functional layer existed. Now that we have the functional layer, the dot_product layer is no longer needed; this diff removes it.

Reviewed By: kittipatv

Differential Revision: D5783303

fbshipit-source-id: 5d13f729918148ee57836fb47c48e6f24773654b
2017-09-07 18:48:30 -07:00
Xianjie Chen
ec713d437d make sure the output of sparse lookup layer is float
Summary: Currently, if reducer=None, the output is fp16.

Differential Revision: D5773560

fbshipit-source-id: 24d7e5fae366d70352582e9a1ee14c7613753b7a
2017-09-07 17:47:39 -07:00
Dmitrii Podoprikhin
c7684e3b27 Rowwise quantization
Reviewed By: kennyhorror

Differential Revision: D5753626

fbshipit-source-id: 680c627a81658bcd653feab68e7040db0cb7a185
2017-09-06 10:19:38 -07:00
Long Jin
3faeb621d3 support id_score_list for Feed
Reviewed By: xianjiec

Differential Revision: D5624894

fbshipit-source-id: 1b2caba9ffcce68f346020485cb1f4edb01ca5e7
2017-08-24 00:32:05 -07:00
Jeonghee Yi
98da4e3a04 pairwise dot product with dot_groups support
Summary: extending pairwise dot-product only between dot_groups

Differential Revision: D5527060

fbshipit-source-id: be5d3178c332e122853a2f9d8da12a880608b0ab
2017-08-23 15:23:36 -07:00
Jeonghee Yi
d675c101e9 extend pairwise dot product for non-equal x & y dimension size
Summary: extend pairwise dot product for different number of embeddings on x & y dimensions

Differential Revision: D5663553

fbshipit-source-id: 1743a2c101cb8c0fc1f0f3d89c19530802400ec6
2017-08-23 02:08:20 -07:00
Badri Narayan Bhaskar
9507cae9e0 Create MergeIdListsLayer
Summary: We create a layer for MergeIdListsOp

Differential Revision: D5531348

fbshipit-source-id: a2e227e1abda05cefa893fd41a2c3ca997851e25
2017-08-22 17:00:55 -07:00
Kittipat Virochsiri
0e5fcc7ca2 Make Tags a decorator as well
Summary: In case the whole function should be wrapped in a certain context, this makes it less ugly.
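
A generic Python sketch of the pattern this enables (a context manager that also works as a decorator); the class and tag names are made up and this is not the Caffe2 Tags implementation:

```python
import functools

class TagContext:
    """Hypothetical tag scope usable as `with TagContext(...)` or as a decorator."""
    active = []

    def __init__(self, *tags):
        self.tags = tags

    def __enter__(self):
        TagContext.active.extend(self.tags)
        return self

    def __exit__(self, exc_type, exc, tb):
        for _ in self.tags:
            TagContext.active.pop()

    def __call__(self, func):
        # decorator form: wrap the whole function body in this context
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapper

@TagContext("exclude_from_prediction")
def build_loss_layers():
    assert "exclude_from_prediction" in TagContext.active

build_loss_layers()
```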

Reviewed By: xianjiec

Differential Revision: D5665253

fbshipit-source-id: ecdc6b1a08e91bae6a4352341f97ee37f3aa677a
2017-08-22 11:01:14 -07:00
Yan Shang
57c93435e3 Dedup name in functional layer
Summary:
Before this fix, a functional layer name could appear several times in a blob and cause confusion. This diff fixes the issue.

Reviewed By: kittipatv

Differential Revision: D5641354

fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2
2017-08-17 17:50:34 -07:00