Commit Graph

55 Commits

Author SHA1 Message Date
Xiaolong Wang
4bb73b8361 [GanH] Weighting Layers: Adaptive/Constant/Homotopy
use case: weighting multiple losses (real values) into a single composite loss for optimization
2018-03-20 13:34:22 -07:00
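A minimal Python sketch of the composite-loss idea in the commit above; the constant weighting and the simple homotopy schedule shown here are illustrative assumptions, not the actual GanH layers:

```python
# Illustrative sketch only: combine several real-valued losses into one
# composite loss, either with fixed (constant) weights or a homotopy-style
# schedule that shifts weight from one loss to another over training.
def constant_weighted_loss(losses, weights):
    # losses, weights: sequences of floats of equal length
    return sum(w * l for w, l in zip(weights, losses))

def homotopy_weighted_loss(loss_a, loss_b, iteration, max_iter):
    # Linearly move the weight from loss_a to loss_b as training progresses.
    alpha = min(iteration / float(max_iter), 1.0)
    return (1.0 - alpha) * loss_a + alpha * loss_b
```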
sf-wind
602a09dde7 Update caffe2 from facebook 4f527ef46abf (#2234)
* [GanH]: two_task_discriminator

as titled

and adds label smoothing

* [Dper2] Simplified UI options needed for blob magnitude visualization

* [GanH]: fix tags

as titled

* Added type and shape inference for GatherRange operator

This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.

* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python

We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python using caffe2::StoreHandlerTimeoutException.
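A self-contained sketch of this catch-and-recover pattern; the exception class below is a stand-in for the pybind-registered Caffe2 exception, not its actual name or import path:

```python
# Sketch of the pattern: the C++ exception type is mapped via pybind to a
# Python exception class (a stand-in here), and callers catch it to recover.
class StoreHandlerTimeoutError(RuntimeError):
    """Stand-in for the pybind-registered caffe2 store-timeout exception."""

def run_with_timeout_recovery(run_net, net_name, max_retries=3):
    # run_net is e.g. workspace.RunNet; a store timeout from a slow peer is
    # retried instead of killing the whole training job.
    for _ in range(max_retries):
        try:
            run_net(net_name)
            return True
        except StoreHandlerTimeoutError:
            continue
    return False
```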

* Bind Gloo IoException to IoError in Python

Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.

* [GanH]: add label smoothing to softmax with loss

as titled

* [C2] Enable LARS in Adagrad and hook it to DPER

* [DPER] Don't pass LayerModelHelper in create_trainer_nodes

Since we're planning to get rid of it eventually and I want access to a NetDef-only interface ASAP, I'm looking to remove all references to LMH where we don't really need them.

* fix bugs in LambdaRankNdcgOp

The loss and gradient in LambdaRankNdcgOp are incorrect: the loss should be the negative log of the probabilities, not the log.
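A small numpy illustration of the sign fix (not the operator code); the |ΔNDCG| weighting shown is the usual LambdaRank form and is assumed here:

```python
import numpy as np

# With p the predicted probability that a pair is correctly ordered, the loss
# contribution should be -log(p) (cross-entropy style), so p -> 1 gives loss
# -> 0; using log(p) instead would reward badly ordered pairs.
def pairwise_lambdarank_term(p, ndcg_delta):
    return -np.log(p) * np.abs(ndcg_delta)

print(pairwise_lambdarank_term(0.99, 0.5))  # near 0: confident, correct order
print(pairwise_lambdarank_term(0.01, 0.5))  # large: confident, wrong order
```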

* Restrict thread pool on iOS to only big cores

Historically, iPhones exposed only one type of core, and the Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X expose 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, and it may be better to restrict execution to only the 2 fast cores, as we do on Android.

* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

* make clang happy and get fewer warnings

make clang happy and get fewer warnings

* [Personalization] Support add_output_schema() in layer_model_helper

Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.

Solution:
For flexibility, we want to add fields to output_schema incrementally.

Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.

Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer

Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
2018-03-12 12:22:59 -07:00
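A minimal Python stand-in for the add_output_schema() plan described in the last item above (not the real LayerModelHelper API; the dict stands in for the schema Struct):

```python
# Sketch: instead of assigning model._output_schema once, accumulate fields.
class ModelSketch(object):
    def __init__(self):
        self._output_schema = {}

    def add_output_schema(self, name, field):
        # Append a new field instead of overwriting the whole schema.
        assert name not in self._output_schema, "field already registered"
        self._output_schema[name] = field

model = ModelSketch()
model.add_output_schema("prediction", "prediction_blob")
model.add_output_schema("embedding", "embedding_blob")  # added incrementally
```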
Orion Reblitz-Richardson
c55a642d83 [c2] update SparseFeatureHash layer
The diff makes the following changes to this layer: copy the lengths blob; add nameScope for the output schema; add layer tests
2018-02-26 10:26:25 -08:00
Lin Yang
cec7003190 only enable FloatToHalf test for GPU
Reviewed By: bddppq

Differential Revision: D6945312

fbshipit-source-id: 9550a9607c0daec6783ce63d3c9f082ff27b0303
2018-02-08 17:48:47 -08:00
Lin Yang
27b9b7b15a Make TypeInference work for HalfToFloat & FloatToHalf.
Summary: add missing type mapping.

Reviewed By: kennyhorror

Differential Revision: D6940574

fbshipit-source-id: b70cea4ce2e519cb3e72d0482a38f50dbb968b4a
2018-02-08 15:33:43 -08:00
Lin Yang
95626737d0 enforce global_constant name should be a string
Reviewed By: kennyhorror

Differential Revision: D6880114

fbshipit-source-id: 2c9bd27b01cedb469f19843163b04a613fda5904
2018-02-04 01:02:27 -08:00
Lin Yang
252211b001 testPairwiseDotProduct
Summary: as title.

Reviewed By: kennyhorror

Differential Revision: D6793829

fbshipit-source-id: f803e0400635ca37184f1dd5bb711bfe0e4bea21
2018-01-26 11:33:08 -08:00
Lin Yang
8e0177255e Test for PositionWeighted
Summary: add Test for SparseLookup with PositionWeighted.

Reviewed By: kennyhorror

Differential Revision: D6771612

fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
2018-01-22 19:20:46 -08:00
Lin Yang
4ea6e6a556 testSparseLookup
Summary: add basic test for SparseLookup

Reviewed By: kennyhorror

Differential Revision: D6749915

fbshipit-source-id: f97af785e4f89f36788a992843066fd1ec2b75a9
2018-01-19 09:27:20 -08:00
Tiangao Gou
bc50510016 use numerically stable version of BatchLRLoss
Summary: change all use cases of BatchLRLoss to the numerically stable version. This includes the uses of the function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and the class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.

Reviewed By: xianjiec

Differential Revision: D6643074

fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
2018-01-02 13:18:36 -08:00
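A numpy sketch of what "numerically stable" means for the logistic loss in the commit above (the exact form Caffe2 uses is an assumption): compute the loss from logits directly rather than through log(sigmoid), so large-magnitude logits don't overflow.

```python
import numpy as np

def lr_loss_naive(logit, label):
    # label in {0, 1}; 1 - p rounds to 0 for large logits, giving log(0) = -inf
    p = 1.0 / (1.0 + np.exp(-logit))
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def lr_loss_stable(logit, label):
    # Equivalent closed form: max(x, 0) - x*y + log(1 + exp(-|x|))
    return np.maximum(logit, 0) - logit * label + np.log1p(np.exp(-np.abs(logit)))

print(lr_loss_naive(40.0, 0.0))   # inf (1 - p underflows to 0)
print(lr_loss_stable(40.0, 0.0))  # ~40.0, finite
```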
Yan Shang
39359afc84 Add rank loss for retrieval models with random negative sample
Summary:
In order to reproduce the StarSpace model using the architecture of the Two Tower model, we need to implement the ranking loss that is used in StarSpace as well as in the Filament model. In both StarSpace and Filament, all negative samples come from random negative sampling, so the number of negative samples per positive record is fixed (say 64). To calculate the total loss, for each positive record the hinge distance between the positive score and the negative scores (the 64 scores in the example) is calculated. This diff implements this loss in the Dper framework.

The main idea is to add an option so that negative_sampling.py can output random negative samples as an independent field rather than merging them into the original input_record. In this way, we can calculate the positive score and the negative scores separately, which are eventually used when calculating the ranking loss.

(Note: this ignores all push blocking failures!)

Reviewed By: kittipatv

Differential Revision: D5854486

fbshipit-source-id: f8a5b77be744a6cc8a2b86433282b3b5c7e1ab4a
2017-10-25 16:19:41 -07:00
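A numpy sketch of the hinge ranking loss described in the commit above (the margin value and the sum reduction are assumptions): each positive record is scored against its fixed set of random negatives.

```python
import numpy as np

def hinge_rank_loss(pos_score, neg_scores, margin=0.1):
    # pos_score: scalar score of the positive record
    # neg_scores: scores of the random negatives (e.g. 64 of them)
    # Penalize every negative that comes within `margin` of the positive.
    return np.sum(np.maximum(0.0, margin - pos_score + neg_scores))

neg = np.random.randn(64) * 0.1
print(hinge_rank_loss(pos_score=0.8, neg_scores=neg))
```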
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Yangqing Jia
3cd0003bf6 fix layers_test: atol should almost always accompany rtol
Summary: TSIA

Reviewed By: chocjy

Differential Revision: D5898129

fbshipit-source-id: f49e8478f79d9df5b59a26287fff7fc5417aac6e
2017-09-22 23:31:01 -07:00
Kittipat Virochsiri
5aac6a2e06 Make LastNWindowCollector thread-safe
Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data.

Reviewed By: chocjy

Differential Revision: D5858335

fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e
2017-09-22 09:48:30 -07:00
Yangqing Jia
9b2c5501b8 Fix Windows build
Summary:
After this, windows should be all green.
Closes https://github.com/caffe2/caffe2/pull/1228

Reviewed By: bwasti

Differential Revision: D5888328

Pulled By: Yangqing

fbshipit-source-id: 98fd39a4424237f2910df69c8609455d7af3ca34
2017-09-21 20:13:15 -07:00
Anshul Verma
a340d141de Check num_elements > num_samples in UniformSampling
Summary: When num_elements is less than num_samples, a workflow should fail during net construction time. Currently, it fails at run time.

Reviewed By: kittipatv

Differential Revision: D5858085

fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891
2017-09-21 16:37:20 -07:00
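A small stand-in sketch (not the actual layer) of the behavior change in the commit above: the check moves into the layer's constructor, so a bad configuration fails while the net is being built rather than when it runs.

```python
# Sketch: validate at net-construction time rather than at run time.
class UniformSamplingSketch(object):
    def __init__(self, num_elements, num_samples):
        # Fail fast while the model is being constructed.
        assert num_elements > num_samples, (
            "num_elements (%d) must be greater than num_samples (%d)"
            % (num_elements, num_samples))
        self.num_elements = num_elements
        self.num_samples = num_samples

UniformSamplingSketch(num_elements=1000, num_samples=100)   # ok
# UniformSamplingSketch(num_elements=10, num_samples=100)   # raises at build time
```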
Badri Narayan Bhaskar
9507cae9e0 Create MergeIdListsLayer
Summary: We create a layer for MergeIdListsOp

Differential Revision: D5531348

fbshipit-source-id: a2e227e1abda05cefa893fd41a2c3ca997851e25
2017-08-22 17:00:55 -07:00
Yan Shang
57c93435e3 Dedup name in functional layer
Summary:
Before this fix, a functional layer name could appear several times in a blob and cause confusion. This diff fixes the issue.

Reviewed By: kittipatv

Differential Revision: D5641354

fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2
2017-08-17 17:50:34 -07:00
Long Jin
ef64a4f6b2 Add conv layer and layer tests
Reviewed By: xianjiec

Differential Revision: D5569206

fbshipit-source-id: ed836315f3ee4d7983da94f2633a3085fe99194d
2017-08-08 10:57:43 -07:00
Jacqueline Xu
a1bf14d8e6 Building new randomized sparse nn model
Summary: New hybrid randomized sparse NN, which allows layers of the sparse NN model to be randomized, semi-random, or learnable

Reviewed By: chocjy

Differential Revision: D5416489

fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d
2017-08-07 12:48:58 -07:00
Jiyan Yang
4b80ff89e2 Use softsign op for s=0 in arc-cosine feature map
Summary:
The current implementation for s=0 doesn't support the backward pass.
Switching to the pow op instead as a temporary solution.

Reviewed By: jackielxu

Differential Revision: D5551742

fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
2017-08-03 23:35:11 -07:00
Jacqueline Xu
13569c9aa0 Fixing semi-random layer model for multi-layer models
Summary:
Updated the semi-random layer model for multi-layer models using semi-random layers.

Notable changes:
- Inputs and outputs for the semi-random layer are now a Struct with "full" and "random" components
- A flag was added to choose whether to initialize the output schema in Arc Cosine or not (i.e., whether output schema initialization will happen in the Semi-Random layer)

Reviewed By: chocjy

Differential Revision: D5496034

fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04
2017-07-27 15:25:19 -07:00
Jacqueline Xu
9bec54bbf1 Modify arc cosine feature map and semi random layers to initialize parameters as global constants
Summary:
The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, as the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part, these weights won't be saved after training since they don't exist on the trainer.

I fixed the issue here by building an option to add the randomized parameters to the model global constants so that the same parameter values can be accessed. Also, the parameters can be saved when the training is finished.

In this diff, I've:
- Updated randomized parameters to be added as a global constant across distributed runs of Arc Cosine Feature Map and Semi Random Feature layers
- Updated unit tests
- Ran an end-to-end test, enabling multiple readers to test the fixed issue

Reviewed By: chocjy

Differential Revision: D5483372

fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd
2017-07-26 16:37:00 -07:00
Honghao Wei
290acab2c7 implement drelu and unittest
Summary:
In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details.
To sum up: unlike standard ReLU and PReLU, which split the input into two parts with the boundary at zero, DRelu computes another value p to divide the activation into two parts. p is the softmax value of the output of Batch Normalization. The f(x)=x part of ReLU has its counterpart in f(x)=px, and the f(x)=0 part of ReLU has its counterpart in f(x)=a(1-p)x, in which a is a parameter to tune. The DRelu activation is the sum of these two parts: f(x) = a(1-p)x + px.
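A small numpy sketch of this formula (not the Caffe2 layer itself); the sigmoid-style gate standing in for the "softmax of the BN output" and the default a are assumptions:

```python
import numpy as np

def drelu(x, a=0.1, eps=1e-5):
    # Batch-normalize the inputs, compute a gate p in (0, 1) from the
    # normalized value (a sigmoid stands in for the gate described above),
    # then blend the two linear pieces: f(x) = p*x + a*(1-p)*x.
    x_norm = (x - x.mean()) / np.sqrt(x.var() + eps)
    p = 1.0 / (1.0 + np.exp(-x_norm))
    return p * x + a * (1.0 - p) * x

print(drelu(np.array([-2.0, -0.5, 0.5, 2.0])))
```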

To implement DRelu, I take BatchNormalization as the superclass and then use the above formula for computation. To allow users to choose activation methods, which usually happens when calling the add_mlp function in processor_util.py, I pass the parameter through model_option from the UI down to the implementation, just as dropout does. Currently I place it in extra_option, but can modify it if the AML team needs to redesign the UI.

I also add unit tests for DRelu. We check the shape of the output and also do numeric unit tests.
For the unit tests, I first check the numeric value of BatchNormalization, since there was no similar test before. I then compute the value of the DRelu outputs and compare the results with the current DRelu layer.

Reviewed By: chocjy

Differential Revision: D5341464

fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630
2017-07-20 11:50:08 -07:00
Honghao Wei
b68adec7bb adding model loss logic
Summary: Add API model.add_loss(), which allows adding losses, such as optimization and regularization terms. See the change in sparse_nn.py, in which 'model.loss = loss' is changed to 'model.add_loss(loss)'.

Reviewed By: xianjiec

Differential Revision: D5399056

fbshipit-source-id: 13b2ced4b75d129a5ee4a9b0e989606c04d2ca8b
2017-07-14 16:25:23 -07:00
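A minimal stand-in for the add_loss() idea in the commit above (not the real LayerModelHelper code): losses accumulate rather than overwrite, so regularization terms can be tacked onto an existing loss.

```python
# Sketch: model.add_loss(loss) accumulates instead of overwriting model.loss.
class ModelSketch(object):
    def __init__(self):
        self.loss = None

    def add_loss(self, loss):
        # First call sets the loss; later calls (e.g. regularizers) add to it.
        self.loss = loss if self.loss is None else self.loss + loss

model = ModelSketch()
model.add_loss(1.25)   # task loss
model.add_loss(0.05)   # regularization term
print(model.loss)      # 1.3
```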
Jacqueline Xu
2aa8fc7e8d Implementing Semi-Random Features Layer
Summary:
- (Split diff from Arc Cosine)
- Implemented [[ https://arxiv.org/pdf/1702.08882.pdf | Semi-Random Features ]] Layer
- Created a buck unit test for SRF Layer

Reviewed By: chocjy

Differential Revision: D5374803

fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae
2017-07-14 13:15:50 -07:00
Jiyan Yang
043640c3eb Return top K classes
Reviewed By: kittipatv

Differential Revision: D5363481

fbshipit-source-id: 27ce37878434917c1a7c5f325ed77c989a1448af
2017-07-13 00:20:00 -07:00
Tao Wu
02aa5ad9fb make functional layer return scalar if only one output
Summary: This diff makes the functional layer return a scalar if there is only one output. It also corrects all other corresponding implementations.

Reviewed By: kittipatv

Differential Revision: D5386853

fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
2017-07-12 11:34:31 -07:00
Jacqueline Xu
e89e71c595 Simplifying Random Fourier Features and layer test
Summary:
- Condensed operators in RFF layer
- Adjusted RFF layer test; made test code more concise

Reviewed By: chocjy

Differential Revision: D5391436

fbshipit-source-id: 08748861cd6fb4a9e4cc9c8762996371492020a1
2017-07-11 00:40:53 -07:00
Jacqueline Xu
6ea71155c1 Implementing Arc Cosine Layer
Summary:
- Implemented the [[ http://cseweb.ucsd.edu/~saul/papers/nips09_kernel.pdf | Arc Cosine ]] layer
  - Developed buck unit test for Arc Cosine

Reviewed By: chocjy

Differential Revision: D5367604

fbshipit-source-id: ffd3ee081bc055b06c075c34aa6ce329b62ce2e0
2017-07-10 10:10:36 -07:00
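A numpy sketch of an arc-cosine random feature map in the spirit of the Cho & Saul paper linked above (order, dimensions, and scaling are illustrative assumptions, not the layer's exact parameters):

```python
import numpy as np

def arc_cosine_features(X, num_features=128, order=1, seed=0):
    # Random projection W ~ N(0, I); order-n arc-cosine features are
    # step(Wx) * (Wx)^n, i.e. a step function for order 0, a ReLU for order 1.
    rng = np.random.RandomState(seed)
    W = rng.randn(X.shape[1], num_features)
    proj = X.dot(W)
    return np.sqrt(2.0 / num_features) * np.where(proj > 0, proj ** order, 0.0)

X = np.random.randn(4, 16)
print(arc_cosine_features(X).shape)   # (4, 128)
```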
Jacqueline Xu
25bd5dda27 Implementing random fourier features layer
Summary:
- Created the random fourier features layer
- Generated a unit test to test the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf |   Random Features for Large-Scale Kernel Machines]]

Reviewed By: chocjy

Differential Revision: D5318105

fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
2017-07-04 23:48:42 -07:00
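A numpy sketch of the random Fourier feature map from the Rahimi & Recht paper linked above (Gaussian-kernel case; the bandwidth and feature count are illustrative):

```python
import numpy as np

def random_fourier_features(X, num_features=128, sigma=1.0, seed=0):
    # Approximates an RBF kernel: z(x) = sqrt(2/D) * cos(Wx + b),
    # with W ~ N(0, 1/sigma^2) and b ~ Uniform(0, 2*pi).
    rng = np.random.RandomState(seed)
    W = rng.randn(X.shape[1], num_features) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X.dot(W) + b)

X = np.random.randn(4, 16)
print(random_fourier_features(X).shape)   # (4, 128)
```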
Tao Wu
4be5337cca add support for weight in batch_softmax_loss
Summary: weighted batch_softmax_loss when weight exists in input_record

Reviewed By: kittipatv

Differential Revision: D5291646

fbshipit-source-id: f1bcd386ad1fc0e95e0a0315ec1c36531c792495
2017-06-21 10:32:15 -07:00
Jacqueline Xu
6150d9bef2 Building dropout as layer
Summary: Dropout layer and unittest for DPer2

Reviewed By: chocjy

Differential Revision: D5254866

fbshipit-source-id: 5eaea81808ddf8e0c7a7d76209ea44cda2ee28aa
2017-06-19 14:46:52 -07:00
Thomas Dudziak
60c78d6160 Fixes range/xrange for Python 3
Summary: As title

Differential Revision: D5151894

fbshipit-source-id: 7badce5d3122e8f2526a7170fbdcf0d0b66e2638
2017-06-07 00:04:26 -07:00
Jiyan Yang
6aff754dbc Add batch normalization layer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5077230

fbshipit-source-id: f73cdedac6d9a3542f8ef829b54fb4c713dcafd0
2017-05-26 16:46:52 -07:00
Kittipat Virochsiri
211eae127c LastNWindowCollector
Summary: Layer for LastNWindowCollector op. We need this since it's an in-place operator.

Reviewed By: chocjy

Differential Revision: D4981772

fbshipit-source-id: ec85dbf247d0944db422ad396771fa9308650883
2017-05-04 17:32:09 -07:00
Kittipat Virochsiri
22d4eaeb9e JoinContext
Summary:
Layer to allow a model to follow different paths for each instantiation context and join later. Together with the tagging system cleanup (a separate issue), this should reduce the need to write a layer to differentiate between contexts.

Re: tagging system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instantiation code. TRAIN_ONLY should become the set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN.

Reviewed By: kennyhorror

Differential Revision: D4964949

fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
2017-05-02 17:32:26 -07:00
Chonglin Sun
e8e93066e7 add workflow for user complicated embedding
Summary: Correctly propagate the request_only tag to all layers.

Reviewed By: kennyhorror

Differential Revision: D4751496

fbshipit-source-id: e65fd8cfe56d2989213d44e684a528ede691d316
2017-05-02 10:46:52 -07:00
Jiyan Yang
795dc1c326 Remove loss ops from eval net
Summary: Current eval nets contain loss operators (see example: https://fburl.com/6otbe0n7), which is unnecessary. This diff removes them from the eval net.

Differential Revision: D4934589

fbshipit-source-id: 1ba96c20a3a7ef720414acb4124002fb54cabfc7
2017-04-26 12:46:25 -07:00
Jiyan Yang
ef2701a57e MapToRange layer
Summary: A layer that takes raw ids as inputs and outputs the indices which can be used as labels. The mapping will be stored with the model.

Reviewed By: kittipatv

Differential Revision: D4902556

fbshipit-source-id: 647db47b0362142cdba997effa2ef7a5294c84ee
2017-04-25 16:03:58 -07:00
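A tiny Python stand-in for the MapToRange behavior described above (not the layer itself): raw ids are mapped to contiguous indices, and the mapping is kept so it can be stored with the model.

```python
# Sketch: map arbitrary raw ids to dense indices usable as labels,
# keeping the mapping so it can be saved alongside the model.
class MapToRangeSketch(object):
    def __init__(self):
        self.id_to_index = {}

    def __call__(self, raw_ids):
        out = []
        for raw_id in raw_ids:
            if raw_id not in self.id_to_index:
                self.id_to_index[raw_id] = len(self.id_to_index)
            out.append(self.id_to_index[raw_id])
        return out

mapper = MapToRangeSketch()
print(mapper([1001, 42, 1001, 7]))   # [0, 1, 0, 2]
```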
Kittipat Virochsiri
fd9185ab21 fix getting empty struct
Summary: `not field` calls `__len__()`, causing the field to appear to be missing even when it's not

Differential Revision: D4910587

fbshipit-source-id: bc2b2fadab96571ae43c4af97b30e50c084437af
2017-04-19 22:36:05 -07:00
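A self-contained illustration of the pitfall fixed above (a stand-in class, not caffe2.python.schema itself):

```python
# Truthiness checks on container-like objects call __len__(), so an *empty but
# present* field looks missing. Test for None explicitly instead.
class Struct(object):
    def __init__(self, *fields):
        self.fields = list(fields)
    def __len__(self):
        return len(self.fields)

field = Struct()          # present, but empty
print(bool(not field))    # True  -> wrongly treated as "missing"
print(field is None)      # False -> correct presence check
```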
Huazhong Ning
ad6b53e401 allow to specify output dtypes for functional layers
Summary:
Currently, the functional layer infers the output types and shapes by running the operator once.
But in cases where special input data is needed to run the operator, the inference may fail.
This diff allows the caller to manually specify the output types and shapes when the automatic inference may fail.

Reviewed By: kennyhorror

Differential Revision: D4864003

fbshipit-source-id: ba242586ea384f76d745b29a450497135717bdcc
2017-04-18 16:34:52 -07:00
Kittipat Virochsiri
0a726af42e Coerce input of FunctionalLayer to record
Summary: Having to pack the input into a schema doesn't make much sense, since the structure is not recognized by operators anyway.

Differential Revision: D4895686

fbshipit-source-id: df78884ed331f7bd0c69db4f86c682c52829ec76
2017-04-17 19:26:06 -07:00
Kittipat Virochsiri
baf33161d4 GatherRecord layer
Summary: Perform gather on the whole record. This will be used for negative random sampling.

Reviewed By: kennyhorror

Differential Revision: D4882430

fbshipit-source-id: 19e20f7307064755dc4140afb5ba47a699260289
2017-04-13 15:02:44 -07:00
Kittipat Virochsiri
5c32c82a6d Add option to subtract log odd from sampled trained prediction.
Summary: Useful for sampled softmax training

Differential Revision: D4782673

fbshipit-source-id: 88195de60070a0bc16f5e06b9aad4dffd0484546
2017-04-03 17:50:58 -07:00
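A numpy sketch of the correction referenced above, in the form commonly used for sampled softmax (the exact form Caffe2 applies is an assumption): subtract the log of each class's sampling probability from its sampled logit so training on the sampled subset stays consistent with the full softmax.

```python
import numpy as np

def adjust_sampled_logits(logits, sampling_probs):
    # Standard sampled-softmax correction: logit_c - log(q_c), where q_c is
    # the probability that class c was drawn by the negative sampler.
    return logits - np.log(sampling_probs)

logits = np.array([2.0, 0.5, -1.0])
q = np.array([0.5, 0.1, 0.01])   # frequent classes get sampled more often
print(adjust_sampled_logits(logits, q))
```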
Xianjie Chen
9fc56793dd fix trunk for push and small cleanup
Summary:
multiple places broken, blocking the push :(

- fix the weighted training for ads and feeds
- fix the publishing if no exporter model is selected
- fix the feeds retrieval evaluation
- added the default config for retrieval workflows. plan to use for flow test (in next diff)
- clean up unused code
- smaller hash size for faster canary test

Reviewed By: chocjy

Differential Revision: D4817829

fbshipit-source-id: e3d407314268b6487c22b1ee91f158532dda8807
2017-04-02 23:35:49 -07:00
Kittipat Virochsiri
3eb3507367 uniform_sampling layer
Summary: This layer will be used to sample negative labels for sampled softmax.

Differential Revision: D4773444

fbshipit-source-id: 605a979c09d07531293dd9472da9d2fa7439c619
2017-03-29 14:36:12 -07:00
Andrey Malevich
7cc92b1260 Add eval net for layer_model_helper
Summary:
This diff adds eval nets to the layer model helper. It should be useful for
cases where train/eval nets need some extra input (usually some supervision)
for train/eval, for example various sampled layers.

Differential Revision: D4769453

fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
2017-03-29 04:03:40 -07:00
Kittipat Virochsiri
da36212259 SamplingTrain layer
Summary:
The `SamplingTrain` layer is a wrapper around another layer subclassing `SamplingTrainableMixin`. When instantiated in the training context, `SamplingTrain` produces the sparse output of the wrapped layer. The output can be paired with `indices` to create a Map schema. When instantiated in the prediction context, the full output of the wrapped layer is produced.

This is like the SampledFC function in the model helper, https://fburl.com/gi9g1awh, with the ability to be instantiated in both training and prediction contexts.

I'd like to get consensus on whether we should introduce the `SamplingTrain` layer and the accompanying mixin. This can probably be accomplished in some other way, but I think this is not too bad.

Reviewed By: xianjiec

Differential Revision: D4689887

fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8
2017-03-27 23:31:55 -07:00
Huazhong Ning
8168e8ac25 allows to specify output names for functional layers
Summary:
Currently the output schema and blobs are named "field_i", which is
bad for debugging. This diff allows us to specify output names.

Reviewed By: kennyhorror

Differential Revision: D4744949

fbshipit-source-id: 8ac4d3c75cacbb4c9b5f55793ac969fe1cf20467
2017-03-23 13:18:58 -07:00