Summary: change all uses of BatchLRLoss to the numerically stable version. This includes the uses of the function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and the class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.
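As an illustration only, a minimal numpy sketch of the standard numerically stable rewrite (not the actual BatchLRLoss implementation):

```
import numpy as np

def stable_lr_loss(logits, labels):
    """Numerically stable logistic regression loss.

    Uses log(1 + exp(-x)) = max(-x, 0) + log(1 + exp(-|x|)), so that
    large-magnitude logits never overflow exp().
    """
    x = logits * (2 * labels - 1)  # flip the sign for negative labels
    return np.maximum(-x, 0) + np.log1p(np.exp(-np.abs(x)))

# A very large logit no longer produces inf/nan.
print(stable_lr_loss(np.array([1000.0, -3.0]), np.array([1.0, 0.0])))
```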
Reviewed By: xianjiec
Differential Revision: D6643074
fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
Summary:
In order to reproduce the StarSpace model using the architecture of the Two Tower model, we need to implement the ranking loss that is used in StarSpace as well as in the Filament model. In both models, all negative samples come from random negative sampling, so the number of negative samples per positive record is fixed (say 64). To calculate the total loss, for each positive record the hinge distance between the positive score and the negative scores (the 64 scores in the example) is calculated. This diff implements this loss in the Dper framework.
The main idea is to add an option so that negative_sampling.py can output random negative samples as an independent field rather than merging them with the original input_record. In this way, we can calculate the positive score and the negative scores separately, which are eventually used when calculating the ranking loss.
(Note: this ignores all push blocking failures!)
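As an illustration only, a numpy sketch of the ranking loss described above, assuming a fixed number of random negatives per positive record and a hinge margin (not the actual Dper code):

```
import numpy as np

def ranking_hinge_loss(pos_score, neg_scores, margin=0.1):
    """Hinge ranking loss for one positive record.

    pos_score:  scalar score of the positive example.
    neg_scores: array of shape (num_negatives,), e.g. 64 random negatives.
    """
    return np.sum(np.maximum(0.0, margin - pos_score + neg_scores))

# One positive record scored against 64 random negatives.
rng = np.random.RandomState(0)
print(ranking_hinge_loss(pos_score=0.8, neg_scores=rng.uniform(-1, 1, size=64)))
```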
Reviewed By: kittipatv
Differential Revision: D5854486
fbshipit-source-id: f8a5b77be744a6cc8a2b86433282b3b5c7e1ab4a
Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data.
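A minimal pure-Python sketch of the idea (not the Caffe2 operator itself): when the collector owns a mutex, a reader can take the same mutex and inspect the buffer in place instead of copying it.

```
import threading
from collections import deque

class LastNWindow(object):
    """Toy last-N collector; the lock makes collect/read thread-safe."""

    def __init__(self, n):
        self.buffer = deque(maxlen=n)
        self.mutex = threading.Lock()

    def collect(self, samples):
        with self.mutex:
            self.buffer.extend(samples)

    def read_in_place(self, fn):
        # The same mutex lets callers inspect the buffer without copying it.
        with self.mutex:
            return fn(self.buffer)

collector = LastNWindow(n=3)
collector.collect([1, 2, 3, 4])
print(collector.read_in_place(list))  # [2, 3, 4]
```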
Reviewed By: chocjy
Differential Revision: D5858335
fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e
Summary:
After this, Windows builds should be all green.
Closes https://github.com/caffe2/caffe2/pull/1228
Reviewed By: bwasti
Differential Revision: D5888328
Pulled By: Yangqing
fbshipit-source-id: 98fd39a4424237f2910df69c8609455d7af3ca34
Summary: When num_elements is less than num_samples, a workflow should fail at net construction time. Currently, it fails at run time.
Reviewed By: kittipatv
Differential Revision: D5858085
fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891
Summary:
Before this fix, a functional layer name could appear several times in a
blob name, which caused confusion. This diff fixes the issue.
Reviewed By: kittipatv
Differential Revision: D5641354
fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2
Summary: New hybrid randomized sparse NN, which allows layers of the sparse NN model to be randomized, semi-random, or learnable.
Reviewed By: chocjy
Differential Revision: D5416489
fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d
Summary:
The current implementation for s=0 doesn't support the backward pass.
Switching to the pow op instead as a temporary solution.
Reviewed By: jackielxu
Differential Revision: D5551742
fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
Summary:
Updated the semi-random layer model for multi-layer models using semi-random layers.
Notable changes:
- Inputs and outputs for the semi-random layer are now a Struct with "full" and "random" components (see the sketch below)
- A flag was added to choose whether or not to initialize the output schema in Arc Cosine (i.e., when the output schema initialization will happen in the Semi-Random layer instead)
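A hedged sketch of what such a record could look like with caffe2.python.schema (the field names "full" and "random" come from the summary; the dimensions are illustrative):

```
import numpy as np
from caffe2.python import schema

# Illustrative record with the two components described above.
semi_random_record = schema.Struct(
    ('full', schema.Scalar((np.float32, (128,)))),
    ('random', schema.Scalar((np.float32, (128,)))),
)
print(semi_random_record.field_names())  # e.g. ['full', 'random']
```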
Reviewed By: chocjy
Differential Revision: D5496034
fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04
Summary:
The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, as the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part, these weights won't be saved after training since they don't exist on the trainer.
I fixed the issue here by building an option to add the randomized parameters to the model global constants so that the same parameter values can be accessed. Also, the parameters can be saved when the training is finished.
In this diff, I've:
- Updated randomized parameters to be added as a global constant across distributed runs of Arc Cosine Feature Map and Semi Random Feature layers
- Updated unit tests
- Ran an end-to-end test, enabling multiple readers to test the fixed issue
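A conceptual numpy sketch of the problem being fixed (illustrative only, not the actual Dper code): if every worker draws its own random projection, the "random" weights disagree across runs; materializing them once and sharing them as a global constant keeps them identical everywhere and lets them be saved with the model.

```
import numpy as np

def build_random_projection(input_dim, output_dim, seed=None):
    rng = np.random.RandomState(seed)
    return rng.randn(input_dim, output_dim).astype(np.float32)

# Without the fix: two "workers" end up with different random weights.
w_worker_a = build_random_projection(16, 8)
w_worker_b = build_random_projection(16, 8)
print(np.allclose(w_worker_a, w_worker_b))  # False

# With the fix: materialize the weights once and hand the same array
# (a global constant) to every worker.
shared_w = build_random_projection(16, 8, seed=0)
w_worker_a = w_worker_b = shared_w
print(np.allclose(w_worker_a, w_worker_b))  # True
```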
Reviewed By: chocjy
Differential Revision: D5483372
fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd
Summary:
In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details.
To sum up, unlike standard relu and prelu, which divide the input into two parts with the boundary at zero, DRelu calculates another value p to divide the activation into two parts. p is the softmax value of the output of Batch Normalization. For the f(x)=x part in relu, you can find a similar pattern in f(x)=px, and for the f(x)=0 part in relu, you can find a similar pattern in f(x)=a(1-p)x, in which a is a parameter to tune. The DRelu activation result is the sum of these two parts: f(x) = a(1-p)x + px.
To implement DRelu, I take BatchNormalization as the super class and then use the above formula for the computation. In order to allow users to choose activation methods, which usually happens when calling the add_mlp function in processor_util.py, I pass the parameter via model_option from the UI down to the implementation, just as dropout does. Currently, I place it in extra_option, but can modify it if the AML team needs to redesign the UI.
I also add unit tests for DRelu. We check the shape of the output and also do numeric unit tests.
For the unit tests, I first check the numeric value of BatchNormalization, since there was no similar test before. I then compute the value of the DRelu outputs and compare the results with the current DRelu layer.
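A numpy sketch of the activation formula above, f(x) = a(1-p)x + px (illustrative only; here p is passed in directly rather than computed from Batch Normalization):

```
import numpy as np

def drelu(x, p, a=0.1):
    """DRelu: blend f(x) = px and f(x) = a(1-p)x, with p in [0, 1]."""
    return a * (1.0 - p) * x + p * x

x = np.array([-2.0, -0.5, 0.5, 2.0])
p = np.array([0.1, 0.4, 0.6, 0.9])  # would come from the normalized activations
print(drelu(x, p))
```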
Reviewed By: chocjy
Differential Revision: D5341464
fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630
Summary: Add the API model.add_loss(), which allows adding loss terms, such as the optimization objective and regularization. See the change in sparse_nn.py, in which 'model.loss = loss' is changed to 'model.add_loss(loss)'.
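A toy sketch of the add_loss() semantics (illustrative only; not the LayerModelHelper implementation):

```
class ModelSketch(object):
    """Toy illustration of add_loss() accumulating loss terms."""

    def __init__(self):
        self._losses = []

    def add_loss(self, loss):
        # Accumulate rather than overwrite, so a regularization term can
        # be appended after the main objective.
        self._losses.append(loss)

    @property
    def loss(self):
        return sum(self._losses)

model = ModelSketch()
model.add_loss(0.7)   # main objective
model.add_loss(0.05)  # e.g. a regularization term
print(model.loss)     # 0.75
```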
Reviewed By: xianjiec
Differential Revision: D5399056
fbshipit-source-id: 13b2ced4b75d129a5ee4a9b0e989606c04d2ca8b
Summary:
- (Split diff from Arc Cosine)
- Implemented [[ https://arxiv.org/pdf/1702.08882.pdf | Semi-Random Features ]] Layer
- Created a buck unit test for SRF Layer
Reviewed By: chocjy
Differential Revision: D5374803
fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae
Summary: This diff makes the functional layer return a scalar if there is only one output. It also corrects all other corresponding implementations.
Reviewed By: kittipatv
Differential Revision: D5386853
fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
Summary:
- Created the random fourier features layer
- Generated a unit test to verify that the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf | Random Features for Large-Scale Kernel Machines]]
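A numpy sketch of the random Fourier feature map from the cited paper, z(x) = sqrt(2/D) * cos(Wx + b) with Gaussian W and uniform b (the layer itself builds this inside the Caffe2 graph):

```
import numpy as np

def random_fourier_features(x, output_dim, sigma=1.0, seed=0):
    """Approximate an RBF kernel: k(x, y) ~= z(x).dot(z(y))."""
    input_dim = x.shape[-1]
    rng = np.random.RandomState(seed)
    w = rng.normal(scale=1.0 / sigma, size=(input_dim, output_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=output_dim)
    return np.sqrt(2.0 / output_dim) * np.cos(x.dot(w) + b)

x = np.random.randn(4, 16)  # batch of 4 examples with 16 features
print(random_fourier_features(x, output_dim=64).shape)  # (4, 64)
```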
Reviewed By: chocjy
Differential Revision: D5318105
fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
Summary: Layer for LastNWindowCollector op. We need this since it's an in-place operator.
Reviewed By: chocjy
Differential Revision: D4981772
fbshipit-source-id: ec85dbf247d0944db422ad396771fa9308650883
Summary:
Layer to allow the model to follow different paths for each instantiation context and join them later. Together with the tagging system cleanup (which is a separate issue), this should reduce the need to write a layer just to differentiate between contexts.
Re: tagging system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify the instantiation code. TRAIN_ONLY should become the set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN.
Reviewed By: kennyhorror
Differential Revision: D4964949
fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
Summary: Current eval nets contain loss operators (see example: https://fburl.com/6otbe0n7), which are unnecessary. This diff removes them from the eval net.
Differential Revision: D4934589
fbshipit-source-id: 1ba96c20a3a7ef720414acb4124002fb54cabfc7
Summary: A layer that takes raw ids as inputs and outputs the indices which can be used as labels. The mapping will be stored with the model.
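A pure-Python sketch of the mapping behaviour (illustrative only; the actual layer stores the mapping as part of the model):

```
class IdToIndex(object):
    """Maps raw ids to dense indices that can be used as labels."""

    def __init__(self):
        self._mapping = {}

    def __call__(self, raw_ids):
        indices = []
        for raw_id in raw_ids:
            # Assign the next free index the first time an id is seen.
            indices.append(self._mapping.setdefault(raw_id, len(self._mapping)))
        return indices

mapper = IdToIndex()
print(mapper([1001, 42, 1001, 7]))  # [0, 1, 0, 2]
```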
Reviewed By: kittipatv
Differential Revision: D4902556
fbshipit-source-id: 647db47b0362142cdba997effa2ef7a5294c84ee
Summary: `not field` calls `__len__()`, causing the field to appear to be missing even when it's not
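A small illustration of the pitfall (generic Python, not the schema code itself): for objects that define `__len__`, truth-testing checks emptiness, so presence should be tested with `is None`.

```
class Field(object):
    """Stand-in for a schema field that happens to be empty."""
    def __len__(self):
        return 0

field = Field()
print(not field)      # True  -- looks "missing" because len(field) == 0
print(field is None)  # False -- the field is actually present

# Buggy:   if not field: ...      (treats an empty field as missing)
# Correct: if field is None: ...  (only triggers when the field is absent)
```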
Differential Revision: D4910587
fbshipit-source-id: bc2b2fadab96571ae43c4af97b30e50c084437af
Summary:
Currently, the functional layer infers the output types and shapes by running the operator once.
But in cases where special input data is needed to run the operator, the inference may fail.
This diff allows the caller to manually specify the output types and shapes when automatic inference may fail.
Reviewed By: kennyhorror
Differential Revision: D4864003
fbshipit-source-id: ba242586ea384f76d745b29a450497135717bdcc
Summary: Having to pack the input into a schema doesn't make much sense since the structure is not recognized by operators anyway.
Differential Revision: D4895686
fbshipit-source-id: df78884ed331f7bd0c69db4f86c682c52829ec76
Summary: Perform gather on the whole record. This will be used for negative random sampling.
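A numpy sketch of gathering a whole record (here a dict of per-field arrays) with one set of indices, as needed for negative random sampling (illustrative only, not the layer code):

```
import numpy as np

def gather_record(record, indices):
    """Apply the same gather to every field of the record."""
    return {name: values[indices] for name, values in record.items()}

record = {
    'ids': np.array([10, 11, 12, 13]),
    'scores': np.array([0.1, 0.2, 0.3, 0.4]),
}
print(gather_record(record, np.array([3, 0, 3])))
# {'ids': array([13, 10, 13]), 'scores': array([0.4, 0.1, 0.4])}
```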
Reviewed By: kennyhorror
Differential Revision: D4882430
fbshipit-source-id: 19e20f7307064755dc4140afb5ba47a699260289
Summary:
multiple places broken, blocking the push :(
- fix the weighted training for ads and feeds
- fix the publishing if no exporter model is selected
- fix the feeds retrieval evaluation
- added the default config for retrieval workflows; plan to use it for the flow test (in the next diff)
- clean up unused code
- smaller hash size for faster canary test
Reviewed By: chocjy
Differential Revision: D4817829
fbshipit-source-id: e3d407314268b6487c22b1ee91f158532dda8807
Summary: This layer will be used to sample negative labels for sampled softmax.
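A numpy sketch of the kind of sampling this enables (uniform sampling shown for simplicity; the actual op and its sampling distribution may differ):

```
import numpy as np

def sample_negative_labels(num_classes, positive_labels, num_samples, seed=0):
    """Draw candidate negative labels, excluding the positives."""
    rng = np.random.RandomState(seed)
    candidates = np.setdiff1d(np.arange(num_classes), positive_labels)
    return rng.choice(candidates, size=num_samples, replace=False)

print(sample_negative_labels(num_classes=20, positive_labels=[3, 7], num_samples=5))
```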
Differential Revision: D4773444
fbshipit-source-id: 605a979c09d07531293dd9472da9d2fa7439c619
Summary:
This diff adds eval nets to the layer model helper. It should be useful for
cases when train/eval nets need some extra input (usually some supervision)
for train/eval, for example various sampled layers, etc.
Differential Revision: D4769453
fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
Summary:
The `SamplingTrain` layer is a wrapper around another layer subclassing `SamplingTrainableMixin`. When instantiated in the training context, `SamplingTrain` produces the sparse output of the wrapped layer. The output can be paired with `indices` to create a Map schema. When instantiated in the prediction context, the full output of the wrapped layer is produced.
This is like the SampledFC function in the model helper, https://fburl.com/gi9g1awh, with the ability to be instantiated in both the training and prediction contexts.
I'd like to get consensus on whether we should introduce the `SamplingTrain` layer and the accompanying mixin. This can probably be accomplished in some other way, but I think this is not too bad.
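A pure-Python sketch of the wrapper/mixin split described above (illustrative names only, not the actual Caffe2 classes): in the training context the wrapper asks the wrapped layer for its sampled (sparse) output, while in the prediction context it asks for the full output.

```
class SamplingTrainableMixin(object):
    """Layers that can emit either a sampled or a full output."""

    def full_output(self):
        raise NotImplementedError

    def sampled_output(self, indices):
        raise NotImplementedError


class ToyFC(SamplingTrainableMixin):
    def __init__(self, weights):
        self.weights = weights  # one row per output class

    def full_output(self):
        return self.weights

    def sampled_output(self, indices):
        return [self.weights[i] for i in indices]


class SamplingTrain(object):
    """Wrapper choosing sparse vs. full output based on the context."""

    def __init__(self, wrapped, context):
        assert isinstance(wrapped, SamplingTrainableMixin)
        self.wrapped = wrapped
        self.context = context

    def output(self, indices=None):
        if self.context == 'train':
            return self.wrapped.sampled_output(indices)
        return self.wrapped.full_output()


fc = ToyFC(weights=[[0.1], [0.2], [0.3], [0.4]])
print(SamplingTrain(fc, 'train').output(indices=[1, 3]))  # sparse: two rows
print(SamplingTrain(fc, 'predict').output())              # full: all rows
```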
Reviewed By: xianjiec
Differential Revision: D4689887
fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8
Summary:
currently the output schema and blobs are named "field_i", which is
bad for debugging. This diff allows us to specify output names.
Reviewed By: kennyhorror
Differential Revision: D4744949
fbshipit-source-id: 8ac4d3c75cacbb4c9b5f55793ac969fe1cf20467
Summary: For some embedding tasks, we don't want to include a bias term in the embedding computation.
Reviewed By: xianjiec
Differential Revision: D4689620
fbshipit-source-id: 4168584681d30c0eaa1d17ceaf68edda11924644
Summary: Some operators, e.g., SoftmaxWithLoss, return a scalar-typed tensor. This allows us to use those ops without having to write a layer manually.
Reviewed By: xianjiec, kennyhorror
Differential Revision: D4703982
fbshipit-source-id: f33969971c57fc037c9b44adb37af1caba4084b6
Summary:
This diff is trying to address one of the concerns that Xianjie has had - the requirement to create a layer for every operator and to pass shapes and other info around.
The basic idea of the diff:
1. Try to create a layer with the given name; if it's not available, fall back to the operator with that name (which is expected to have no parameters).
2. For all operators that we're adding through this functional style of creation, try to use the C2 Shape/Type inference logic to get the output type. If that fails, just return an untyped record and expect the user to annotate it when it's really needed (see the sketch below).
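A pure-Python sketch of the creation logic in the two steps above (toy registries standing in for the real layer and operator registries; not the actual implementation):

```
# Toy registries standing in for the real layer / operator registries.
LAYER_REGISTRY = {'FC': lambda record: ('fc_layer', record)}
OPERATOR_REGISTRY = {'Relu': lambda record: ('relu_op', record)}

def infer_output_type(op_name, record):
    # Stand-in for the C2 Shape/Type inference; it may legitimately fail.
    return None  # unknown -> caller returns an untyped record

def create(name, input_record):
    # 1. Prefer a registered layer with this name.
    if name in LAYER_REGISTRY:
        return LAYER_REGISTRY[name](input_record)
    # 2. Otherwise fall back to a parameter-free operator; try to infer the
    #    output type, and if that fails return the output untyped so the
    #    user can annotate the record when needed.
    if name in OPERATOR_REGISTRY:
        output = OPERATOR_REGISTRY[name](input_record)
        output_type = infer_output_type(name, input_record)
        return output, output_type
    raise KeyError('No layer or operator named %s' % name)

print(create('FC', 'input_record'))
print(create('Relu', 'input_record'))
```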
Reviewed By: xianjiec
Differential Revision: D4408771
fbshipit-source-id: aced7487571940d726424269970df0eb62670c39