Summary: Currently, accelerators do not have a concept of fp32; they only understand fp16 and int8 data inputs. To fix the issue here, we want to make sure unaries are converted to fp16 when the int8 exporter is turned on.
Reviewed By: kennyhorror
Differential Revision: D17743791
fbshipit-source-id: 7322d23eb12ac3f813b525fc0ddd066f95c8ca85
Summary:
Support attention weights input to SparseLookup. In attention sum pooling, if attention weights can be pre-calculated before embedding lookup, they can be passed to SparseLookup and processed by SparseLengthsWeightedSum op. One example is id_score attention sum pooling.
Essentially the net is converted from:
LengthsSum(Mul(Gather(keys, w), att_weight))
to:
SparseLengthsWeightedSum(keys, w, att_weight)
It unblocks potential efficiency gains with distributed training.
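A minimal NumPy sketch of why the two forms are equivalent (toy shapes and helper names here are illustrative, not the actual Caffe2 kernels):

    import numpy as np

    def lengths_sum(values, lengths):
        # Sum consecutive segments of `values`; segment sizes come from `lengths`.
        offsets = np.concatenate([[0], np.cumsum(lengths)])
        return np.stack([values[offsets[i]:offsets[i + 1]].sum(axis=0)
                         for i in range(len(lengths))])

    def sparse_lengths_weighted_sum(table, weights, ids, lengths):
        # Fused path: gather, scale by the per-id weight, and segment-sum in one pass.
        out = np.zeros((len(lengths), table.shape[1]), dtype=table.dtype)
        pos = 0
        for i, n in enumerate(lengths):
            for j in range(pos, pos + n):
                out[i] += weights[j] * table[ids[j]]
            pos += n
        return out

    table = np.random.rand(10, 4).astype(np.float32)    # toy embedding table
    ids = np.array([1, 3, 3, 7, 2])                     # flattened ids
    att_weight = np.array([0.2, 0.5, 0.3, 1.0, 0.7], dtype=np.float32)
    lengths = np.array([3, 2])                          # 2 examples

    # Before: LengthsSum(Mul(Gather(...), att_weight))
    before = lengths_sum(table[ids] * att_weight[:, None], lengths)
    # After: the fused weighted-sum pooling computes the same result directly.
    after = sparse_lengths_weighted_sum(table, att_weight, ids, lengths)
    assert np.allclose(before, after)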
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26748
Test Plan: unit test
Reviewed By: chocjy
Differential Revision: D17553345
Pulled By: wheatkit
fbshipit-source-id: 60cc3c4b0bc1eade5459ac598e85286f3849a412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25426
Add embedding table 4-bit quantization support:
* add the conversion from fp32 to int4.
* use brew to pass the context so that the 4-bit operators are added when generating the predictor net.
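As a rough sketch of what per-row 4-bit quantization does (the value packing, naming, and scale/bias storage here are illustrative; the real conversion is performed by the fused 4-bit operators):

    import numpy as np

    def rowwise_quantize_4bit(w):
        # Each row keeps its own scale and bias so its values map onto the
        # 16 available levels [0, 15].
        bias = w.min(axis=1, keepdims=True)
        scale = (w.max(axis=1, keepdims=True) - bias) / 15.0
        scale[scale == 0] = 1.0                      # guard constant rows
        q = np.clip(np.round((w - bias) / scale), 0, 15).astype(np.uint8)
        return q, scale.astype(np.float16), bias.astype(np.float16)

    def rowwise_dequantize_4bit(q, scale, bias):
        return q.astype(np.float32) * scale.astype(np.float32) + bias.astype(np.float32)

    w = np.random.randn(8, 16).astype(np.float32)    # toy embedding table
    q, scale, bias = rowwise_quantize_4bit(w)
    w_hat = rowwise_dequantize_4bit(q, scale, bias)
    print(np.abs(w - w_hat).max())                   # small per-row reconstruction error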
Reviewed By: kennyhorror, chocjy
Differential Revision: D16859892
fbshipit-source-id: a06c3f0b56a7eabf9ca4a2b2cb6c63735030d70b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24863
Add the sparse feature name in logging for ease of debugging
Test Plan:
./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/sparse_nn/pooling_test#binary.par -r test_simple_sum_pooling_named_exception
Another test for id_score_list. The original sparse_key is equivalent to get_key(self.input_record)().
P98343716
./buck-out/gen/caffe2/caffe2/python/layers_test-2.7#binary.par -r test_get_key
Reviewed By: chocjy
Differential Revision: D16901964
fbshipit-source-id: 2523de2e290aca20afd0b909111541d3d152a588
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22348
This is the last step of LRU hash eviction weight re-init. This diff checks whether there are evicted values in sparse_lookup and, if so, calls the op created in D15709866 to re-initialize the values for the indices in evicted_values. It also creates a gradient op for the operator; the gradient op simply passes the output gradient through as the input gradient.
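A hedged sketch of the idea (the uniform fill range and function names are assumptions; the actual re-init is done by the op from D15709866):

    import numpy as np

    def reinit_evicted_rows(w, evicted_ids, low=-0.01, high=0.01, seed=None):
        # Re-initialize only the embedding rows whose ids were evicted from the
        # LRU hash; all other rows keep their trained values.
        rng = np.random.default_rng(seed)
        evicted_ids = np.asarray(evicted_ids, dtype=np.int64)
        if evicted_ids.size:
            w[evicted_ids] = rng.uniform(low, high, size=(evicted_ids.size, w.shape[1]))
        return w

    # The matching gradient op is a pass-through: the output gradient is
    # forwarded unchanged as the input gradient.
    def reinit_evicted_rows_grad(d_output):
        return d_output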
Reviewed By: itomatik
Differential Revision: D16044736
fbshipit-source-id: 9afb85209b0de1038c5153bcb7dfc5f52e0b2abb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21389
As titled. To do weight re-init on evicted rows in the embedding table, we need to pass the info about the evicted hashed values to SparseLookup, which is the layer model responsible for constructing the embedding table and doing the pooling.
To pass the evicted values, we need to adjust the output record of lru_sparse_hash to include them, and add an optional input to all processors that need to take in a sparse segment. For SparseLookup to get the evicted values, its input record needs to be adjusted: it can now be of type IdList, IdScoreList, or a struct of feature + evicted values, as sketched below.
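A hypothetical sketch of what the extended input record could look like with caffe2.python.schema (field names are illustrative, not the exact ones used by the layer):

    import numpy as np
    from caffe2.python import schema

    input_record = schema.Struct(
        ('feature', schema.List(schema.Scalar(np.int64))),   # the IdList input
        ('evicted_values', schema.Scalar(np.int64)),          # evicted hashed values
    )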
Reviewed By: itomatik
Differential Revision: D15590307
fbshipit-source-id: e493881909830d5ca5806a743a2a713198c100c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20915
Clean up the unary processor code. Some questions are added in the comments to seek suggestions.
Reviewed By: pjh5
Differential Revision: D15448502
fbshipit-source-id: ef0c45718c1a06187e3fe2e4e59b7f20c641d9c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18499
If the init op is not fp16 compatible, it should throw.
However, in the special case where the original init op is UniformFill, we replace it with Float16UniformFill.
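A minimal sketch of that replacement logic, assuming `init_op` is the initializer's OperatorDef; the set of fp16-compatible ops below is illustrative, not the exact list in the codebase:

    FP16_COMPATIBLE_INIT_OPS = {'Float16UniformFill', 'Float16ConstantFill'}

    def to_fp16_init_op(init_op):
        if init_op.type == 'UniformFill':
            # Special case: swap in the fp16 variant, keeping the same arguments.
            init_op.type = 'Float16UniformFill'
            return init_op
        if init_op.type not in FP16_COMPATIBLE_INIT_OPS:
            raise ValueError('init op %s is not fp16 compatible' % init_op.type)
        return init_op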
Reviewed By: kennyhorror
Differential Revision: D14627209
fbshipit-source-id: eb427772874a732ca8b3a25d06670d119ce8ac14
* [GanH]: two_task_discriminator
as titled
and adding label smoothing
* [Dper2] Simplified UI options needed for blob magnitude visualization
* [GanH]: fix tags
as titled
* Added type and shape inference for GatherRange operator
This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.
* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python
We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python using caffe2::StoreHandlerTimeoutException.
* Bind Gloo IoException to IoError in Python
Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.
* [GanH]: add label smoothing to softmax with loss
as titled
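The commit does not spell out the exact formulation; a common label-smoothing recipe, shown here only as an assumed illustration, mixes the one-hot target with a uniform distribution over classes:

    import numpy as np

    def smooth_labels(labels, num_classes, eps=0.1):
        # eps controls how much probability mass is spread uniformly over classes.
        one_hot = np.eye(num_classes, dtype=np.float32)[labels]
        return (1.0 - eps) * one_hot + eps / num_classes

    print(smooth_labels(np.array([2]), num_classes=4))
    # [[0.025 0.025 0.925 0.025]]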
* [C2] Enable LARS in Adagrad and hook it to DPER
* [DPER] Don't pass LayerModelHelper in create_trainer_nodes
Since we're planning to get rid of it eventually, and I want to get access to a
NetDef-only interface ASAP, I'm looking toward removing all references to
LMH where we don't really need them.
* fix bugs in LambdaRankNdcgOp
The loss and gradient in LambdaRankNdcgOp are incorrect. The loss should be the negative log of the probabilities instead of the log.
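In other words, the op should compute a negative log-likelihood. A tiny illustration of the sign fix (toy probabilities only, not the op's actual pairwise formulation):

    import numpy as np

    probs = np.array([0.9, 0.5, 0.1])
    wrong = np.log(probs)    # negative, and goes to -inf as prob -> 0
    fixed = -np.log(probs)   # proper NLL: non-negative, grows as prob -> 0
    print(wrong, fixed)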
* Restrict thread pool on iOS to only big cores
Historically, iPhones exposed only one type of core, and the Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X expose 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, the fast cores end up waiting for the slow ones, and it may be better to restrict execution to only the 2 fast cores, like we do on Android.
Remove SparseLengths Sum/WeightedSum/Mean operators with fp16 engine
make clang happy and get fewer warnings
* [Personalization] Support add_output_schema() in layer_model_helper
Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.
Solution:
For flexibility, we want to add fields to output_schema incrementally.
Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.
Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer
Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
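A hypothetical sketch of the helper (assuming both schemas are schema.Struct instances, which support merging via '+'):

    from caffe2.python import schema

    def add_output_schema(model, additional_schema):
        # Append fields to the existing output schema instead of overwriting it,
        # similar in spirit to how add_loss() accumulates losses.
        assert isinstance(additional_schema, schema.Struct)
        if model._output_schema is None:
            model._output_schema = additional_schema
        else:
            model._output_schema = model._output_schema + additional_schema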
After D6953547 some of the blobs were no longer impacted by uint8 quantization,
but they would still generate operators expecting uint8 inputs and thus fail.
This diff adds a temporary hack to avoid doing this quantization when the layer
is not quantized.
Will fix it by switching to Net rewriting instead.
Summary:
In some cases we were doing quantization even when we should not. This diff
prevents that from happening.
Reviewed By: rayleichen
Differential Revision: D6953547
fbshipit-source-id: 7c65baaf969e5e1bddb68ca8182f4f3b43f2431d
Summary: Updates `sparse_lookup.py` for the new fused 8-bit rowwise quantization. Mostly just changing the same files as the original diffs (D5753626 and D5761202). I know very little about this code here so please let me know if this is safe, also in terms of migration away from the non-fused storage.
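For reference, a rough NumPy sketch of the fused 8-bit rowwise format (the exact byte layout used by Caffe2 may differ; this only shows why no separate scale/bias blobs are needed):

    import numpy as np

    def fused_8bit_rowwise_quantize(w):
        # Each output row holds the uint8 codes followed by that row's fp32
        # scale and bias, so the quantized blob is self-contained.
        bias = w.min(axis=1, keepdims=True).astype(np.float32)
        scale = ((w.max(axis=1, keepdims=True) - bias) / 255.0).astype(np.float32)
        scale[scale == 0] = 1.0
        codes = np.clip(np.round((w - bias) / scale), 0, 255).astype(np.uint8)
        tail = np.concatenate([scale, bias], axis=1).view(np.uint8)
        return np.concatenate([codes, tail], axis=1)

    w = np.random.randn(4, 8).astype(np.float32)
    q = fused_8bit_rowwise_quantize(w)
    print(q.shape)  # (4, 16): 8 codes + 4 bytes of scale + 4 bytes of bias per row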
Reviewed By: kennyhorror
Differential Revision: D6710784
fbshipit-source-id: 185f147af52a094a937ba631b0351225e660d205
Summary: add Test for SparseLookup with PositionWeighted.
Reviewed By: kennyhorror
Differential Revision: D6771612
fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
Summary:
so that the user can use the 'WeightedSum' pooling method when there is a mix of id list and id score list features.
- It's still intuitive to have "WeightedSum" for an id list, and we do not need to introduce a new "UnWeightedSum", etc.
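A tiny illustration of why no separate method is needed: treating an id list as an id score list with all-ones scores gives exactly the plain sum (toy NumPy only):

    import numpy as np

    emb = np.random.rand(3, 4).astype(np.float32)   # embeddings gathered for one example
    unit_weights = np.ones(3, dtype=np.float32)     # implicit scores for an id list
    assert np.allclose((emb * unit_weights[:, None]).sum(axis=0), emb.sum(axis=0))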
Reviewed By: chocjy
Differential Revision: D6369270
fbshipit-source-id: 722fa08d1a7986bc6ecf4c7cb02bbae0825bcab4
Summary:
To achieve this, I modified the blob name scheme defined in a layer.
Before it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I change it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.
I also overrode the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.
There are some details such as making the initializer more structured
that I need to finalize.
Reviewed By: kennyhorror
Differential Revision: D5435132
fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
Summary: Adding a None pooling option, with which SparseLookup will gather the embedding for each id.
Reviewed By: kittipatv
Differential Revision: D5421667
fbshipit-source-id: 1e8e2b550893ff3869dab12f8eb1fe24a063c3d5
Summary:
Segment-based ops require increasing segment ids without gaps. Lengths-based ops do not
have this requirement.
Other pooling methods, e.g., LogExpMean, do not have lengths-based ops available yet.
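A small NumPy illustration of the two encodings (toy data; the real ops are the Caffe2 segment/lengths reducers):

    import numpy as np

    values = np.arange(6, dtype=np.float32)
    segment_ids = np.array([0, 0, 1, 1, 1, 2])   # must be increasing, no gaps
    lengths = np.array([2, 3, 1])                # equivalent lengths encoding

    sum_by_segment = np.array([values[segment_ids == i].sum() for i in range(3)])
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    sum_by_lengths = np.array([values[offsets[i]:offsets[i + 1]].sum() for i in range(3)])
    assert np.allclose(sum_by_segment, sum_by_lengths)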
Differential Revision: D5019165
fbshipit-source-id: ab01a220e10d4ed9fa2162939579d346607f905e
Summary: Somehow, feed-non-ranking training data usually has this type of column. Add an option to support it.
Reviewed By: xianjiec, kennyhorror
Differential Revision: D4773960
fbshipit-source-id: 5a7ef4618a070e04f3cd8ddfcbf2b7441c00d92d
Summary:
Add distributed training to dper2 and keep dper1 working.
* Created a ModelDelegator to wrap ModelHelper and LayerModelHelper and mitigate the differences between them.
* To get the average length for sparse features, I extracted some information in the feature_processor. There should be a better way to do it once we have the new compute_meta.
* Metrics right now only run on the first trainer.
* The model is saved correctly for evaluation, but I'm still not sure how to handle the weights for Adagrad.
Reviewed By: kennyhorror
Differential Revision: D4767745
fbshipit-source-id: 0559d264827a7fd9327071e8367d1e84a936bea9
Summary: The sum processor and sqrt pooling are meant to mimic the DoubleHelix model.
Differential Revision: D4678413
fbshipit-source-id: fc1ccfe3c92c540ce5914dfd8ff1a040805c48db
Summary:
Previously the fp16 type was supported only in the SparseLengthsSum operator; now it
works in all other segment operators as well.
Reviewed By: dzhulgakov
Differential Revision: D4624312
fbshipit-source-id: c9d72110e3762167270bb088405eaf9c56e88493
Summary: Another part of making DPER compatible with half-floats. This diff adds support for fp16 to the segment reduction operators used in DPER.
Reviewed By: dzhulgakov
Differential Revision: D4587560
fbshipit-source-id: 0ae10648a7286a820bffaee802464dd9464584bc
Summary:
First part of adding half-float support to DPER 2.0. Let's add an option, use_half_floats, to enable converting some weights of the model from fp32 to fp16 before saving them to the predictor model parts. For now it covers the SparseLookup layer's embeddings. All conversion is done after training is finished, and the saved models are ready to be used on remote predictors as-is (they will be stored compacted in memory). The new fp16 blobs are saved to the model in place of the original ones, under the same names, so we don't modify the MetaNetDef at all. A rough sketch of this conversion appears below.
Next steps:
1) support on delivery side -- operators working with these blobs should support both float and float16 input types
2) benchmark performance to make sure there is no regression
a) of serialization
b) of delivery
3) support realtime training (I'm thinking about adding a new pre-publishing net which will be executed each time the realtime trainer stops to publish a new snapshot)
Depends on D4567304
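A rough sketch of the post-training conversion described above (blob selection and function names are assumptions; the real conversion runs as part of saving the predictor model parts):

    import numpy as np
    from caffe2.python import workspace

    def convert_blobs_to_fp16(blob_names):
        # Fetch each selected weight blob (e.g. the SparseLookup embeddings),
        # cast it to fp16, and feed it back under the same name so the
        # MetaNetDef does not need to change.
        for name in blob_names:
            w = workspace.FetchBlob(name)
            if isinstance(w, np.ndarray) and w.dtype == np.float32:
                workspace.FeedBlob(name, w.astype(np.float16))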
Reviewed By: kennyhorror
Differential Revision: D4571710
fbshipit-source-id: 19967a17d3bd84878d66e8c0ed8c5342bf38d979
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when under the same NameScope.
`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.
This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for, e.g., two-tower sparse nn models.
Reviewed By: kennyhorror
Differential Revision: D4555423
fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187