pytorch/caffe2/python
Le Fang ac4913ee62 support both regularizable and sofmax re-weighting on sparse features in dot product (#22176)
Summary:
To select the more important features for the dot product from a list of candidate sparse features, we assign one learnable weight to each feature and reweight the feature by multiplying its embedding by that weight before the dot product. After training, we select features based on weight magnitude.
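
As a rough illustration of the reweighting step, here is a minimal sketch in plain Python. The function names `reweighted_pairwise_dots` and `select_features`, the pairwise form of the dot product, and the toy values are assumptions made for illustration; they are not taken from the diff, which operates on Caffe2 blobs.

```python
from itertools import combinations


def reweighted_pairwise_dots(embeddings, weights):
    """Scale each feature embedding by its weight, then take pairwise dot products.

    Hypothetical helper: one scalar weight per feature, one embedding per feature.
    """
    scaled = [[w * x for x in emb] for emb, w in zip(embeddings, weights)]
    return {
        (i, j): sum(a * b for a, b in zip(scaled[i], scaled[j]))
        for i, j in combinations(range(len(scaled)), 2)
    }


def select_features(weights, k):
    """Return the indices of the k features with the largest |weight|."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return order[:k]


# Toy data: three candidate features with 2-d embeddings.
embeddings = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
weights = [0.9, 0.0, -0.4]  # feature 1 has collapsed to zero

dots = reweighted_pairwise_dots(embeddings, weights)
top = select_features(weights, 2)
```

A feature whose weight is zero contributes nothing to any dot product it takes part in, which is why weight magnitude can serve as a selection signal.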

We can apply L1 and/or L2 regularization to the weights. L2 regularization shrinks the weights (which helps avoid overfitting), and L1 regularization drives some weights to exactly zero. To keep sparse feature embeddings from being ignored because their weights collapse early, a piecewise lr warm-up policy is used when optimizing the regularization term, so that regularization is weak in the first stage and gets stronger afterwards: a small constant lr for iterations below threshold 1, a medium constant lr in stage 2, and a final, reasonably large constant lr for all iterations after threshold 2. The features with nonzero and relatively large weights (in absolute value) are selected for the module.
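
The three-stage warm-up described above can be sketched as a piecewise-constant function of the iteration count. The parameter names (`m1`, `n1`, `m2`, `n2`, `m3`), the example values, and the handling of iterations that fall exactly on a threshold are assumptions for illustration; the operator in `caffe2/sgd` may define them differently.

```python
def piecewise_warmup(iteration, m1, n1, m2, n2, m3):
    """Three constant pieces: m1 before n1, m2 from n1 up to n2, m3 afterwards.

    Assumption: thresholds are half-open, i.e. iteration == n1 already
    belongs to stage 2 and iteration == n2 to the final stage.
    """
    if iteration < n1:
        return m1
    if iteration < n2:
        return m2
    return m3


# Weak regularization early on, stronger later (illustrative values only).
probe_iters = (0, 99, 100, 999, 1000, 5000)
schedule = [piecewise_warmup(i, 0.01, 100, 0.1, 1000, 1.0) for i in probe_iters]
```

Keeping each piece constant makes the schedule easy to reason about: the weights get a chance to move away from their initial values before the regularization term is applied at full strength.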

We can also apply softmax to the original weights so that they sum to 1. We can even boost the softmaxed weights by multiplying them by the number of softmax components, which makes them sum to the number of components and average to 1. With this approach, all the weights are positive and sum to a constant. Regularization is not required, since we can rely on the competition among the softmax weights themselves to achieve reasonable re-weighting. We expect these weights to be denser than the sparse ones produced by L1 regularization, and we can select features based on the top K weights.
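
A minimal sketch of the softmax variant, using only the standard library. The names `boosted_softmax` and `top_k` and the sample raw weights are hypothetical and not taken from the diff.

```python
import math


def boosted_softmax(raw_weights, boost=True):
    """Softmax over the raw weights, optionally scaled by the number of components.

    With boost=True the outputs sum to len(raw_weights) and average to 1.
    """
    shift = max(raw_weights)  # subtract the max for numerical stability
    exps = [math.exp(w - shift) for w in raw_weights]
    total = sum(exps)
    scale = len(raw_weights) if boost else 1.0
    return [scale * e / total for e in exps]


def top_k(weights, k):
    """Indices of the k largest weights (all softmax outputs are positive)."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]


raw = [0.2, -1.0, 1.5, 0.0]
plain = boosted_softmax(raw, boost=False)
boosted = boosted_softmax(raw)
selected = top_k(boosted, 2)
```

Because softmax is monotonic, the ranking of the boosted weights matches the ranking of the raw weights; boosting only changes the scale so that an "average" feature has weight 1.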

Overall, we aim to demonstrate that the selected feature set outperforms the current v0 feature set in experiments. Special acknowledgement goes to Shouyuan Chen, who initiated the work on regularizable weighting.

 ---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22176

The diff will export updates to the GitHub repository, as described below.

The updates to the files are summarized below:

- add logger messages
`caffe2/python/layer_model_helper.py`
- add the ElasticNet regularizer, which combines L1 and L2 regularization
`caffe2/python/regularizer.py`
- implement piecewarmup, a warm-up with three constant pieces
`caffe2/sgd/learning_rate_functors.h, caffe2/sgd/learning_rate_op.cc, caffe2/sgd/learning_rate_op.h`
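
For intuition about the ElasticNet item, the combined penalty can be sketched as a weighted sum of an L1 term and a squared L2 term. This is a generic elastic-net form written in plain Python; the function name, argument names, and exact scaling are assumptions and may differ from what `caffe2/python/regularizer.py` computes.

```python
def elastic_net_penalty(weights, l1, l2):
    """Generic elastic-net penalty: l1 * sum(|w|) + l2 * sum(w^2).

    Assumption: the L2 term is the squared norm with no 1/2 factor.
    """
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return l1 * l1_term + l2 * l2_term


penalty = elastic_net_penalty([0.5, -2.0, 0.0], l1=0.1, l2=0.01)
l1_only = elastic_net_penalty([0.5, -2.0, 0.0], l1=0.1, l2=0.0)
```

Setting either coefficient to zero recovers plain L2 or plain L1 regularization, so a single regularizer covers all three cases.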

Differential Revision: D15923430

fbshipit-source-id: ee18902cb88c23b1b7b367cc727d690a21e4cda9
2019-06-24 21:27:33 -07:00
..
docs Fix several DeprecationWarning: invalid escape sequence (#15733) 2019-01-05 08:53:35 -08:00
examples Adding models to jenkins benchmark script (#21010) 2019-05-30 15:17:40 -07:00
helpers Add elementwise_affine for LayerNormGradientOp (#19982) 2019-05-03 15:33:46 -07:00
ideep implement transpose operator for MKLDNN (#19955) 2019-06-11 01:55:13 -07:00
layers Add hashing to bucket-weighted pooling (#20673) 2019-06-20 15:12:36 -07:00
mint re-enable copy of python files, but be careful that the copy is only … (#14982) 2018-12-11 16:54:08 -08:00
mkl implement operators for DNNLOWP (#18656) 2019-04-10 12:04:39 -07:00
modeling Support plot norm of specific embeddings of a LUT in diagnose_options (#19809) 2019-05-18 01:08:45 -07:00
models Adding ShufflenetV2 to caffe2's benchmark suite. (#20180) 2019-05-23 20:40:17 -07:00
onnx Fix: convert Onnx DynamicSlice operator with 4 inputs to caffe2 fa… (#20846) 2019-06-19 00:09:15 -07:00
operator_test Fix spelling errors (#21665) 2019-06-13 15:21:55 -07:00
predictor add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta (#18257) 2019-03-22 00:19:59 -07:00
rnn Unify cuda and hip device types in Caffe2 python front end (#14221) 2018-11-29 14:00:16 -08:00
serialized_test Update ROCm 2.4 (#20253) 2019-05-08 09:35:40 -07:00
test Enforce import order to make protobuf cpp implementation in python work (#18560) 2019-04-03 13:17:08 -07:00
trt Skip tests if C2/ONNX models cannot be read (#18494) 2019-03-27 11:21:44 -07:00
__init__.py Revert #17191 and #17215 that no longer apply on Windows (#17567) 2019-03-01 10:37:27 -08:00
_import_c_extension.py Enforce import order to make protobuf cpp implementation in python work (#18560) 2019-04-03 13:17:08 -07:00
allcompare_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
attention.py [Caffe2] Update elementwise ops to support numpy style boradcast (#8070) 2018-06-05 15:49:16 -07:00
benchmark_generator.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
binarysize.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
brew_test.py Move tanh function to math (#9328) 2018-07-11 13:59:50 -07:00
brew.py Testing for folded conv_bn_relu (#19298) 2019-04-16 19:04:06 -07:00
build.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
cached_reader.py Pass loop_over optional parameter for cached reader properly. (#21929) 2019-06-19 18:15:32 -07:00
caffe_translator_test.py Fix several ResourceWarning: unclosed file (#15746) 2019-01-09 15:36:53 -08:00
caffe_translator.py Fix several ResourceWarning: unclosed file (#15746) 2019-01-09 15:36:53 -08:00
checkpoint_test.py Revert D9566744: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() (#11164) 2018-08-31 22:25:57 -07:00
checkpoint.py Remove setting logger level in caffe2.python.checkpoint (#19803) 2019-05-10 07:00:58 -07:00
CMakeLists.txt Fix CMakeLists.txt for Int8 python bindings (#15047) 2018-12-11 10:48:47 -08:00
cnn.py Unify cuda and hip device types in Caffe2 python front end (#14221) 2018-11-29 14:00:16 -08:00
compatibility.py migrating deprecated calls without abc module for containers (#11515) 2018-09-13 15:09:22 -07:00
context_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
context.py Resolve name conflict of ContextManager (#7244) 2018-06-22 00:41:51 -04:00
control_ops_grad_test.py fix auto grad summing for IfOp where intermediate output needs renaming (#14772) 2018-12-09 08:26:46 -08:00
control_ops_grad.py DeviceScope support for CUDA and testing (#15357) 2019-01-30 18:42:12 -08:00
control_ops_util.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
control_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
control.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
convert_test.py New serialization format (#12384) 2018-10-16 16:36:58 -07:00
convert.py New serialization format (#12384) 2018-10-16 16:36:58 -07:00
convnet_benchmarks_test.py Skip convnets benchmark in rocm CI (#17331) 2019-02-20 21:12:24 -08:00
convnet_benchmarks.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
core_gradients_test.py add extra info for the auto gen sum ops 2019-03-27 14:56:32 -07:00
core_test.py Extend Net.RunAllOnGPU() to support RecurrentNetwork op (#15713) 2019-02-08 15:48:42 -08:00
core.py add extra info for the auto gen sum ops 2019-03-27 14:56:32 -07:00
crf_predict.py Move crf in caffe2 from fb to oss (#12200) 2018-10-01 18:31:41 -07:00
crf_viterbi_test.py Move crf in caffe2 from fb to oss (#12200) 2018-10-01 18:31:41 -07:00
crf.py Productionize CRF layer in PyText (#10362) 2018-08-22 00:25:26 -07:00
data_parallel_model_test.py Unify cuda and hip device types in Caffe2 python front end (#14221) 2018-11-29 14:00:16 -08:00
data_parallel_model.py handle scenario when GPU support is not available and p2p_access_pattern is empty (#17974) 2019-03-18 23:11:54 -07:00
data_workers_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
data_workers.py Fixed log message (#10874) 2018-09-05 09:55:52 -07:00
dataio_test.py Pass loop_over optional parameter for cached reader properly. (#21929) 2019-06-19 18:15:32 -07:00
dataio.py Rearrange stopping condition in CompositeReader (#20062) 2019-05-06 15:06:32 -07:00
dataset.py Update from facebook (#7855) 2018-05-29 11:38:02 -07:00
db_file_reader.py Update from Facebook (#8887) 2018-06-26 14:55:48 -07:00
db_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
device_checker.py Update from facebook (#7451) 2018-05-10 23:14:27 -07:00
dlpack.h Upgrade DLPack 2018-11-12 15:59:46 -08:00
dyndep.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
embedding_generation_benchmark.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
experiment_util.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
extension_loader.py Completely remove build_aten and use_aten (#10469) 2018-08-20 20:26:42 -07:00
filler_test.py caffe2 - Expose tensor filler util to Python (#18886) 2019-04-08 11:54:10 -07:00
functional_test.py Add support for specifying device_option in Functional (#9619) 2018-07-24 14:41:59 -07:00
functional.py Caffe2 Functional enforcing inplace output (#10797) 2018-08-23 22:42:47 -07:00
fused_8bit_rowwise_conversion_ops_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
gradient_check_test.py Unify gpu_support variable in python tests (#16748) 2019-02-07 00:29:51 -08:00
gradient_checker.py Adding gradient to Boolean Mask operator (#21423) 2019-06-06 20:48:47 -07:00
gru_cell.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
hip_test_util.py Make CUDNN an alias of MIOPEN for HIP ops (#12278) 2018-10-24 17:07:31 -07:00
hsm_util.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
hypothesis_test_util.py Unify gpu_support variable in python tests (#16748) 2019-02-07 00:29:51 -08:00
hypothesis_test.py Re-enable test_dag_net_forking on ROCm (#21013) 2019-05-28 12:12:53 -07:00
ideep_test_util.py [feature request] [Caffe2] Enable MKLDNN support for inference (#6699) 2018-04-22 21:58:14 -07:00
layer_model_helper.py support both regularizable and sofmax re-weighting on sparse features in dot product (#22176) 2019-06-24 21:27:33 -07:00
layer_model_instantiator.py [caffe2] Fbcode to GitHub sync (#6208) 2018-04-02 16:35:27 -07:00
layer_parameter_sharing_test.py Add validator for optimizers when parameters are shared 2019-04-17 21:10:38 -07:00
layer_test_util.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
layers_test.py Add new regression loss function type to FBLearner (#21080) 2019-06-17 17:43:00 -07:00
lengths_reducer_fused_8bit_rowwise_ops_test.py make the threshold for acurracy more precise (#17194) 2019-02-20 13:14:11 -08:00
lengths_reducer_rowwise_8bit_ops_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
lstm_benchmark.py Unify cuda and hip device types in Caffe2 python front end (#14221) 2018-11-29 14:00:16 -08:00
memonger_test.py Unify gpu_support variable in python tests (#16748) 2019-02-07 00:29:51 -08:00
memonger.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
mkl_test_util.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
model_device_test.py Unify gpu_support variable in python tests (#16748) 2019-02-07 00:29:51 -08:00
model_helper_test.py keep net type info when generating model complete net (#11032) 2018-09-04 21:10:06 -07:00
model_helper.py Remove the identical if branch (#18019) 2019-03-15 13:14:26 -07:00
modifier_context.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
mpi_python.cc Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
muji_test.py Unify cuda and hip device types in Caffe2 python front end (#14221) 2018-11-29 14:00:16 -08:00
muji.py Unify cuda and hip device types in Caffe2 python front end (#14221) 2018-11-29 14:00:16 -08:00
net_builder_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
net_builder.py Update from Facebook (#6692) 2018-04-17 23:36:40 -07:00
net_drawer.py Allow customization of blob node in net_drawer (#16915) 2019-02-12 15:02:50 -08:00
net_printer_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
net_printer.py Fix spelling errors (#21665) 2019-06-13 15:21:55 -07:00
nomnigraph_test.py nomnigraph - support subgraph visualization (#13795) 2018-11-16 08:19:20 -08:00
nomnigraph_transformations_test.py Add transpose network pass (#13437) 2018-11-01 14:27:07 -07:00
nomnigraph_transformations.py Add transpose network pass (#13437) 2018-11-01 14:27:07 -07:00
nomnigraph.py createUniqueDataNode 2018-10-31 11:16:38 -07:00
normalizer_context.py Update from Facebook (#8887) 2018-06-26 14:55:48 -07:00
normalizer_test.py Update from Facebook (#8887) 2018-06-26 14:55:48 -07:00
normalizer.py Enable alternative LayerNorm impl in FisherGan (#12178) 2018-10-11 17:36:11 -07:00
numa_benchmark.py Revert D13205604: Move numa.{h, cc} to c10/util 2018-12-07 10:01:25 -08:00
numa_test.py Move numa.{h, cc} to c10/util (#15024) 2018-12-12 12:21:10 -08:00
observer_test.py Fix RNN scoping situation 2018-02-07 17:35:29 -08:00
operator_fp_exceptions_test.py Caffe2 - Add flag to fails if float point exceptions is detected in operator runs (#18040) 2019-03-16 12:28:05 -07:00
optimizer_context.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
optimizer_test_util.py Unify cuda and hip device types in Caffe2 python front end (#14221) 2018-11-29 14:00:16 -08:00
optimizer_test.py Unify gpu_support variable in python tests (#16748) 2019-02-07 00:29:51 -08:00
optimizer.py Add validator for optimizers when parameters are shared 2019-04-17 21:10:38 -07:00
parallel_workers_test.py Add event and event_counter columns to caffe2_usage_tracer table (#21739) 2019-06-13 12:06:02 -07:00
parallel_workers.py Update from facebook (#7696) 2018-05-19 23:10:48 -07:00
parallelize_bmuf_distributed_test.py Unify gpu_support variable in python tests (#16748) 2019-02-07 00:29:51 -08:00
pipeline_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
pipeline.py SNNTest with Data Preproc Service (#11707) 2018-09-17 21:25:49 -07:00
predictor_constants.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
pybind_state_dlpack.cc Upgrade DLPack 2018-11-12 15:59:46 -08:00
pybind_state_dlpack.h Remove PythonOp non-CPU path and PytorchOp (#15417) 2019-01-02 16:36:37 -08:00
pybind_state_gpu.cc add simple memory analyzer and log warning if GPU underutilized (#21024) 2019-05-28 19:58:54 -07:00
pybind_state_hip.cc Fix get_gpu_memory_info in non-cuda builds (#21054) 2019-05-28 23:05:15 -07:00
pybind_state_ideep.cc Upgrade mkldnn-bridge for dnnlowp support (#16308) 2019-04-03 12:47:17 -07:00
pybind_state_int8.cc Renaming meta() to dtype() - 2/2 (#13334) 2018-10-30 18:24:30 -07:00
pybind_state_nomni.cc nomnigraph - support subgraph visualization (#13795) 2018-11-16 08:19:20 -08:00
pybind_state_registry.cc Move registry fully to c10 (#12077) 2018-09-27 03:09:54 -07:00
pybind_state_registry.h Move registry fully to c10 (#12077) 2018-09-27 03:09:54 -07:00
pybind_state.cc Fix incorrect usage of __HIP_PLATFORM_HCC__ (#21757) 2019-06-18 18:56:32 -07:00
pybind_state.h Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623) 2019-03-06 10:48:15 -08:00
python_op_test.py Clean up a couple of items in the C2 test scaffolding (WIP) (#7847) 2018-11-07 09:16:13 -08:00
queue_util.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
record_queue.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
recurrent.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
regularizer_context.py Update from Facebook (#8887) 2018-06-26 14:55:48 -07:00
regularizer_test.py Add GroupL1Norm regularizer (#9115) 2018-07-06 13:26:09 -07:00
regularizer.py support both regularizable and sofmax re-weighting on sparse features in dot product (#22176) 2019-06-24 21:27:33 -07:00
rnn_cell.py Unify cuda and hip device types in Caffe2 python front end (#14221) 2018-11-29 14:00:16 -08:00
schema_test.py Make the exception raised from "numpy.dtype(numpy.void, (INT,))" less cryptic (#16809) 2019-02-08 16:46:50 -08:00
schema.py Give clear error message when attempting to merge struct which can't be merged. 2019-05-10 07:01:01 -07:00
scope_test.py Add EmptyNameScope to allow you jump out from current scope. (#14631) 2018-12-12 01:39:50 -08:00
scope.py Add EmptyNameScope to allow you jump out from current scope. (#14631) 2018-12-12 01:39:50 -08:00
session_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
session.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
sparse_to_dense_mask_test.py Increase static tolerance for negative feature ids 2019-05-20 19:09:22 -07:00
sparse_to_dense_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
task_test.py caffe2/python/task: added __repr__ methods to all task definitions (#15250) 2018-12-17 16:02:16 -08:00
task.py A trivial typo fix in caffe2.python (#15907) 2019-01-17 04:57:34 -08:00
test_util.py caffe2 - support flaky operator tests for caffe2 build (#18155) 2019-03-25 16:58:34 -07:00
text_file_reader.py Create Node2Vec ModuleKeeper 2019-04-01 10:36:23 -07:00
timeout_guard.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
toy_regression_test.py Enable junk fill for the default CPU allocator (#13377) 2018-11-08 00:02:37 -08:00
transformations_test.py Remove sinkMaxPool transformation (#17694) 2019-03-12 20:10:46 -07:00
transformations.py support pre-convert filter format for mkldnn training mode and change 'OptimizeForIdeep' to 'OptimizeForMkldnn' (#15171) 2019-03-29 19:00:48 -07:00
tt_core_test.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
tt_core.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
utils_test.py Convert Arguments to dictionary (#13436) 2018-11-01 14:27:05 -07:00
utils.py Query caffe2 operator stats for detailed execution info (#20924) 2019-06-13 23:41:04 -07:00
visualize.py Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
workspace_test.py Remove Variable::Impl and DifferentiableViewImpl (#17072) 2019-05-23 21:09:04 -07:00
workspace.py Fix get_gpu_memory_info in non-cuda builds (#21054) 2019-05-28 23:05:15 -07:00