Summary:
This is a first attempt at completing bootcamp task T24449916. This diff contains 3 major changes:
1) Change LayerModelHelper to allow exposing the output and parameters of any layer to metrics
2) Add a runner that allows metrics to draw arbitrary plots onto a matplotlib axes object
3) Implement a metric that aggregates the distribution of a blob's values over training, and try it out in a notebook (an illustrative sketch follows)
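A purely illustrative sketch of what such a metric could look like; the class and method names here are assumptions made for the example, not the actual runner API added in this diff.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical metric (names are assumptions): accumulate a blob's values across
# training iterations and render the aggregated distribution onto a matplotlib
# axes object handed in by the plotting runner.
class BlobDistributionMetric(object):
    def __init__(self):
        self._values = []

    def update(self, blob_values):
        self._values.append(np.asarray(blob_values).ravel())

    def draw(self, ax):
        ax.hist(np.concatenate(self._values), bins=50)
        ax.set_title("blob value distribution over training")

# Usage: the runner would pass each metric an axes to draw on.
fig, ax = plt.subplots()
metric = BlobDistributionMetric()
metric.update(np.random.randn(1000))
metric.draw(ax)
```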
Reviewed By: kennyhorror
Differential Revision: D6671273
fbshipit-source-id: b8961837395e89c957edbf5c7c862bdb845ccf4b
Summary: Add a test for SparseLookup with PositionWeighted.
Reviewed By: kennyhorror
Differential Revision: D6771612
fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
Summary: Change all use cases of BatchLRLoss to the numerically stable version. This includes uses of the function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and the class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.
Reviewed By: xianjiec
Differential Revision: D6643074
fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
Summary:
As titled.
This will fail with the message: File "/mnt/xarfuse/uid-30088/f8742a88-seed-a26ddfbc-49aa-4c5f-9e08-91909f4775da-ns-4026532692/caffe2/python/layers/concat.py", line 52, in __init__
"Concat expects that limited dimensions of the input tensor"
This is because the output scalar of the pairwise_dot_product layer won't contain shape information if output_dim is 1.
https://fburl.com/1m9r3ayp
This diff fixes it.
Reviewed By: xianjiec
Differential Revision: D6565930
fbshipit-source-id: 181181232065ef3fdfc825aa25d2714affbe6b8d
Summary: Replaced sigmoid + xent loss with SigmoidCrossEntropyWithLogits. The op computes the cross-entropy loss of the sigmoid of its inputs. It is conceptually identical to a sigmoid layer followed by a cross-entropy loss layer, but provides a more numerically stable gradient.
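For reference, a minimal NumPy sketch of the standard numerically stable formulation of sigmoid cross-entropy on logits (an illustration of the math, not the Caffe2 kernel):

```python
import numpy as np

# -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)), rewritten so that sigmoid(x) is
# never formed explicitly; a large |x| then neither overflows nor rounds the
# probability to exactly 0 or 1.
def sigmoid_cross_entropy_with_logits(x, z):
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
```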
Reviewed By: xianjiec
Differential Revision: D6305455
fbshipit-source-id: 444c9f651fbdf13c3c52be5142769f8f98ed8770
Summary:
Get higher-order interactions of embeddings, similar to a cross net but applied at the embedding level.
Formula:
e_(l+1,i) = element_wise_mul[e_(0,i), \sum_j(e_(l,j) * w_(l,j))] + e_(l,i) + b
where l means the l-th layer of this higher-order net, and i and j index the embeddings in the list.
Finally, concat all the embeddings in the last layer, or concat the sum of each embedding, and attach the result to the output blob of the dot processor.
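A minimal NumPy sketch of one step of the formula above, for illustration only; the shapes of w and b are assumptions, and this is not the actual DPER layer code:

```python
import numpy as np

def higher_order_step(e0, el, w, b):
    # e0, el: (num_embeddings, dim) -- layer-0 and layer-l embeddings
    # w:      (num_embeddings, 1) or (num_embeddings, dim) -- per-embedding weights
    # b:      (dim,) -- bias
    s = (el * w).sum(axis=0)   # \sum_j e_(l,j) * w_(l,j)
    return e0 * s + el + b     # element-wise mul with e_(0,i), plus residual and bias
```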
Differential Revision: D6244001
fbshipit-source-id: 96292914158347b79fc1299694d65605999b55e8
Summary:
Problem:
When we initialize a model from an existing model, we currently load information for each layer parameter independently (in utils.py), including shape information. This means we have to load the whole model from the db_path every time we initialize one parameter (in layers.py). For example, in f31078253, the model needs to be initialized twice (not sure why); each time there are 152 layer parameters to load, and loading a model takes 10 min - 50 min depending on resource status.
Restriction:
1. _infer_shape_from_initializer in layers.py is called from multiple other places besides the ModelInitDefinition.INIT_MODEL_PATH branch of load_parameters_from_model_init_options in utils.py, which is the root cause of f31078253. So we still need to support the load operator in _infer_shape_from_initializer, and we need to batch the loading of shape blobs outside of LayerParameter.
2. In the ModelInitDefinition.PARAMS branch of load_parameters_from_model_init_options in utils.py, the db_path can differ across parameters, so it is hard to batch them.
Solution:
Batch the loading of shape blobs in the ModelInitDefinition.INIT_MODEL_PATH branch of load_parameters_from_model_init_options in utils.py. We load the model once and generate shape blobs for the layer parameters in the workspace, so that _infer_shape_from_initializer in layers.py can directly return the shape blobs cached in the workspace without reloading the model. At the same time, _infer_shape_from_initializer still supports a separate load operator when shape blobs are not pre-loaded into the workspace (this logic can be used for ways to initialize a model other than from an existing model).
Right now we use 500 layer parameters per batch, and it works fine. So for 152 layer parameters, one model load is enough.
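A minimal sketch of the fast-path / fallback idea described above, assuming a hypothetical shape-blob naming convention and db_type; this is not the actual utils.py / layers.py code:

```python
from caffe2.python import core, workspace

def infer_shape(param_name, db_path):
    shape_blob = param_name + "_shape"  # hypothetical name for the cached shape blob
    if workspace.HasBlob(shape_blob):
        # Fast path: the batched pre-load already put the shape into the workspace.
        return list(workspace.FetchBlob(shape_blob))
    # Fallback: load just this parameter from the db and read its shape directly,
    # as before (used when shape blobs were not pre-loaded).
    load_op = core.CreateOperator(
        "Load", [], [param_name],
        db=db_path, db_type="minidb", absolute_path=True)
    workspace.RunOperatorOnce(load_op)
    return list(workspace.FetchBlob(param_name).shape)
```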
Reviewed By: xianjiec
Differential Revision: D6397607
fbshipit-source-id: 54f6f61d6d8b70c82b74c2d72ac56cd010a710da
Summary: Support regression with output transform in MTML for feed.
Differential Revision: D6403523
fbshipit-source-id: faa0aab1227a27286b617e8e25adfbab3a349d2c
Summary:
So that users can use the 'WeightedSum' pooling method when there is a mix of id list and id score list features.
- It's still intuitive to have "WeightedSum" for id lists, and we do not need to introduce a new "UnWeightedSum" etc.
Reviewed By: chocjy
Differential Revision: D6369270
fbshipit-source-id: 722fa08d1a7986bc6ecf4c7cb02bbae0825bcab4
Summary:
Ability to use the average length of a sparse feature to initialize weights. Based on experiments, this allows a model to converge faster.
More results of the experiment -- https://fb.quip.com/VfraAXNFWhSg
Reviewed By: xianjiec
Differential Revision: D6092437
fbshipit-source-id: d979be7d755719ff297b999f73cba0671e267853
Summary:
The output shape info is incorrect: e.g. if we have 4 embeddings with dim size 32, the actual shape is (4, 32), but the previous implementation in the concat layer gives us (128, 1). This bug doesn't affect the dot product calculation because the actual shape of the blob is still (4, 32) in concat_split_op.
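A tiny NumPy illustration of the shapes involved (for intuition only, not the layer code):

```python
import numpy as np

# Four embeddings of dim 32 stacked together form a (4, 32) blob, which is what
# concat_split_op actually produces; the old shape inference reported (128, 1).
embeddings = [np.zeros(32, dtype=np.float32) for _ in range(4)]
print(np.stack(embeddings).shape)  # (4, 32)
```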
Differential Revision: D6264793
fbshipit-source-id: 82995e83a8c859cbd15617ff7850a35b30b453b6
Summary: Sigmoid + CrossEntropy has a numerical stability issue. The gradient of sigmoid is `dx = dy * y * (1-y)`. When `label=0` and `x` is large, `1-y` can be rounded to (near) 0 and we lose `dx`. Switching to `SigmoidCrossEntropyWithLogits` solves the issue because the gradient does not depend on `y`.
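A small NumPy demonstration of the rounding problem described above (illustration only, not project code):

```python
import numpy as np

x = np.float32(20.0)              # large logit, label = 0
one = np.float32(1.0)
y = one / (one + np.exp(-x))      # sigmoid(x) rounds to exactly 1.0 in float32
print(y * (one - y))              # 0.0 -> gradient dx = dy * y * (1 - y) vanishes
# The logits formulation backpropagates (sigmoid(x) - label) directly, which is
# ~1.0 here instead of vanishing.
print(y - np.float32(0.0))        # 1.0
```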
Reviewed By: chocjy
Differential Revision: D6086950
fbshipit-source-id: f990ae726802aa5c56fa62cf5e23f2e61ee047fa
Summary:
In order to reproduce the StarSpace model using the architecture of the Two Tower model, we need to implement the ranking loss used in StarSpace as well as in the Filament model. In both StarSpace and Filament, all negative samples come from random negative sampling, so the number of negative samples per positive record is fixed (say 64). To calculate the total loss, for each positive record, the hinge distance between the positive score and the negative scores (the 64 scores in the example) is calculated. This diff implements this loss in the Dper framework.
The main idea is to add an option so that negative_sampling.py can output random negative samples as an independent field rather than merging them with the original input_record. In this way, we can calculate the positive score and negative scores separately, which are eventually used when calculating the ranking loss.
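A minimal NumPy sketch of the ranking (hinge) loss described above, assuming a fixed number of random negatives per positive and a hypothetical margin parameter; this is an illustration, not the actual Dper layer:

```python
import numpy as np

def ranking_hinge_loss(pos_scores, neg_scores, margin=1.0):
    # pos_scores: (batch,)                 -- score of each positive record
    # neg_scores: (batch, num_negatives)   -- e.g. 64 random negatives per positive
    per_pair = np.maximum(0.0, margin - pos_scores[:, None] + neg_scores)
    return per_pair.sum(axis=1).mean()
```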
(Note: this ignores all push blocking failures!)
Reviewed By: kittipatv
Differential Revision: D5854486
fbshipit-source-id: f8a5b77be744a6cc8a2b86433282b3b5c7e1ab4a
Summary:
A parameter can be initialized multiple times in init_net if parameter sharing is enabled. With the original implementation, only the first parameter init is replaced by pre-trained parameters; the subsequent inits are unchanged and overwrite the pre-trained initialization.
This diff fixes the issue and also supports model init for the ads-intent project.
Reviewed By: dragonxlwang
Differential Revision: D5991291
fbshipit-source-id: 36173f6239c56bd0d604a77bd94e36072f32faa7
Summary: It failed in the case where `prod_prediction`, which is double rather than float, is used as the teacher label.
Reviewed By: kittipatv
Differential Revision: D6018163
fbshipit-source-id: cd93fd46996e07c7f762eedbeb67331a4665d4c4
Summary: The layer should also apply to evaluation, as it's needed for the feature importance run.
Reviewed By: xianjiec
Differential Revision: D6016125
fbshipit-source-id: e1db1a2eb3d45515e3cdc71b4badaaf738a4afd8
Summary:
This is the first step on the DPER side to use the net transformation step (`parallelize_net`).
So far, it tags the sparse parameters (in init_net and train_net) once the distributed trainer nets are built.
The next step is to merge the part that creates distributed trainer nets (`create_distributed_trainer_nets`) into the part that creates single-trainer, multi-reader nets (`create_distributed_reader_nets`). This step should get rid of parts of `MixtureStrategyModelBuilder`.
Reviewed By: azzolini
Differential Revision: D5902733
fbshipit-source-id: 85fbddbb6c2704badd82b237f1dd2c7c5790e43a
Summary: This diff refactors the parameter initialization logic, moving it from model manipulation to layers.
Reviewed By: azzolini
Differential Revision: D5920225
fbshipit-source-id: 50d230e406bc9ce0b00bdd164802c504cf32ea46
Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data.
Reviewed By: chocjy
Differential Revision: D5858335
fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e
Summary: When num_elements is less than num_samples, a workflow should fail during net construction time. Currently, it fails at run time.
Reviewed By: kittipatv
Differential Revision: D5858085
fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891
Summary: When parameter sharing is used, the model may not own the parameters. Emptying out initializer ensures that the shared model doesn't overwrite initialization.
Reviewed By: chocjy
Differential Revision: D5870362
fbshipit-source-id: f8587b84c3a13f331a3251973e8206563939606a
Summary: The dot_product layer was added before the functional layer existed. Now that we have the functional layer, the dot_product layer is no longer needed; this diff removes it.
Reviewed By: kittipatv
Differential Revision: D5783303
fbshipit-source-id: 5d13f729918148ee57836fb47c48e6f24773654b
Summary: Extend pairwise dot product to support different numbers of embeddings on the x & y dimensions.
Differential Revision: D5663553
fbshipit-source-id: 1743a2c101cb8c0fc1f0f3d89c19530802400ec6
Summary: In case the whole function should be wrapped in a certain context, this makes it less ugly.
Reviewed By: xianjiec
Differential Revision: D5665253
fbshipit-source-id: ecdc6b1a08e91bae6a4352341f97ee37f3aa677a
Summary:
Before this fix, a functional layer name could appear several times in a blob name, causing confusion. This diff fixes the issue.
Reviewed By: kittipatv
Differential Revision: D5641354
fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2