Commit Graph

55 Commits

Author SHA1 Message Date
Xiaolong Wang
4bb73b8361 [GanH] Weighting Layers: Adaptive/Constant/Homotopy
use case: weighting multiple losses (real values) into a single composite loss for optimization
2018-03-20 13:34:22 -07:00
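A minimal Python sketch of the composite-loss idea in the commit above; the constant weighting and the simple homotopy schedule shown here are illustrative assumptions, not the actual GanH layers:

```python
# Illustrative sketch only: combine several real-valued losses into one
# composite loss, either with fixed (constant) weights or a homotopy-style
# schedule that shifts weight from one loss to another over training.
def constant_weighted_loss(losses, weights):
    # losses, weights: sequences of floats of equal length
    return sum(w * l for w, l in zip(weights, losses))

def homotopy_weighted_loss(loss_a, loss_b, iteration, max_iter):
    # Linearly move the weight from loss_a to loss_b as training progresses.
    alpha = min(iteration / float(max_iter), 1.0)
    return (1.0 - alpha) * loss_a + alpha * loss_b
```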
sf-wind
602a09dde7 Update caffe2 from facebook 4f527ef46abf (#2234)
* [GanH]: two_task_discriminator

as titled

and adds label smoothing

* [Dper2] Simplified UI options needed for blob magnitude visualization

* [GanH]: fix tags

as titled

* Added type and shape inference for GatherRange operator

This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.

* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python

We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python using caffe2::StoreHandlerTimeoutException.
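A self-contained sketch of this catch-and-recover pattern; the exception class below is a stand-in for the pybind-registered Caffe2 exception, not its actual name or import path:

```python
# Sketch of the pattern: the C++ exception type is mapped via pybind to a
# Python exception class (a stand-in here), and callers catch it to recover.
class StoreHandlerTimeoutError(RuntimeError):
    """Stand-in for the pybind-registered caffe2 store-timeout exception."""

def run_with_timeout_recovery(run_net, net_name, max_retries=3):
    # run_net is e.g. workspace.RunNet; a store timeout from a slow peer is
    # retried instead of killing the whole training job.
    for _ in range(max_retries):
        try:
            run_net(net_name)
            return True
        except StoreHandlerTimeoutError:
            continue
    return False
```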

* Bind Gloo IoException to IoError in Python

Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.

* [GanH]: add label smoothing to softmax with loss

as titled

* [C2] Enable LARS in Adagrad and hook it to DPER

* [DPER] Don't pass LayerModelHelper in create_trainer_nodes

Since we're planning to get rid of it eventually and I want access to a NetDef-only interface ASAP, I'm looking to remove all references to LMH where we don't really need them.

* fix bugs in LambdaRankNdcgOp

The loss and gradient in LambdaRankNdcgOp are incorrect: the loss should be the negative log of the probabilities, not the log.
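A small numpy illustration of the sign fix (not the operator code); the |ΔNDCG| weighting shown is the usual LambdaRank form and is assumed here:

```python
import numpy as np

# With p the predicted probability that a pair is correctly ordered, the loss
# contribution should be -log(p) (cross-entropy style), so p -> 1 gives loss
# -> 0; using log(p) instead would reward badly ordered pairs.
def pairwise_lambdarank_term(p, ndcg_delta):
    return -np.log(p) * np.abs(ndcg_delta)

print(pairwise_lambdarank_term(0.99, 0.5))  # near 0: confident, correct order
print(pairwise_lambdarank_term(0.01, 0.5))  # large: confident, wrong order
```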

* Restrict thread pool on iOS to only big cores

Historically, iPhones exposed only one type of core, and the Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X expose 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, and it may be better to restrict execution to only the 2 fast cores, as we do on Android.

* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

* make clang happy and get fewer warnings

make clang happy and get fewer warnings

* [Personalization] Support add_output_schema() in layer_model_helper

Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.

Solution:
For flexibility, we want to add fields to output_schema incrementally.

Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.

Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer

Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
2018-03-12 12:22:59 -07:00
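A minimal Python stand-in for the add_output_schema() plan described in the last item above (not the real LayerModelHelper API; the dict stands in for the schema Struct):

```python
# Sketch: instead of assigning model._output_schema once, accumulate fields.
class ModelSketch(object):
    def __init__(self):
        self._output_schema = {}

    def add_output_schema(self, name, field):
        # Append a new field instead of overwriting the whole schema.
        assert name not in self._output_schema, "field already registered"
        self._output_schema[name] = field

model = ModelSketch()
model.add_output_schema("prediction", "prediction_blob")
model.add_output_schema("embedding", "embedding_blob")  # added incrementally
```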
Orion Reblitz-Richardson
c55a642d83 [c2] update SparseFeatureHash layer
The diff makes the following changes to this layer: copy the lengths blob; add nameScope for the output schema; add layer tests
2018-02-26 10:26:25 -08:00
Lin Yang
cec7003190 only enable FloatToHalf test for GPU
Reviewed By: bddppq

Differential Revision: D6945312

fbshipit-source-id: 9550a9607c0daec6783ce63d3c9f082ff27b0303
2018-02-08 17:48:47 -08:00
Lin Yang
27b9b7b15a Make TypeInference work for HalfToFloat & FloatToHalf.
Summary: add missing type mapping.

Reviewed By: kennyhorror

Differential Revision: D6940574

fbshipit-source-id: b70cea4ce2e519cb3e72d0482a38f50dbb968b4a
2018-02-08 15:33:43 -08:00
Lin Yang
95626737d0 enforce global_constant name should be a string
Reviewed By: kennyhorror

Differential Revision: D6880114

fbshipit-source-id: 2c9bd27b01cedb469f19843163b04a613fda5904
2018-02-04 01:02:27 -08:00
Lin Yang
252211b001 testPairwiseDotProduct
Summary: as title.

Reviewed By: kennyhorror

Differential Revision: D6793829

fbshipit-source-id: f803e0400635ca37184f1dd5bb711bfe0e4bea21
2018-01-26 11:33:08 -08:00
Lin Yang
8e0177255e Test for PositionWeighted
Summary: add Test for SparseLookup with PositionWeighted.

Reviewed By: kennyhorror

Differential Revision: D6771612

fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
2018-01-22 19:20:46 -08:00
Lin Yang
4ea6e6a556 testSparseLookup
Summary: add basic test for SparseLookup

Reviewed By: kennyhorror

Differential Revision: D6749915

fbshipit-source-id: f97af785e4f89f36788a992843066fd1ec2b75a9
2018-01-19 09:27:20 -08:00
Tiangao Gou
bc50510016 use numerically stable version of BatchLRLoss
Summary: change all use cases of BatchLRLoss to the numerically stable version. This includes the uses of the function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and the class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.

Reviewed By: xianjiec

Differential Revision: D6643074

fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
2018-01-02 13:18:36 -08:00
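A numpy sketch of what "numerically stable" means for the logistic loss in the commit above (the exact form Caffe2 uses is an assumption): compute the loss from logits directly rather than through log(sigmoid), so large-magnitude logits don't overflow.

```python
import numpy as np

def lr_loss_naive(logit, label):
    # label in {0, 1}; 1 - p rounds to 0 for large logits, giving log(0) = -inf
    p = 1.0 / (1.0 + np.exp(-logit))
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def lr_loss_stable(logit, label):
    # Equivalent closed form: max(x, 0) - x*y + log(1 + exp(-|x|))
    return np.maximum(logit, 0) - logit * label + np.log1p(np.exp(-np.abs(logit)))

print(lr_loss_naive(40.0, 0.0))   # inf (1 - p underflows to 0)
print(lr_loss_stable(40.0, 0.0))  # ~40.0, finite
```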
Yan Shang
39359afc84 Add rank loss for retrieval models with random negative sample
Summary:
In order to reproduce the StarSpace model using the architecture of the Two Tower model, we need to implement the ranking loss that is used in StarSpace as well as in the Filament model. In both StarSpace and Filament, all negative samples come from random negative sampling, so the number of negative samples per positive record is fixed (say 64). To calculate the total loss, for each positive record the hinge distance between the positive score and the negative scores (the 64 scores in the example) is calculated. This diff implements this loss in the Dper framework.

The main idea is to add an option so that negative_sampling.py can output random negative samples as an independent field rather than merging them into the original input_record. In this way, we can calculate the positive score and the negative scores separately, which are eventually used when calculating the ranking loss.

(Note: this ignores all push blocking failures!)

Reviewed By: kittipatv

Differential Revision: D5854486

fbshipit-source-id: f8a5b77be744a6cc8a2b86433282b3b5c7e1ab4a
2017-10-25 16:19:41 -07:00
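A numpy sketch of the hinge ranking loss described in the commit above (the margin value and the sum reduction are assumptions): each positive record is scored against its fixed set of random negatives.

```python
import numpy as np

def hinge_rank_loss(pos_score, neg_scores, margin=0.1):
    # pos_score: scalar score of the positive record
    # neg_scores: scores of the random negatives (e.g. 64 of them)
    # Penalize every negative that comes within `margin` of the positive.
    return np.sum(np.maximum(0.0, margin - pos_score + neg_scores))

neg = np.random.randn(64) * 0.1
print(hinge_rank_loss(pos_score=0.8, neg_scores=neg))
```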
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Yangqing Jia
3cd0003bf6 fix layers_test: atol should almost always accompany rtol
Summary: TSIA

Reviewed By: chocjy

Differential Revision: D5898129

fbshipit-source-id: f49e8478f79d9df5b59a26287fff7fc5417aac6e
2017-09-22 23:31:01 -07:00
Kittipat Virochsiri
5aac6a2e06 Make LastNWindowCollector thread-safe
Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data.

Reviewed By: chocjy

Differential Revision: D5858335

fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e
2017-09-22 09:48:30 -07:00
Yangqing Jia
9b2c5501b8 Fix Windows build
Summary:
After this, windows should be all green.
Closes https://github.com/caffe2/caffe2/pull/1228

Reviewed By: bwasti

Differential Revision: D5888328

Pulled By: Yangqing

fbshipit-source-id: 98fd39a4424237f2910df69c8609455d7af3ca34
2017-09-21 20:13:15 -07:00
Anshul Verma
a340d141de Check num_elements > num_samples in UniformSampling
Summary: When num_elements is less than num_samples, a workflow should fail during net construction time. Currently, it fails at run time.

Reviewed By: kittipatv

Differential Revision: D5858085

fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891
2017-09-21 16:37:20 -07:00
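A small stand-in sketch (not the actual layer) of the behavior change in the commit above: the check moves into the layer's constructor, so a bad configuration fails while the net is being built rather than when it runs.

```python
# Sketch: validate at net-construction time rather than at run time.
class UniformSamplingSketch(object):
    def __init__(self, num_elements, num_samples):
        # Fail fast while the model is being constructed.
        assert num_elements > num_samples, (
            "num_elements (%d) must be greater than num_samples (%d)"
            % (num_elements, num_samples))
        self.num_elements = num_elements
        self.num_samples = num_samples

UniformSamplingSketch(num_elements=1000, num_samples=100)   # ok
# UniformSamplingSketch(num_elements=10, num_samples=100)   # raises at build time
```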
Badri Narayan Bhaskar
9507cae9e0 Create MergeIdListsLayer
Summary: We create a layer for MergeIdListsOp

Differential Revision: D5531348

fbshipit-source-id: a2e227e1abda05cefa893fd41a2c3ca997851e25
2017-08-22 17:00:55 -07:00
Yan Shang
57c93435e3 Dedup name in functional layer
Summary:
Before this fix, a functional layer name could appear several times in a blob and cause confusion. This diff fixes the issue.

Reviewed By: kittipatv

Differential Revision: D5641354

fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2
2017-08-17 17:50:34 -07:00
Long Jin
ef64a4f6b2 Add conv layer and layer tests
Reviewed By: xianjiec

Differential Revision: D5569206

fbshipit-source-id: ed836315f3ee4d7983da94f2633a3085fe99194d
2017-08-08 10:57:43 -07:00
Jacqueline Xu
a1bf14d8e6 Building new randomized sparse nn model
Summary: New hybrid randomized sparse NN, which allows layers of the sparse NN model to be randomized, semi-random, or learnable

Reviewed By: chocjy

Differential Revision: D5416489

fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d
2017-08-07 12:48:58 -07:00
Jiyan Yang
4b80ff89e2 Use softsign op for s=0 in arc-cosine feature map
Summary:
The current implementation for s=0 doesn't support the backward pass.
Switching to the pow op instead as a temporary solution.

Reviewed By: jackielxu

Differential Revision: D5551742

fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
2017-08-03 23:35:11 -07:00
Jacqueline Xu
13569c9aa0 Fixing semi-random layer model for multi-layer models
Summary:
Updated the semi-random layer model for multi-layer models using semi-random layers.

Notable changes:
- Inputs and outputs for the semi-random layer are now a Struct with "full" and "random" components
- A flag was added to choose whether to initialize the output schema in Arc Cosine or not (i.e., whether output schema initialization will happen in the Semi-Random layer)

Reviewed By: chocjy

Differential Revision: D5496034

fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04
2017-07-27 15:25:19 -07:00
Jacqueline Xu
9bec54bbf1 Modify arc cosine feature map and semi random layers to initialize parameters as global constants
Summary:
The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, as the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part, these weights won't be saved after training since they don't exist on the trainer.

I fixed the issue here by building an option to add the randomized parameters to the model global constants so that the same parameter values can be accessed. Also, the parameters can be saved when the training is finished.

In this diff, I've:
- Updated randomized parameters to be added as a global constant across distributed runs of Arc Cosine Feature Map and Semi Random Feature layers
- Updated unit tests
- Ran an end-to-end test, enabling multiple readers to test the fixed issue

Reviewed By: chocjy

Differential Revision: D5483372

fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd
2017-07-26 16:37:00 -07:00
Honghao Wei
290acab2c7 implement drelu and unittest
Summary:
In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details.
To sum up: unlike standard ReLU and PReLU, which split the input into two parts with the boundary at zero, DRelu computes another value p to divide the activation into two parts. p is the softmax value of the output of Batch Normalization. The f(x)=x part of ReLU has its counterpart in f(x)=px, and the f(x)=0 part of ReLU has its counterpart in f(x)=a(1-p)x, in which a is a parameter to tune. The DRelu activation is the sum of these two parts: f(x) = a(1-p)x + px.
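A small numpy sketch of this formula (not the Caffe2 layer itself); the sigmoid-style gate standing in for the "softmax of the BN output" and the default a are assumptions:

```python
import numpy as np

def drelu(x, a=0.1, eps=1e-5):
    # Batch-normalize the inputs, compute a gate p in (0, 1) from the
    # normalized value (a sigmoid stands in for the gate described above),
    # then blend the two linear pieces: f(x) = p*x + a*(1-p)*x.
    x_norm = (x - x.mean()) / np.sqrt(x.var() + eps)
    p = 1.0 / (1.0 + np.exp(-x_norm))
    return p * x + a * (1.0 - p) * x

print(drelu(np.array([-2.0, -0.5, 0.5, 2.0])))
```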

To implement DRelu, I take BatchNormalization as the superclass and then use the above formula for computation. To allow users to choose activation methods, which usually happens when calling the add_mlp function in processor_util.py, I pass the parameter through model_option from the UI down to the implementation, just as dropout does. Currently I place it in extra_option, but can modify it if the AML team needs to redesign the UI.

I also add unit tests for DRelu. We check the shape of the output and also do numeric unit tests.
For the unit tests, I first check the numeric value of BatchNormalization, since there was no similar test before. I then compute the value of the DRelu outputs and compare the results with the current DRelu layer.

Reviewed By: chocjy

Differential Revision: D5341464

fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630
2017-07-20 11:50:08 -07:00
Honghao Wei
b68adec7bb adding model loss logic
Summary: Add API model.add_loss(), which allows adding losses, such as optimization and regularization terms. See the change in sparse_nn.py, in which 'model.loss = loss' is changed to 'model.add_loss(loss)'.

Reviewed By: xianjiec

Differential Revision: D5399056

fbshipit-source-id: 13b2ced4b75d129a5ee4a9b0e989606c04d2ca8b
2017-07-14 16:25:23 -07:00
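A minimal stand-in for the add_loss() idea in the commit above (not the real LayerModelHelper code): losses accumulate rather than overwrite, so regularization terms can be tacked onto an existing loss.

```python
# Sketch: model.add_loss(loss) accumulates instead of overwriting model.loss.
class ModelSketch(object):
    def __init__(self):
        self.loss = None

    def add_loss(self, loss):
        # First call sets the loss; later calls (e.g. regularizers) add to it.
        self.loss = loss if self.loss is None else self.loss + loss

model = ModelSketch()
model.add_loss(1.25)   # task loss
model.add_loss(0.05)   # regularization term
print(model.loss)      # 1.3
```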
Jacqueline Xu
2aa8fc7e8d Implementing Semi-Random Features Layer
Summary:
- (Split diff from Arc Cosine)
- Implemented [[ https://arxiv.org/pdf/1702.08882.pdf | Semi-Random Features ]] Layer
- Created a buck unit test for SRF Layer

Reviewed By: chocjy

Differential Revision: D5374803

fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae
2017-07-14 13:15:50 -07:00
Jiyan Yang
043640c3eb Return top K classes
Reviewed By: kittipatv

Differential Revision: D5363481

fbshipit-source-id: 27ce37878434917c1a7c5f325ed77c989a1448af
2017-07-13 00:20:00 -07:00
Tao Wu
02aa5ad9fb make functional layer return scalar if only one output
Summary: This diff makes the functional layer return a scalar if there is only one output. It also corrects all other corresponding implementations.

Reviewed By: kittipatv

Differential Revision: D5386853

fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
2017-07-12 11:34:31 -07:00
Jacqueline Xu
e89e71c595 Simplifying Random Fourier Features and layer test
Summary:
- Condensed operators in RFF layer
- Adjusted RFF layer test; made test code more concise

Reviewed By: chocjy

Differential Revision: D5391436

fbshipit-source-id: 08748861cd6fb4a9e4cc9c8762996371492020a1
2017-07-11 00:40:53 -07:00
Jacqueline Xu
6ea71155c1 Implementing Arc Cosine Layer
Summary:
- Implemented the [[ http://cseweb.ucsd.edu/~saul/papers/nips09_kernel.pdf | Arc Cosine ]] layer
  - Developed buck unit test for Arc Cosine

Reviewed By: chocjy

Differential Revision: D5367604

fbshipit-source-id: ffd3ee081bc055b06c075c34aa6ce329b62ce2e0
2017-07-10 10:10:36 -07:00
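A numpy sketch of an arc-cosine random feature map in the spirit of the Cho & Saul paper linked above (order, dimensions, and scaling are illustrative assumptions, not the layer's exact parameters):

```python
import numpy as np

def arc_cosine_features(X, num_features=128, order=1, seed=0):
    # Random projection W ~ N(0, I); order-n arc-cosine features are
    # step(Wx) * (Wx)^n, i.e. a step function for order 0, a ReLU for order 1.
    rng = np.random.RandomState(seed)
    W = rng.randn(X.shape[1], num_features)
    proj = X.dot(W)
    return np.sqrt(2.0 / num_features) * np.where(proj > 0, proj ** order, 0.0)

X = np.random.randn(4, 16)
print(arc_cosine_features(X).shape)   # (4, 128)
```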
Jacqueline Xu
25bd5dda27 Implementing random fourier features layer
Summary:
- Created the random fourier features layer
- Generated a unit test to test the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf |   Random Features for Large-Scale Kernel Machines]]

Reviewed By: chocjy

Differential Revision: D5318105

fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
2017-07-04 23:48:42 -07:00
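A numpy sketch of the random Fourier feature map from the Rahimi & Recht paper linked above (Gaussian-kernel case; the bandwidth and feature count are illustrative):

```python
import numpy as np

def random_fourier_features(X, num_features=128, sigma=1.0, seed=0):
    # Approximates an RBF kernel: z(x) = sqrt(2/D) * cos(Wx + b),
    # with W ~ N(0, 1/sigma^2) and b ~ Uniform(0, 2*pi).
    rng = np.random.RandomState(seed)
    W = rng.randn(X.shape[1], num_features) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X.dot(W) + b)

X = np.random.randn(4, 16)
print(random_fourier_features(X).shape)   # (4, 128)
```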
Tao Wu
4be5337cca add support for weight in batch_softmax_loss
Summary: weighted batch_softmax_loss when weight exists in input_record

Reviewed By: kittipatv

Differential Revision: D5291646

fbshipit-source-id: f1bcd386ad1fc0e95e0a0315ec1c36531c792495
2017-06-21 10:32:15 -07:00
Jacqueline Xu
6150d9bef2 Building dropout as layer
Summary: Dropout layer and unittest for DPer2

Reviewed By: chocjy

Differential Revision: D5254866

fbshipit-source-id: 5eaea81808ddf8e0c7a7d76209ea44cda2ee28aa
2017-06-19 14:46:52 -07:00
Thomas Dudziak
60c78d6160 Fixes range/xrange for Python 3
Summary: As title

Differential Revision: D5151894

fbshipit-source-id: 7badce5d3122e8f2526a7170fbdcf0d0b66e2638
2017-06-07 00:04:26 -07:00
Jiyan Yang
6aff754dbc Add batch normalization layer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5077230

fbshipit-source-id: f73cdedac6d9a3542f8ef829b54fb4c713dcafd0
2017-05-26 16:46:52 -07:00
Kittipat Virochsiri
211eae127c LastNWindowCollector
Summary: Layer for LastNWindowCollector op. We need this since it's an in-place operator.

Reviewed By: chocjy

Differential Revision: D4981772

fbshipit-source-id: ec85dbf247d0944db422ad396771fa9308650883
2017-05-04 17:32:09 -07:00
Kittipat Virochsiri
22d4eaeb9e JoinContext
Summary:
Layer to allow a model to follow different paths for each instantiation context and join later. Together with the tagging system cleanup (a separate issue), this should reduce the need to write a layer to differentiate between contexts.

Re: tagging system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instantiation code. TRAIN_ONLY should become the set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN.

Reviewed By: kennyhorror

Differential Revision: D4964949

fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
2017-05-02 17:32:26 -07:00
Chonglin Sun
e8e93066e7 add workflow for user complicated embedding
Summary: Correctly propagate the request_only tag to all layers.

Reviewed By: kennyhorror

Differential Revision: D4751496

fbshipit-source-id: e65fd8cfe56d2989213d44e684a528ede691d316
2017-05-02 10:46:52 -07:00
Jiyan Yang
795dc1c326 Remove loss ops from eval net
Summary: Current eval nets contain loss operators (see example: https://fburl.com/6otbe0n7), which is unnecessary. This diff removes them from the eval net.

Differential Revision: D4934589

fbshipit-source-id: 1ba96c20a3a7ef720414acb4124002fb54cabfc7
2017-04-26 12:46:25 -07:00
Jiyan Yang
ef2701a57e MapToRange layer
Summary: A layer that takes raw ids as inputs and outputs the indices which can be used as labels. The mapping will be stored with the model.

Reviewed By: kittipatv

Differential Revision: D4902556

fbshipit-source-id: 647db47b0362142cdba997effa2ef7a5294c84ee
2017-04-25 16:03:58 -07:00
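A tiny Python stand-in for the MapToRange behavior described above (not the layer itself): raw ids are mapped to contiguous indices, and the mapping is kept so it can be stored with the model.

```python
# Sketch: map arbitrary raw ids to dense indices usable as labels,
# keeping the mapping so it can be saved alongside the model.
class MapToRangeSketch(object):
    def __init__(self):
        self.id_to_index = {}

    def __call__(self, raw_ids):
        out = []
        for raw_id in raw_ids:
            if raw_id not in self.id_to_index:
                self.id_to_index[raw_id] = len(self.id_to_index)
            out.append(self.id_to_index[raw_id])
        return out

mapper = MapToRangeSketch()
print(mapper([1001, 42, 1001, 7]))   # [0, 1, 0, 2]
```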
Kittipat Virochsiri
fd9185ab21 fix getting empty struct
Summary: `not field` calls `__len__()`, causing the field to appear to be missing even when it's not

Differential Revision: D4910587

fbshipit-source-id: bc2b2fadab96571ae43c4af97b30e50c084437af
2017-04-19 22:36:05 -07:00
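A self-contained illustration of the pitfall fixed above (a stand-in class, not caffe2.python.schema itself):

```python
# Truthiness checks on container-like objects call __len__(), so an *empty but
# present* field looks missing. Test for None explicitly instead.
class Struct(object):
    def __init__(self, *fields):
        self.fields = list(fields)
    def __len__(self):
        return len(self.fields)

field = Struct()          # present, but empty
print(bool(not field))    # True  -> wrongly treated as "missing"
print(field is None)      # False -> correct presence check
```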
Huazhong Ning
ad6b53e401 allow to specify output dtypes for functional layers
Summary:
Currently, the functional layer infers the output types and shapes by running the operator once.
But in cases where special input data is needed to run the operator, the inference may fail.
This diff allows the caller to manually specify the output types and shapes when the automatic inference may fail.

Reviewed By: kennyhorror

Differential Revision: D4864003

fbshipit-source-id: ba242586ea384f76d745b29a450497135717bdcc
2017-04-18 16:34:52 -07:00
Kittipat Virochsiri
0a726af42e Coerce input of FunctionalLayer to record
Summary: Having to pack the input into a schema doesn't make much sense, since the structure is not recognized by operators anyway.

Differential Revision: D4895686

fbshipit-source-id: df78884ed331f7bd0c69db4f86c682c52829ec76
2017-04-17 19:26:06 -07:00
Kittipat Virochsiri
baf33161d4 GatherRecord layer
Summary: Perform gather on the whole record. This will be used for negative random sampling.

Reviewed By: kennyhorror

Differential Revision: D4882430

fbshipit-source-id: 19e20f7307064755dc4140afb5ba47a699260289
2017-04-13 15:02:44 -07:00
Kittipat Virochsiri
5c32c82a6d Add option to subtract log odd from sampled trained prediction.
Summary: Useful for sampled softmax training

Differential Revision: D4782673

fbshipit-source-id: 88195de60070a0bc16f5e06b9aad4dffd0484546
2017-04-03 17:50:58 -07:00
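A numpy sketch of the correction referenced above, in the form commonly used for sampled softmax (the exact form Caffe2 applies is an assumption): subtract the log of each class's sampling probability from its sampled logit so training on the sampled subset stays consistent with the full softmax.

```python
import numpy as np

def adjust_sampled_logits(logits, sampling_probs):
    # Standard sampled-softmax correction: logit_c - log(q_c), where q_c is
    # the probability that class c was drawn by the negative sampler.
    return logits - np.log(sampling_probs)

logits = np.array([2.0, 0.5, -1.0])
q = np.array([0.5, 0.1, 0.01])   # frequent classes get sampled more often
print(adjust_sampled_logits(logits, q))
```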
Xianjie Chen
9fc56793dd fix trunk for push and small cleanup
Summary:
multiple places broken, blocking the push :(

- fix the weighted training for ads and feeds
- fix the publishing if no exporter model is selected
- fix the feeds retrieval evaluation
- added the default config for retrieval workflows. plan to use for flow test (in next diff)
- clean up unused code
- smaller hash size for faster canary test

Reviewed By: chocjy

Differential Revision: D4817829

fbshipit-source-id: e3d407314268b6487c22b1ee91f158532dda8807
2017-04-02 23:35:49 -07:00
Kittipat Virochsiri
3eb3507367 uniform_sampling layer
Summary: This layer will be used to sample negative labels for sampled softmax.

Differential Revision: D4773444

fbshipit-source-id: 605a979c09d07531293dd9472da9d2fa7439c619
2017-03-29 14:36:12 -07:00
Andrey Malevich
7cc92b1260 Add eval net for layer_model_helper
Summary:
This diff adds eval nets to the layer model helper. It should be useful for
cases where train/eval nets need some extra input (usually some supervision)
for train/eval, for example various sampled layers.

Differential Revision: D4769453

fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
2017-03-29 04:03:40 -07:00
Kittipat Virochsiri
da36212259 SamplingTrain layer
Summary:
The `SamplingTrain` layer is a wrapper around another layer subclassing `SamplingTrainableMixin`. When instantiated in the training context, `SamplingTrain` produces the sparse output of the wrapped layer. The output can be paired with `indices` to create a Map schema. When instantiated in the prediction context, the full output of the wrapped layer is produced.

This is like the SampledFC function in the model helper, https://fburl.com/gi9g1awh, with the ability to be instantiated in both training and prediction contexts.

I'd like to get consensus on whether we should introduce the `SamplingTrain` layer and the accompanying mixin. This can probably be accomplished in some other way, but I think this is not too bad.

Reviewed By: xianjiec

Differential Revision: D4689887

fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8
2017-03-27 23:31:55 -07:00
Huazhong Ning
8168e8ac25 allows to specify output names for functional layers
Summary:
Currently the output schema and blobs are named "field_i", which is
bad for debugging. This diff allows us to specify output names.

Reviewed By: kennyhorror

Differential Revision: D4744949

fbshipit-source-id: 8ac4d3c75cacbb4c9b5f55793ac969fe1cf20467
2017-03-23 13:18:58 -07:00