pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Yan Shang	41bb662d96	add dense regularization Reviewed By: xianjiec Differential Revision: D5617571 fbshipit-source-id: 875d7c8753bdb3b6847d5e3f47ad8568cdf172f8	2018-01-08 13:03:17 -08:00
Xiaolong Wang	7315a19bc9	add maybe_add_global_constant Summary: In layer model helper, add a method `maybe_add_global_constant` to ensure that when two global constants are added with the same name, we check if they are actually the same (by initializer) and only add it once. Reviewed By: kennyhorror Differential Revision: D6537532 fbshipit-source-id: 37aa3860a2e40d81161ccdea0c50a316248be2e2	2017-12-18 22:14:00 -08:00
Liang Xiong	fc0c8c2316	minor refactoring in dper Summary: small changes as I was reading through the dper code base. all of them are nits, but somewhat helped me understanding things. Reviewed By: xianjiec Differential Revision: D6389380 fbshipit-source-id: 3412052e4fcba199c6ffc84c6f7ae11bf8ff6ee9	2017-11-21 18:12:49 -08:00
Yan Shang	24e83acbb9	Enable sampling in evaluation Reviewed By: chocjy Differential Revision: D6119768 fbshipit-source-id: c8447326008392df70ab10b04f84223cf6d882b1	2017-11-16 14:03:51 -08:00
Jiyan Yang	ee3baa2ed4	Add shape checks and print more info in parameter sharing Summary: As titled. Reviewed By: kittipatv Differential Revision: D6145747 fbshipit-source-id: 39a212bb6bebbbf3164cade2f95db22ddb2d2c87	2017-10-27 01:22:06 -07:00
Dmytro Dzhulgakov	2972a6ca02	Revert D6026557: [caffe2][PR] Fix "No handlers could be found for logger" Summary: This reverts commit 95c634872ac02be721257169e38c8fead04cd66b bypass-lint Differential Revision: D6026557 fbshipit-source-id: 663c28583ce3b01070ff5449115ed7e222f71776	2017-10-12 20:21:52 -07:00
Luke Yeager	75bece6ede	Fix "No handlers could be found for logger" Summary: Closes https://github.com/caffe2/caffe2/pull/1316 Differential Revision: D6026557 Pulled By: Yangqing fbshipit-source-id: 95c634872ac02be721257169e38c8fead04cd66b	2017-10-10 22:32:13 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Aapo Kyrola	2108d1c250	Add unit-tests for fb-specific models Reviewed By: azzolini Differential Revision: D5895367 fbshipit-source-id: e7a7cdb272cdcdd7495efe9a6203750d1e6d6c48	2017-09-26 21:17:51 -07:00
Kittipat Virochsiri	1b059f4c98	Add option to ignore parameter initialization Summary: When parameter sharing is used, the model may not own the parameters. Emptying out initializer ensures that the shared model doesn't overwrite initialization. Reviewed By: chocjy Differential Revision: D5870362 fbshipit-source-id: f8587b84c3a13f331a3251973e8206563939606a	2017-09-20 12:03:22 -07:00
Kittipat Virochsiri	7d2b2cae19	Remove OFFLINE_TRAINING from global constant Summary: This is not a very generic constant Reviewed By: volkhin Differential Revision: D5870378 fbshipit-source-id: 59509bb48cecb52ba4a3f26b290855374547fe7e	2017-09-20 12:03:21 -07:00
Jiyan Yang	a8695178aa	Adding parameter sharing API to Dper2 Summary: To achive this, I modified the blob name scheme defined in a layer. Before it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc within the same scope). Now I change it to scope/fc/w and scope/fc_auto_0/w. That is, we rely on the uniqueness of the scoped layer name to define names for blobs. I also overwrote the create_param method in LayerModelHelper to let it use the resolved name for blobs given the sharingparameter context. There are some details such as making the initializer more structured that I need to finalize. Reviewed By: kennyhorror Differential Revision: D5435132 fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455	2017-08-03 00:33:18 -07:00
Jacqueline Xu	3cc03568da	Fixing error message for layer model helper Summary: - Minor fix for error message in layer model helper file Reviewed By: chocjy Differential Revision: D5440768 fbshipit-source-id: df47bfe68a0caa750f0d3c8def28a5585e465ee0	2017-07-18 09:52:45 -07:00
Honghao Wei	b68adec7bb	adding model loss logic Summary: Add api model.add_loss(), which allows adding loss, such as optimization and regularization. See change in sparse_nn.py, in which 'model.loss = loss' is changed to 'model.add_loss(loss)'. Reviewed By: xianjiec Differential Revision: D5399056 fbshipit-source-id: 13b2ced4b75d129a5ee4a9b0e989606c04d2ca8b	2017-07-14 16:25:23 -07:00
Thomas Dudziak	5355634dac	Dict fixes/improvements and unittest targets for Python 3 in caffe2 core Summary: As title Reviewed By: salexspb Differential Revision: D5316104 fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30	2017-06-29 17:05:41 -07:00
Yiming Wu	1fce3eac4e	single trainer hybrid device Summary: First try of single trainer hybrid device training for sparsenn Comparison results with CPU training: https://our.intern.facebook.com/intern/fblearner/run/compare/?compare_to[0]=20016969&compare_to[1]=19660293&baseline_run=19660293&all_runs[0]=20016969&all_runs[1]=19660293 Reviewed By: dzhulgakov Differential Revision: D5205723 fbshipit-source-id: 4a024324ac2efc3248dd470d4c533cf2ecec2e92	2017-06-27 22:06:30 -07:00
Bokai Cao	d9087edb07	add rekey in feature_processor Differential Revision: D5270972 fbshipit-source-id: 8805c0e947f4752d2c575e2a7b8986cd804601dc	2017-06-20 23:19:09 -07:00
Bokai Cao	d2b1cb22a4	rekey layer Differential Revision: D5210095 fbshipit-source-id: dc66a10d95842e0f10cb53a5afb7ddcc3fcac0de	2017-06-19 18:47:28 -07:00
haracejacob	2ec294a8bb	Fix a few typos and grammars in comment Summary: Fix a few typos and grammars in comment by using language-check, python library spell_checker source code is here : https://github.com/17-1-SKKU-OSS/011A/blob/master/spell_checker/spell_checker.py here is the text file which indicates what things should be fixed : https://github.com/17-1-SKKU-OSS/011A/tree/master/spell_checker/fix/caffe2 Closes https://github.com/caffe2/caffe2/pull/719 Differential Revision: D5165118 Pulled By: aaronmarkham fbshipit-source-id: 7fb8ef7a99d03cd5fd2f9ebdb01b9865e90fc37b	2017-06-14 18:22:39 -07:00
Huazhong Ning	660dd58022	fix for realtime training. Reviewed By: kennyhorror Differential Revision: D5068298 fbshipit-source-id: 0dc3580c9c8123368a3625fb654c6eaf1dc4a950	2017-05-26 23:49:40 -07:00
Yang Yang	769e668faf	ttsn model fails to set optimizer for FC layer Summary: the FC ModelLayer needs an optimizer, also seems the catch-all that sets a default for missing optimizers had a bug Reviewed By: xianjiec Differential Revision: D5048302 fbshipit-source-id: cbbf641fb9ee4f4f89c5dbb132f7837ecdbe37a5	2017-05-16 11:26:02 -07:00
Huazhong Ning	942f53b5a6	gradient impact of task layers on shared is configurable Reviewed By: chocjy Differential Revision: D4943948 fbshipit-source-id: 2e26dfb30c6893b60985f693a823646ed3d3e0e3	2017-05-11 20:34:04 -07:00
Alisson Gusatti Azzolini	20d8de8d51	Parameter cost estimation job Summary: Adds a parameter cost estimation step before the actual training starts. The costs are later used in order to better shard the parameters across instances of the parameter server. Things I needed to modify: - A few changes to make ModelLayerHelper picklable - Add support for stopping a distributed job after a number of stats reporting steps. - Refactored run_dist_job to support collocating the reader with the trainer even when PS are present. - Option to disable dense updates (when num_dense_servers=0). Currently there's a huge overhead posed by having to launch a child workflow. I'll try and address next in a subsequent diff. This is WIP because the other workflows need to be migrated as well. I can break this down into smaller diffs if reviewers would prefer it. Reviewed By: kennyhorror Differential Revision: D4974752 fbshipit-source-id: 04c336acb2945f8f11324a221ffc6967818c0672	2017-05-09 13:02:24 -07:00
Kittipat Virochsiri	7d6d67119f	Allow LayerModelHelper to keep input blobs from schema Summary: In certain situation, like in D4907916 where we insert additional step in the middle of a model, it's neccessary to keep the blob names constant across model helper so that it doesn't break communication schema. Reviewed By: kennyhorror Differential Revision: D4981527 fbshipit-source-id: 6b8d6d240279dd48f801cfacbaa1d320ba54d694	2017-05-01 21:31:36 -07:00
Yiming Wu	bef6e45f8b	rename ModelHelperBase Summary: rename ModelHelperBase to Model. This is the result of running: find . -type f -exec sed -i 's/ModelHelperBase/ModelHelper/g' {} + We had 19 results when fbgs ModelHelperBase. Here is 20 instances because I added 1 test in model_helpers_test.py Reviewed By: salexspb Differential Revision: D4928337 fbshipit-source-id: bc4c12b60b90c167e717de50ea9fe17521e142e3	2017-04-24 15:52:26 -07:00
Huazhong Ning	ad6b53e401	allow to specify output dtypes for functional layers Summary: Currently, the functional layer infers the output types and shapes by running the operator once. But in cases where special input data are needed to run the operator, the inferrence may fail. This diff allows the caller to manually specify the output types and shapes if the auto infererence may fail. Reviewed By: kennyhorror Differential Revision: D4864003 fbshipit-source-id: ba242586ea384f76d745b29a450497135717bdcc	2017-04-18 16:34:52 -07:00
Jiyan Yang	a7217e6626	Remove unused optimizers Summary: As desc. Reviewed By: xianjiec Differential Revision: D4840482 fbshipit-source-id: bf820154475508ce581d16a45bcd93d026b60f30	2017-04-05 21:18:29 -07:00
Aaron Markham	58f7f2b441	doxygen python block added Summary: Closes https://github.com/caffe2/caffe2/pull/226 Differential Revision: D4793550 Pulled By: JoelMarcey fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e	2017-03-29 06:46:16 -07:00
Andrey Malevich	7cc92b1260	Add eval net for layer_model_helper Summary: This diff is adding eval nets to layer model helper. It should be useful for the cases when train/eval nets need some extra input (usually some supervision) for train/eval. For example various sampled layers, etc. Differential Revision: D4769453 fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb	2017-03-29 04:03:40 -07:00
Kittipat Virochsiri	6163676ebe	Skip optimizer when param doesn't have gradient and optimizer is not set Summary: Currently, we cannot have layer constant because layer params are required to have gradient and optimizer. Global constants don't cut for this because it can only be added once; therefore, a layer that add any global constant can only be used once. Differential Revision: D4773212 fbshipit-source-id: 5b60d31f3c1602afb04b61f6d30b8e3e06ed2de3	2017-03-24 22:18:34 -07:00
Xiaolong Wang	8ce34d6c87	Add Calibration Summary: Add calibration to sparse_nn Differential Revision: D4735564 fbshipit-source-id: 6baa637cbffcbbd50134a256d622ef8c962fca3b	2017-03-24 14:32:23 -07:00
Andrey Malevich	b599910f3a	Use new metric intefaces in trainer workflows. Summary: This diff is migrating existing DPER workflows to use new metric abstractions in training. Reviewed By: xianjiec Differential Revision: D4656576 fbshipit-source-id: 1b3b16b390fc0757480e41df1c4214c11cd76e8a	2017-03-07 12:46:52 -08:00
Qichao Que	2f68632a32	Add SparseNN workflow for feed. Summary: Add SparseNN workflow for feed. I haven't fully thought about the change needed for ads, as I added a property called 'preproc_output_schema' for LayerModelHelper. Reviewed By: xianjiec Differential Revision: D4585796 fbshipit-source-id: 060d08f4beb928e7e7863f2e563f612c358951fb	2017-03-01 11:02:38 -08:00
Andrey Malevich	a3726759c6	Add a way do describe layers in a more AdHoc manner. Summary: This diff is trying to address one of the concerns that Xianjie have had - requirements create a layer for all operators and attach pass shapes and other info around. The basic idea of the diff: 1. Try to create a layer with a given name, but if it's not available try to fallback on operator with that name (that is expected to have no parameters). 2. For all operators that we're adding through this functional style of creation - try to use C2 Shape/Type inference logic to get output type. If we fail to get - it just return untyped record and expect user to annotate it when it's really needed. Reviewed By: xianjiec Differential Revision: D4408771 fbshipit-source-id: aced7487571940d726424269970df0eb62670c39	2017-02-27 23:30:39 -08:00
Xianjie Chen	d0621a2449	NextScopedBlob with well-defined behavior and respect namescope Summary: Remove the use of `NextName` in layer model helper, so that the same function return `model_helper` that should construct identical `Net`, when under the same NameScope. The `NextScopedBlob` should only take effect when there is real name conflicting, otherwise it returns ScopedBlobReference. This is critical for parameter blobs. In long run, we need to be able to specify parameter blobs more explicitly. (kennyhorror is working on this). This solution works in short term for e.g., two tower sparse nn models. Reviewed By: kennyhorror Differential Revision: D4555423 fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187	2017-02-16 17:16:36 -08:00
Xianjie Chen	fb7c9108d9	get parameter blobs of a model Summary: to verify that a model only used a subset of the parameters of another model (e.g., the model doing training). Differential Revision: D4557787 fbshipit-source-id: bd8ac96f5e78e05f6f56086db6e6ddcda36c1d37	2017-02-15 16:00:44 -08:00
Andrey Malevich	ec51f887bf	Create only one instance of SigridTransform in DPerExample. Summary: DPer example have been creating multiple copies of the transform config in net defition till this moment, that resulted in the fact that I've hit the limit of ProtoBuf (64MB) for a certain Task requests (especially visible because of the ValidationPipeline that I was adding). After this diff we're going to store SigridTransforms in one instance per machine for training (or 1 instance per reading). Difference in sizes of the plans for some simple SparseNN model ~30 MB (even including the fact that second model have validation plan as well). TODO: Do similar logic for NNPreProc as well (it's also pretty large). Reviewed By: dzhulgakov Differential Revision: D4441441 fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047	2017-01-22 19:29:16 -08:00
Xianjie Chen	4b3bd06a7f	sparse nn converges better by dedupping sparse gradient by mean Summary: this normalizes the sparse gradient, so that the "effective learning rate" of each sparse parameter will NOT be affected by the number of examples in a batch that "use" this sparse parameter. experiment shows it help convergence (about 0.1% better train NE): https://fburl.com/1230747813683956. It's not conclusive yet, and we still need to do more experiments. But this diff adds it as an option, and does not change the default behavior, so we can get this in first. Differential Revision: D4367283 fbshipit-source-id: 49ea80dfa9ea776ff4160e220cf6c86593521607	2016-12-27 22:59:29 -08:00
Yangqing Jia	238ceab825	fbsync. TODO: check if build files need update.	2016-11-15 00:00:46 -08:00

39 Commits