Summary: Hard-to-debug problems arise when a gradient creator fails because the forward op itself is incorrect. Check the schema before calling the creator, and clarify the error messages.
Reviewed By: Yangqing
Differential Revision: D5256016
fbshipit-source-id: 78550f7e2ce5b88e26b69fdae4be0eece52edfea
Summary: This was only needed in order to initialize stateful PythonOps. Now PythonOp has support for initialization at Op creation time, so this is not used anymore.
Reviewed By: dzhulgakov
Differential Revision: D5242908
fbshipit-source-id: dbaa249466dd0f37f25d204d387b1f99c6dd4fed
Summary: This shows a Python Caffe2 user where a failed operator was created. The motivation for not putting this information directly in the protobuf is to avoid making it too verbose and to keep a net's protobuf readable after a simple print() call.
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary:
This allows constructing a Python op by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.
This lets us drop the dependency on 'tokens', which didn't work properly for protobufs that get distributed to other processes. Now the PythonOp definition is self-contained: as long as the build dependencies are right, shipping the protobuf is enough to execute the net remotely.
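A minimal pure-Python sketch of the pattern (the (builder, args, kwargs) encoding and names below are illustrative assumptions, not the actual PythonOp argument format):

import pickle

def make_scaler(factor):
    # Builder: runs once at op construction time and returns the run-time function.
    def run(inputs, outputs):
        # called every time the op runs
        outputs[:] = [x * factor for x in inputs]
    return run

# At net-construction time: serialize the "builder function call".
builder_call = pickle.dumps((make_scaler, (3,), {}))

# At op-construction time (possibly in another process): rebuild the run-time function.
builder, args, kwargs = pickle.loads(builder_call)
op_func = builder(*args, **kwargs)

outputs = [0, 0]
op_func([1, 2], outputs)
print(outputs)  # [3, 6]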
Reviewed By: dzhulgakov
Differential Revision: D5080833
fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
Summary:
We waste extra memory by creating two autosplit gradient
blobs and then accumulating them into the main one. Sometimes, when Sum
/ Sub ops are involved, we can avoid wasting the extra memory entirely.
Ideally we would not waste any memory at all and would make ops add into
the same blob rather than calculating separate results and then merging
them, but that would require a substantial change to the framework and
rewriting a lot of operators.
Reviewed By: dzhulgakov
Differential Revision: D5157667
fbshipit-source-id: 8293824d6cdd971d8853ae90aee68e4a6d1e132b
Summary:
It's very useful for simple cases like benchmarking nets, where we want to encode the input/output record in the net and don't want to go through the hurdles of storing the input/output record in MetaNetDef.
For those cases I propose remapping the input/output record before saving to 'input_record/{field_name}'. Then we can recover the input/output record just from the names of the blobs.
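As a rough illustration of the naming convention only (these helpers are hypothetical, not the schema API), remapping to an 'input_record/{field_name}' prefix and recovering the record from blob names could look like:

def remap_record(field_names, prefix="input_record"):
    # map each field to a prefixed blob name, e.g. 'input_record/float_features'
    return {name: "{}/{}".format(prefix, name) for name in field_names}

def recover_record(blob_names, prefix="input_record"):
    # recover the field names from the blob names alone
    marker = prefix + "/"
    return [b[len(marker):] for b in blob_names if b.startswith(marker)]

blobs = remap_record(["float_features", "label"])
print(recover_record(blobs.values()))  # ['float_features', 'label']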
Differential Revision: D5170473
fbshipit-source-id: ac5daa60051605ed93022aec1377a49f08f15663
Summary:
Static RNN allows unrolling an RNN into a Caffe2 graph using all the existing cell abstractions. In this diff I introduce several new tests that have already caught a few bugs in our RecurrentNetworkOp gradient accumulation logic by comparing it to an unrolled version.
Another use case is performance: potentially we can run an unrolled net faster because DAGNet will have access to the whole graph. The same applies to memonger. That work is not part of this diff.
Reviewed By: akyrola
Differential Revision: D5200943
fbshipit-source-id: 20f16fc1b2ca500d06ccc60c4cec6e81839149dc
Summary:
When building a multi-layer static RNN, the last timestep of
the first layer (and of every layer except the last one) doesn't get a
gradient for the cell state, since the user normally uses results only
from the last layer and the cell state doesn't flow upward either.
ZeroGradient provides a general solution for injecting zero-gradient
blobs. It is in some ways similar to the StopGradient operator, which is
also special-cased.
Reviewed By: bwasti
Differential Revision: D5198375
fbshipit-source-id: a21d0cfb3676a77fac72e5897a200d0bd25fc6de
Summary:
This diff plans to attack the problem where we want to just annotate the device option on operators and let Caffe2 inject the cross-device copy functions for us. This feature would be useful for mixed-device training and for multi-device training with several nets, where previously we did the heavy lifting of adding the copy functions ourselves.
Ideally, this feature will be used like this:
# construct your nets first
core.InjectDeviceCopyAmongNets([train_init, train_net, ...])
My ideas are written in comments. I will update them here as well later.
Reviewed By: dzhulgakov
Differential Revision: D5134103
fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5
Summary: Infer input and output devices from an OperatorDef through its OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way. It is really helpful for automatic cross-device execution.
Reviewed By: akyrola, dzhulgakov
Differential Revision: D5161065
fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However, the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs/outputs is used when constructing the Python op.
I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.
Reviewed By: dzhulgakov
Differential Revision: D4897004
fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
Summary:
Fix the missing `future` package issue.
Recently we found that some of our users do not have the `future` module available, so we may need a try/except wrapper around every `past` import.
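A minimal sketch of the kind of guarded import meant here (the exact symbols wrapped in this diff may differ):

# Fall back gracefully when the `future`/`past` compatibility packages are absent.
try:
    from past.builtins import basestring
except ImportError:
    basestring = str  # on Python 3 without `future`, plain str is an acceptable substitute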
Reviewed By: Yangqing
Differential Revision: D5183547
fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
Summary:
The bug repro is in a test. Generally speaking, accumulation was
not happening if len(ys) >= 2 (the list of blobs we compute gradients
from) and some blob in the net was both in the ys list and also received
a gradient propagated from another element of ys.
Reviewed By: akyrola
Differential Revision: D5121695
fbshipit-source-id: 282d88f2f4f6e27dadae311964f40246a2739130
Summary: These return views in Python 3, which would not do anything in a lot of the usages currently present in Caffe2. This diff simply removes (almost) all usages of the two in Caffe2 and its subprojects in favor of comprehensions, which are also easier to read and understand.
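The two calls aren't named in this excerpt; assuming they behave like Python 3's lazy map/filter (whose returned views/iterators do nothing unless consumed), a small example of the pitfall and the comprehension-based replacement:

counters = {"a": 0, "b": 0}

def bump(key):
    counters[key] += 1

# Python 2: map() is eager, so this bumps both counters.
# Python 3: map() returns a lazy iterator, so nothing runs because it is never consumed.
map(bump, counters)

# A comprehension (or a plain loop) is explicit and behaves the same on both versions.
[bump(key) for key in counters]
print(counters)  # {'a': 1, 'b': 1} on Python 3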
Reviewed By: akyrola
Differential Revision: D5142049
fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
Summary:
hankun is using the optimizer but has a mixed set of GPU and CPU operators. Currently this won't work with the optimizer, since it adds optimizers for all parameters in the current device scope. But we can actually infer the device a param belongs to by looking at the device option in the param_init_net.
Added a test as well.
Reviewed By: salexspb
Differential Revision: D5133652
fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
Summary: This diff is one step towards enabling the Python 3 build by making Caffe2 more diligent in its handling of strings.
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary: Relax requirement on token uniqueness since a few use cases broke after the uniqueness requirement was added in a previous diff.
Reviewed By: kittipatv
Differential Revision: D5034132
fbshipit-source-id: 327eb065923e6ea152a360324316f81b7fb9564b
Summary: For distributed jobs, we were relying on the order the PythonOps were registered, which was very fragile.
Reviewed By: dzhulgakov
Differential Revision: D5016847
fbshipit-source-id: f5601467c5b0569d5e8a0efdd76abad0d703c5f5
Summary: External inputs must be computed before updating the _ops_output structure; otherwise, if the net being appended outputs the external input, it is not added correctly.
Differential Revision: D5013496
fbshipit-source-id: 6a83d0a6f1c63ef8ae7bec4d862c0ac2a690d47b
Summary: It is good practice to provide __dir__ whenever __getattr__ is defined so that tooling will work intelligently. In particular, it is hard to explore the available methods in IPython without tab completion.
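A generic sketch of the pattern (not the actual Caffe2 class): pairing __dir__ with __getattr__ so IPython tab completion can list the dynamically resolved attributes.

class DynamicOps(object):
    # Resolves operator helpers dynamically but still advertises them to tooling.
    def __init__(self, ops):
        self._ops = dict(ops)

    def __getattr__(self, name):
        try:
            return self._ops[name]
        except KeyError:
            raise AttributeError(name)

    def __dir__(self):
        # include the regular attributes as well as the dynamic ones
        return sorted(set(dir(type(self))) | set(self.__dict__) | set(self._ops))

ops = DynamicOps({"Relu": lambda x: max(x, 0.0)})
print("Relu" in dir(ops))  # True, so tab completion can find it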
Reviewed By: dzhulgakov
Differential Revision: D5006545
fbshipit-source-id: 1a150d91d54637d80b292764513943ff70d971b4
Summary:
A layer to allow the model to follow different paths for each instantiation context and join later. Together with a tagging-system cleanup (which is a separate issue), this should reduce the need to write a layer just to differentiate between contexts.
Re: the tagging-system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify the instantiation code. TRAIN_ONLY should become the set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN; see the sketch below.
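As a small illustration of the proposed tag relationship (the context names beyond the two mentioned above are hypothetical):

EXCLUDE_FROM_TRAIN = "exclude_from_train"
EXCLUDE_FROM_EVAL = "exclude_from_eval"
EXCLUDE_FROM_PREDICTION = "exclude_from_prediction"

ALL_EXCLUSIONS = {EXCLUDE_FROM_TRAIN, EXCLUDE_FROM_EVAL, EXCLUDE_FROM_PREDICTION}

# TRAIN_ONLY == every EXCLUDE_FROM_* tag except EXCLUDE_FROM_TRAIN
TRAIN_ONLY = ALL_EXCLUSIONS - {EXCLUDE_FROM_TRAIN}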
Reviewed By: kennyhorror
Differential Revision: D4964949
fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
Summary: I ran into this earlier and the debug messages were not helpful enough.
Reviewed By: kennyhorror
Differential Revision: D4985754
fbshipit-source-id: b3d12b5e2cfa1b54fca9126768c84c902664ef28
Summary:
When appending net A to net B, an external input of net A should not be added as
an external input of net B if net B is outputting that blob.
Reviewed By: dzhulgakov
Differential Revision: D4975921
fbshipit-source-id: a5c0ada7b96d851e57d345244d322dd93c7be8e4
Summary:
This PR is based on commit "977c6b3" as this version allows MKL to use all the cores available.
All MKL-related files are added here after incorporating review comments. Major changes include:
1. usage of clang-format (linter) with --style=Google
2. usage of macros for checking input and filter dimensions in the MKL operators
3. merged Max and Average pooling functions
4. created a new folder for MKL-related Python scripts in the python folder and moved them there
5. there is no mkl_alexnet_test.py, as it was redundant; convnet_benchmark.py does the same thing
Closes https://github.com/caffe2/caffe2/pull/270
Differential Revision: D4905219
Pulled By: Yangqing
fbshipit-source-id: e5f5b189714a835b93b9ebda24c52e09572dfca7
Summary:
This is needed to have a stateful PythonOp (such as the PyTorch one in the following diff) where computing f produces state (not tensors) that is consumed by grad_f.
python_func_type is a type that is constructed as python_func_type(f) and provides forward and backward methods (delegated to f and f_grad). We construct this object at op registration time to have it as thread local.
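A rough sketch of what such a python_func_type could look like (the method signatures are assumptions; the summary only states that it is constructed as python_func_type(f) and provides forward/backward):

class StatefulPythonFunc(object):
    # Constructed once per op as python_func_type(f); keeps state between forward and backward.
    def __init__(self, f):
        self._f = f
        self._state = None

    def forward(self, inputs, outputs):
        # delegate to f, which may return opaque state (not tensors) needed by the gradient
        self._state = self._f(inputs, outputs)

    def backward(self, grad_inputs, grad_outputs):
        # the gradient computation can consume the state produced by forward
        assert self._state is not None, "forward must run before backward"
        # ... use self._state together with grad_outputs to fill grad_inputs ...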
Differential Revision: D4900963
fbshipit-source-id: 00a6a55fa372e2244048921914e22e710d11f7ce
Summary:
Fix an issue that amyzhang encountered. She was using ConstantFill to create a blob of the same size as another blob. This interrupted the gradient computation flow at the ConstantFill, since the gradient for the input blob was set to None (although it already had another gradient set). The correct solution is to avoid overwriting gradient assignments with None if they already have a gradient, UNLESS that blob is an output of the same op, as with the StopGradient op. (Note that Amy's problem was fixed by instead using a fixed-shape ConstantFill and Add with broadcast=1, which is a better solution anyway.)
I'm not sure I explained this well, but see the new unit tests. Before this change, testAddAndDynamicConstant failed while testAddAndStaticConstant succeeded.
Reviewed By: dzhulgakov
Differential Revision: D4861176
fbshipit-source-id: 3b53621bfaba2e36786a5e4664145038995f6616
Summary:
Quite a large diff to make the cuDNN LSTM and our LSTM produce the same results and to provide a Python API for the cuDNN LSTM.
* Added operators RecurrentParamGet and RecurrentParamSet to access the weights and biases for the different gates, input/recurrent.
* Removed RecurrentInit as it is not needed.
* recurrent.cudnn_LSTM() returns a special net and a mapping that can be used to retrieve the parameters from the LSTM.
* recurrent.cudnn_LSTM() can be passed blobs that hold the parameters for the individual gate weights and biases.
* recurrent.InitFromLSTMParams() can be used to initialize our own LSTM from cuDNN params. This way we can test whether cuDNN and our own LSTM produce the same result.
recurrent_test.py tests for the equivalence.
Reviewed By: salexspb
Differential Revision: D4654988
fbshipit-source-id: 6c1547d873cadcf33e03b0e0110248f0a7ab8cb0
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
- cell net/step net external inputs must be namespace scoped
- prevent double-namescoping of cellnet inputs
- make data parallel model understand recurrentnets so the device-mapping works
Reviewed By: salexspb
Differential Revision: D4708840
fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
Summary: When cloning the recurrent net op, we remap the lengths blobs. But if they don't exist (as with CRF), we should not do that.
Differential Revision: D4702123
fbshipit-source-id: 37a22d11e709011b8b98b2cc3d9f08eb9fda06c4
Summary:
This diff changes the way we specify metrics: from a reporter that must know in advance all the blobs it should access, to a reporter that is connected through schema.
This diff also reports an arbitrary number of learning curves to Flow and provides a really flexible way to specify all the metrics we care about.
TODO: Modify the model helper to allow providing intermediate results for reporting.
TODO: Add an evaluation net (instead of a prediction net).
TODO: Move all other places in DPER 2.0 to use these abstractions instead.
TODO: Get rid of LogScoreEstimator in favor of a metric that really suits our needs.
Reviewed By: azzolini, dzhulgakov, kittipatv
Differential Revision: D4577548
fbshipit-source-id: 3515bd41e0f92263ff90ce2f7207abf65d01b1f7
Summary: so that the utils can be used by a wider audience.
Reviewed By: xianjiec
Differential Revision: D4637462
fbshipit-source-id: f0695f430902aef26360efa511069b3755eaf52a
Summary: Fix a check for whether the net is a net_dict.
Reviewed By: kennyhorror
Differential Revision: D4647493
fbshipit-source-id: e0a62fc5847c99c85857c5635b4e39d59c66d5ce
Summary: Add SparseNN workflow for feed. I haven't fully thought about the change needed for ads, as I added a property called 'preproc_output_schema' for LayerModelHelper.
Reviewed By: xianjiec
Differential Revision: D4585796
fbshipit-source-id: 060d08f4beb928e7e7863f2e563f612c358951fb
Summary:
For the code in the layer model helper and layers, it is intentional not to have a NameScope by default.
This looks like another place that may need a default NameScope.
https://fburl.com/wdwtxp0m
Reviewed By: kennyhorror
Differential Revision: D4606971
fbshipit-source-id: b560bf59d3242e3f9443cd5aeda5c7e2e4e89079
Summary:
Previously we had several limitations for a reporter net:
- needed to be a net, not an execution step
- only one allowed per execution step, with a single interval
Now, "reporter nets" become repoter steps and multiple of them can be specified with different timeouts.
Reviewed By: dzhulgakov
Differential Revision: D4583686
fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when under the same NameScope.
`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.
This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term, e.g., for two-tower sparse NN models.
Reviewed By: kennyhorror
Differential Revision: D4555423
fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
Summary:
- NetBuilder now honors its name
- When Nets are created in the context of a NetBuilder, they take the NetBuilder's name as a prefix
- When a NetBuilder is created in the context of a Task, it takes the Task's name
- pipe() now tries to find a good name based on its processor's, output queue's, or input queue's name
- RPC tries to find a name from its handler's name
- Better names in DataStream
- net_printer prints the names of Tasks and Steps
- net_printer optionally factors out common prefixes from blob names
Differential Revision: D4527578
fbshipit-source-id: 5d3d1237c186e9576313c5aa01cc8800a9051217
Summary: This should not be needed anymore since we use pybind. It will help the Python 3 migration.
Reviewed By: salexspb
Differential Revision: D4535490
fbshipit-source-id: a47615f73b5c35b940d21bb2d5d55060fa0850be
Summary: See distributed.py for an example of usage.
Reviewed By: xianjiec
Differential Revision: D4467723
fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd
Summary:
Use multiple readers for model evaluation. Since it is built on the new framework, only NativeLoader is supported.
With 5 readers, the evaluation speed is 124k; the speed with a single evaluator is 32k. There is still room for improvement, since the evaluator machine is under-utilized.
(Hive is the bottleneck. Adding more loading threads helps improve the speed to 240k. More readers can improve it further.)
Reviewed By: azzolini
Differential Revision: D4469393
fbshipit-source-id: b55af5f798faca4c150b2c0663fe5db0f154cb70
Summary:
It's a trick similar to dyndeps. The idea is that global state is better simply replicated to gang workers, as otherwise it causes a lot of confusion.
In particular, it's useful if one wants to enable detailed logging (--v).
For other operators the user still needs to call GlobalInit explicitly. We should consider doing it for all Flow operators, but I'll leave that for future consideration.
Reviewed By: kennyhorror
Differential Revision: D4460686
fbshipit-source-id: 5836737dd3195f9ad12589fd899a3ff63f173e05
Summary:
Perf bug report: https://www.facebook.com/groups/1405155842844877/permalink/1617904561570003/
Diagnosis:
I've done some digging into this and here's what I've found:
(1) In this use case, the call is disallowed_op_ids = get_op_ids_in_path(ssa, blob_versions, [], inputs), where inputs = ['res4_22_sum'] is the last blob produced by the res4 stage of a ResNet101 model.
(2) get_op_ids_in_path has exponential running time in the number of blocks in the res4 stage of ResNet. Based on empirical running times, this call would take about 4.5 days to complete on my devgpu.
(3) I haven't familiarized myself enough with the IR and SSA code in core.py to understand the algorithmic fix yet, but surely there's a more efficient algorithm to compute the same thing.
Reviewed By: Yangqing
Differential Revision: D4446278
fbshipit-source-id: 8bd147f92d62b865dc355d5802a53e92d64b6e21
Summary:
This normalizes the sparse gradient so that the "effective learning rate" of each sparse parameter will NOT be affected by the number of examples in a batch that "use" this sparse parameter.
Experiments show it helps convergence (about 0.1% better train NE): https://fburl.com/1230747813683956. It's not conclusive yet, and we still need to do more experiments. But this diff adds it as an option and does not change the default behavior, so we can get it in first.
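An illustrative NumPy sketch of the normalization (not the operator code): the accumulated gradient for each sparse row is divided by the number of examples in the batch that touched that row, so a frequently used row does not get a proportionally larger step.

import numpy as np

indices = np.array([0, 2, 2, 5])  # sparse parameter rows touched by the batch, one per occurrence
grads = np.ones((4, 3))           # their per-occurrence gradients

acc = np.zeros((8, 3))            # accumulated gradient for an 8-row sparse parameter
np.add.at(acc, indices, grads)    # plain sum: row 2 gets twice the magnitude

counts = np.bincount(indices, minlength=8).astype(np.float64).reshape(-1, 1)
normalized = acc / np.maximum(counts, 1)  # mean per touched row instead of sum
print(normalized[2], acc[2])      # [1. 1. 1.] vs [2. 2. 2.]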
Differential Revision: D4367283
fbshipit-source-id: 49ea80dfa9ea776ff4160e220cf6c86593521607