Commit Graph

68 Commits

Huazhong Ning
942f53b5a6 gradient impact of task layers on shared layers is configurable
Reviewed By: chocjy

Differential Revision: D4943948

fbshipit-source-id: 2e26dfb30c6893b60985f693a823646ed3d3e0e3
2017-05-11 20:34:04 -07:00
Alisson Gusatti Azzolini
20d8de8d51 Parameter cost estimation job
Summary:
Adds a parameter cost estimation step before the actual training starts. The costs are later used in order to better shard the parameters across instances of the parameter server.

Things I needed to modify:
- A few changes to make ModelLayerHelper picklable
- Add support for stopping a distributed job after a number of stats reporting steps.
- Refactored run_dist_job to support collocating the reader with the trainer even when PS are present.
- Option to disable dense updates (when num_dense_servers=0).

Currently there's a huge overhead posed by having to launch a child workflow. I'll try to address that in a subsequent diff.

This is WIP because the other workflows need to be migrated as well.

I can break this down into smaller diffs if reviewers would prefer it.

Reviewed By: kennyhorror

Differential Revision: D4974752

fbshipit-source-id: 04c336acb2945f8f11324a221ffc6967818c0672
2017-05-09 13:02:24 -07:00
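The sharding step this commit prepares for is easy to picture: once per-parameter costs are estimated, parameters can be placed on parameter-server instances so that the load is roughly balanced. Below is a minimal, hypothetical sketch of such an assignment (greedy largest-first placement; the function name and cost numbers are illustrative, not taken from the diff):

  import heapq

  def shard_params_by_cost(param_costs, num_shards):
      """Greedily assign each parameter to the currently lightest shard.

      param_costs: dict of parameter name -> estimated cost (bytes, FLOPs, ...).
      Returns: dict of parameter name -> shard index.
      """
      shards = [(0.0, i) for i in range(num_shards)]  # (accumulated cost, shard id)
      heapq.heapify(shards)
      assignment = {}
      # Place the most expensive parameters first for a tighter balance.
      for name, cost in sorted(param_costs.items(), key=lambda kv: -kv[1]):
          total, idx = heapq.heappop(shards)
          assignment[name] = idx
          heapq.heappush(shards, (total + cost, idx))
      return assignment

  if __name__ == "__main__":
      costs = {"emb_user": 120.0, "emb_ad": 90.0, "fc1_w": 30.0, "fc2_w": 10.0}
      print(shard_params_by_cost(costs, num_shards=2))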
Kittipat Virochsiri
7d6d67119f Allow LayerModelHelper to keep input blobs from schema
Summary: In certain situations, like in D4907916 where we insert an additional step in the middle of a model, it's necessary to keep the blob names constant across model helpers so that the communication schema doesn't break.

Reviewed By: kennyhorror

Differential Revision: D4981527

fbshipit-source-id: 6b8d6d240279dd48f801cfacbaa1d320ba54d694
2017-05-01 21:31:36 -07:00
Yiming Wu
bef6e45f8b rename ModelHelperBase
Summary:
rename ModelHelperBase to ModelHelper.

This is the result of running:

  find . -type f -exec sed -i 's/ModelHelperBase/ModelHelper/g' {} +

We had 19 results when running fbgs for ModelHelperBase. There are 20 instances here because I added 1 test in model_helpers_test.py.

Reviewed By: salexspb

Differential Revision: D4928337

fbshipit-source-id: bc4c12b60b90c167e717de50ea9fe17521e142e3
2017-04-24 15:52:26 -07:00
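For reference, call sites after this rename simply construct ModelHelper; a small usage sketch (assumes a working Caffe2 install; the net itself is illustrative):

  import numpy as np
  from caffe2.python import brew, model_helper, workspace

  # Models are now built with ModelHelper (formerly ModelHelperBase).
  model = model_helper.ModelHelper(name="example_model")
  brew.fc(model, "data", "fc1", dim_in=4, dim_out=2)

  workspace.FeedBlob("data", np.random.rand(8, 4).astype(np.float32))
  workspace.RunNetOnce(model.param_init_net)
  workspace.RunNetOnce(model.net)
  print(workspace.FetchBlob("fc1").shape)  # (8, 2)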
Huazhong Ning
ad6b53e401 allow specifying output dtypes for functional layers
Summary:
Currently, the functional layer infers the output types and shapes by running the operator once.
But in cases where special input data are needed to run the operator, the inference may fail.
This diff allows the caller to manually specify the output types and shapes for cases where the automatic inference would fail.

Reviewed By: kennyhorror

Differential Revision: D4864003

fbshipit-source-id: ba242586ea384f76d745b29a450497135717bdcc
2017-04-18 16:34:52 -07:00
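The point of the change is that inferring the output schema by trial execution breaks when the operator needs special inputs. A minimal, hypothetical sketch of the two paths in plain Python (not the actual functional-layer API; `build_functional_output_schema` and `output_dtypes` are illustrative names):

  import numpy as np

  def build_functional_output_schema(op_fn, sample_inputs, output_dtypes=None):
      """Return a list of (dtype, shape) pairs describing the op's outputs.

      If output_dtypes is given, trust the caller; otherwise infer by running
      the operator once on sample inputs (which may fail when the operator
      needs special input data).
      """
      if output_dtypes is not None:
          return output_dtypes
      outputs = op_fn(*sample_inputs)
      return [(o.dtype, o.shape) for o in outputs]

  # Auto inference: run the op once on dummy inputs.
  relu = lambda x: [np.maximum(x, 0)]
  print(build_functional_output_schema(relu, [np.zeros((2, 3), np.float32)]))

  # Manual specification: used when the op can't run on dummy inputs.
  print(build_functional_output_schema(relu, None,
                                       output_dtypes=[(np.float32, (2, 3))]))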
Jiyan Yang
a7217e6626 Remove unused optimizers
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D4840482

fbshipit-source-id: bf820154475508ce581d16a45bcd93d026b60f30
2017-04-05 21:18:29 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Andrey Malevich
7cc92b1260 Add eval net for layer_model_helper
Summary:
This diff adds eval nets to the layer model helper. It should be useful for
the cases when train/eval nets need some extra input (usually some supervision)
for train/eval, for example various sampled layers.

Differential Revision: D4769453

fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
2017-03-29 04:03:40 -07:00
Kittipat Virochsiri
6163676ebe Skip optimizer when param doesn't have gradient and optimizer is not set
Summary: Currently, we cannot have layer constants because layer params are required to have a gradient and an optimizer. Global constants don't cut it here because each can only be added once; therefore, a layer that adds any global constant can only be used once.

Differential Revision: D4773212

fbshipit-source-id: 5b60d31f3c1602afb04b61f6d30b8e3e06ed2de3
2017-03-24 22:18:34 -07:00
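A minimal sketch of the rule this commit describes, using illustrative names (`Param`, `add_optimizer_steps`) rather than the real layer code: a parameter only gets an update step when it has a gradient or an explicit optimizer, so gradient-less constants are simply skipped.

  from collections import namedtuple

  # A parameter may carry an optional gradient blob and an optional optimizer.
  Param = namedtuple("Param", ["name", "grad", "optimizer"])

  def add_optimizer_steps(params):
      """Return the update steps to add, skipping gradient-less constants."""
      steps = []
      for p in params:
          if p.grad is None and p.optimizer is None:
              # Layer constant: no gradient and no explicit optimizer -> no update.
              continue
          steps.append("apply %s to %s" % (p.optimizer or "default_sgd", p.name))
      return steps

  params = [
      Param("fc_w", grad="fc_w_grad", optimizer="adagrad"),
      Param("pos_weight", grad="pos_weight_grad", optimizer=None),
      Param("vocab_constant", grad=None, optimizer=None),  # skipped
  ]
  print(add_optimizer_steps(params))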
Xiaolong Wang
8ce34d6c87 Add Calibration
Summary: Add calibration to sparse_nn

Differential Revision: D4735564

fbshipit-source-id: 6baa637cbffcbbd50134a256d622ef8c962fca3b
2017-03-24 14:32:23 -07:00
Andrey Malevich
b599910f3a Use new metric interfaces in trainer workflows.
Summary: This diff migrates the existing DPER workflows to use the new metric abstractions in training.

Reviewed By: xianjiec

Differential Revision: D4656576

fbshipit-source-id: 1b3b16b390fc0757480e41df1c4214c11cd76e8a
2017-03-07 12:46:52 -08:00
Qichao Que
2f68632a32 Add SparseNN workflow for feed.
Summary: Add SparseNN workflow for feed. I haven't fully thought about the changes needed for ads; I added a property called 'preproc_output_schema' to LayerModelHelper.

Reviewed By: xianjiec

Differential Revision: D4585796

fbshipit-source-id: 060d08f4beb928e7e7863f2e563f612c358951fb
2017-03-01 11:02:38 -08:00
Andrey Malevich
a3726759c6 Add a way to describe layers in a more ad-hoc manner.
Summary:
This diff tries to address one of the concerns Xianjie has had: the requirement to create a layer for every operator and to pass shapes and other info around.

The basic idea of the diff:
1. Try to create a layer with the given name; if it's not available, fall back to an operator with that name (which is expected to have no parameters).
2. For all operators added through this functional style of creation, try to use the C2 shape/type inference logic to get the output type. If that fails, just return an untyped record and expect the user to annotate it when it's really needed (a sketch of this fallback follows this commit).

Reviewed By: xianjiec

Differential Revision: D4408771

fbshipit-source-id: aced7487571940d726424269970df0eb62670c39
2017-02-27 23:30:39 -08:00
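A minimal sketch of the dispatch described above, with hypothetical registries (`LAYER_REGISTRY`, `OPERATOR_REGISTRY`, `infer_output_type`) standing in for the real ones: prefer a registered layer, fall back to a parameter-free operator, and fall back again to an untyped record when type inference fails.

  LAYER_REGISTRY = {"FC": "FCLayer"}        # layers that carry parameters
  OPERATOR_REGISTRY = {"Relu", "Softmax"}   # plain, parameter-free operators

  def infer_output_type(op_name, input_type):
      # Stand-in for the C2 shape/type inference; pretend it only knows Relu.
      return input_type if op_name == "Relu" else None

  def create(name, input_record_type):
      """Return (what was created, output record type) for a given name."""
      if name in LAYER_REGISTRY:
          # 1. A layer is registered under this name; the layer declares its
          #    own output schema, so nothing needs to be inferred here.
          return "layer:" + LAYER_REGISTRY[name], "declared_by_layer"
      if name in OPERATOR_REGISTRY:
          # 2. Fall back to a parameter-free operator and try type inference;
          #    None means "untyped record, annotate later if needed".
          return "operator:" + name, infer_output_type(name, input_record_type)
      raise KeyError("unknown layer/operator: %s" % name)

  print(create("FC", "float32[32]"))       # ('layer:FCLayer', 'declared_by_layer')
  print(create("Relu", "float32[32]"))     # ('operator:Relu', 'float32[32]')
  print(create("Softmax", "float32[32]"))  # ('operator:Softmax', None)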
Xianjie Chen
d0621a2449 NextScopedBlob with well-defined behavior and respect namescope
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when run under the same NameScope.

`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.

This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term, e.g., for two-tower sparse nn models.

Reviewed By: kennyhorror

Differential Revision: D4555423

fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
2017-02-16 17:16:36 -08:00
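A minimal plain-Python sketch of the naming rule described here (not the actual core.py implementation): the scoped name is returned unchanged, and a numeric suffix is appended only when that exact name was already handed out.

  _used_names = set()

  def scoped_blob_reference(namescope, name):
      return "%s/%s" % (namescope, name) if namescope else name

  def next_scoped_blob(namescope, name):
      """Return the scoped name unchanged unless it truly conflicts."""
      candidate = scoped_blob_reference(namescope, name)
      if candidate not in _used_names:
          _used_names.add(candidate)
          return candidate
      # Real conflict: fall back to suffixing, as NextName would.
      i = 1
      while "%s_%d" % (candidate, i) in _used_names:
          i += 1
      resolved = "%s_%d" % (candidate, i)
      _used_names.add(resolved)
      return resolved

  print(next_scoped_blob("tower1", "fc_w"))  # tower1/fc_w
  print(next_scoped_blob("tower1", "fc_w"))  # tower1/fc_w_1 (only on a real conflict)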
Xianjie Chen
fb7c9108d9 get parameter blobs of a model
Summary: to verify that a model only uses a subset of the parameters of another model (e.g., the model doing the training).

Differential Revision: D4557787

fbshipit-source-id: bd8ac96f5e78e05f6f56086db6e6ddcda36c1d37
2017-02-15 16:00:44 -08:00
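The verification itself amounts to a set-subset check over the two models' parameter blob names; a tiny sketch with illustrative inputs:

  def check_param_subset(eval_params, train_params):
      """Assert that a model only uses parameters owned by the training model."""
      extra = set(eval_params) - set(train_params)
      if extra:
          raise AssertionError("unknown params: %s" % sorted(extra))

  train_params = ["fc1_w", "fc1_b", "emb_user", "emb_ad"]
  eval_params = ["fc1_w", "fc1_b", "emb_user"]
  check_param_subset(eval_params, train_params)  # passes: eval params are a subset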
Andrey Malevich
ec51f887bf Create only one instance of SigridTransform in DPerExample.
Summary:
DPer example has been creating multiple copies of the transform config in the net
definition until now, which meant I hit the ProtoBuf limit (64 MB) for
certain Task requests (especially visible because of the
ValidationPipeline that I was adding).

After this diff we store SigridTransforms in one instance per
machine for training (or 1 instance per reader).

The difference in plan sizes for a simple SparseNN model is ~30 MB (even though the second model has a validation plan as well).

TODO: Do similar logic for NNPreProc as well (it's also pretty large).

Reviewed By: dzhulgakov

Differential Revision: D4441441

fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047
2017-01-22 19:29:16 -08:00
Xianjie Chen
4b3bd06a7f sparse nn converges better by deduping sparse gradient by mean
Summary:
This normalizes the sparse gradient, so that the "effective learning rate" of each sparse parameter will NOT be affected by the number of examples in a batch that "use" this sparse parameter.

Experiments show it helps convergence (about 0.1% better train NE): https://fburl.com/1230747813683956. It's not conclusive yet, and we still need to do more experiments. But this diff adds it as an option and does not change the default behavior, so we can get this in first.

Differential Revision: D4367283

fbshipit-source-id: 49ea80dfa9ea776ff4160e220cf6c86593521607
2016-12-27 22:59:29 -08:00
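A minimal NumPy sketch of the mean-dedup idea (not the actual Caffe2 operator): when the same sparse row index appears for several examples in a batch, its gradient rows are averaged instead of summed, so the per-row effective learning rate doesn't grow with how many examples touched that row.

  import numpy as np

  def dedup_sparse_gradient_by_mean(indices, grad_rows):
      """Average gradient rows that share the same sparse index.

      indices:   (N,) int array of embedding-row indices, may contain repeats.
      grad_rows: (N, D) float array of per-example gradient slices.
      Returns (unique_indices, averaged_rows).
      """
      unique, inverse, counts = np.unique(indices, return_inverse=True,
                                          return_counts=True)
      summed = np.zeros((len(unique), grad_rows.shape[1]), dtype=grad_rows.dtype)
      np.add.at(summed, inverse, grad_rows)       # sum rows per unique index...
      return unique, summed / counts[:, None]     # ...then divide by occurrence count

  indices = np.array([3, 7, 3, 3])
  grads = np.ones((4, 2), dtype=np.float32)
  uniq, avg = dedup_sparse_gradient_by_mean(indices, grads)
  print(uniq)  # [3 7]
  print(avg)   # the row for index 3 stays 1.0, not 3.0, because it was averaged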
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00