
7 Commits

Author SHA1 Message Date
Dmytro Dzhulgakov
335b73221c Unify train_local and train_with_distributed_readers
Summary:
Outline of changes:

- add single-operator support to Caffe2-Flow integration (based on Alisson's suggestions)
- because of the above support, we can move graph construction into the main workflow body and pass the job to the Flow operator that does the running, similarly to the distributed case
- after that, it's easy to unify the code even more
- some trickery is required to make sure model exporting doesn't pollute the Cluster info (as TaskGroup.to_task() creates new tasks)

Important: this diff changes train_local behavior by introducing a queue between preprocessing and the trainer (previously everything ran on the trainer thread). It doesn't seem to impact perf much (if anything, slightly positively), so I guess it's fine. It also allows for better unification.
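The behavior change described above (preprocessing and training no longer sharing one thread, with a queue in between) is the classic bounded producer/consumer pattern. The sketch below illustrates that pattern in plain Python using only the standard library; it is not the Caffe2 implementation, and the names `run_pipeline`, `preprocess`, `train_step` and `capacity` are hypothetical.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the preprocessed stream


def run_pipeline(raw_examples, preprocess, train_step, capacity=4):
    """Hypothetical sketch: preprocess on a separate thread, train on the caller's thread.

    A bounded queue decouples the two stages; the producer blocks when the
    trainer falls behind, so memory use stays bounded by `capacity`.
    """
    q = queue.Queue(maxsize=capacity)
    errors = []

    def producer():
        try:
            for ex in raw_examples:
                q.put(preprocess(ex))  # blocks while the queue is full
        except BaseException as e:  # hand the failure to the trainer thread
            errors.append(e)
        finally:
            q.put(_SENTINEL)  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        train_step(item)
    worker.join()
    if errors:
        raise errors[0]  # surface preprocessing errors to the caller
```

With a single producer and a FIFO queue, examples reach the trainer in input order, e.g. `seen = []; run_pipeline(range(5), lambda x: x * 2, seen.append)` leaves `seen == [0, 2, 4, 6, 8]`.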

I'll follow up with a separate diff that moves max_examples gating to multi_reader (including train_local) and then we can enable checkpointing.

Reviewed By: xianjiec

Differential Revision: D4526079

fbshipit-source-id: 8c44044f45e7738e9b13e5b3acfbb994bc5a3d72
2017-02-09 20:46:35 -08:00
Alisson Gusatti Azzolini
039ac56a68 Better names for nets, steps and tasks
Summary:
- NetBuilder now honors its name
- When Nets are created in the context of a NetBuilder, they take the NetBuilder's name as a prefix
- When a NetBuilder is created in the context of a Task, it takes the Task's name.
- pipe() now tries to find a good name based on the name of its processor, output queue or input queue.
- RPC tries to find a name from its handler's name.
- Better names in DataStream
- net_printer prints the name of Tasks and Steps
- net_printer optionally factors out common prefixes from blob names.

Differential Revision: D4527578

fbshipit-source-id: 5d3d1237c186e9576313c5aa01cc8800a9051217
2017-02-09 16:33:54 -08:00
Alisson Gusatti Azzolini
33c0e5619b Add Task.REPORT_NET attribute
Summary: This allows having a task-local report net before the Task is created. To be used by the global counter (diff coming soon).

Reviewed By: dzhulgakov

Differential Revision: D4497771

fbshipit-source-id: 24ec7c8e95466abbd83fbea79b58717d81201857
2017-02-03 18:44:50 -08:00
Alisson Gusatti Azzolini
1d3834eeb2 Nodes to support resource requirements and outputs
Summary: See distributed.py for an example of usage.

Reviewed By: xianjiec

Differential Revision: D4467723

fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd
2017-01-30 11:29:25 -08:00
Alisson Gusatti Azzolini
6618d7462d Improvements+fixes for NetBuilder
Summary: Title.

Reviewed By: dzhulgakov

Differential Revision: D4358227

fbshipit-source-id: 21afe5107bed27eec2027f16f2c77db62c70c6e8
2017-01-03 16:59:24 -08:00
Yangqing Jia
589398950f fbsync at f5a877
2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update.
2016-11-15 00:00:46 -08:00