11 Commits

Author SHA1 Message Date
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Alisson Gusatti Azzolini
59f0454621 Gather perf counters for distributed jobs
Summary: Set up a server node that periodically gathers the values of all nodes' perf counters, so that they can be published at once.

Reviewed By: dzhulgakov

Differential Revision: D4555116

fbshipit-source-id: 8e49ac8353b52b2be82aedf305762478e7fa687a
2017-02-21 22:06:25 -08:00
Alisson Gusatti Azzolini
6ff05fd49d Fix issues pickling jobs
Summary:
We were running into a problem where a Job could not be pickled. It needs to be pickled in order for the master flow operator to execute it using the session.
This introduces the concept of a "compiled" Job, which essentially stores only the protobufs for the Jobs to be executed, avoiding any pickling issues.

Reviewed By: dzhulgakov

Differential Revision: D4554799

fbshipit-source-id: 2ee9877ca49a796d51925e5ec917436e3d930984
2017-02-21 20:47:27 -08:00
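
The "compiled" Job idea in the commit above can be illustrated with a minimal sketch. This is not the Caffe2 implementation: the `Job` class, its `compile()` method, and the JSON payload (standing in for serialized protobufs) are all hypothetical names chosen for illustration. It only shows why reducing an object to serialized bytes sidesteps pickling failures.

```python
import json
import pickle


class Job:
    """Hypothetical job holding live Python state that pickle rejects."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps  # plain, serializable description of the work
        self.on_done = lambda: None  # locally defined lambdas cannot be pickled

    def compile(self):
        # Keep only serialized data (JSON bytes here, standing in for
        # serialized protobufs) and drop every live Python object.
        payload = json.dumps(self.steps).encode("utf-8")
        return {"name": self.name, "payload": payload}


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


job = Job("train", ["init", "epoch", "exit"])
compiled = job.compile()
restored = pickle.loads(pickle.dumps(compiled))
```

The compiled form is plain data, so it can be shipped to whatever process executes it, which then rebuilds the runnable objects from the payload.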
Alisson Gusatti Azzolini
8fa156d082 Improve "reporter net" design
Summary:
Previously we had several limitations for a reporter net:
 - needed to be a net, not an execution step
 - only one allowed per execution step, with a single interval

Now, "reporter nets" become reporter steps, and multiple of them can be specified with different timeouts.

Reviewed By: dzhulgakov

Differential Revision: D4583686

fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d
2017-02-21 20:17:40 -08:00
Dmytro Dzhulgakov
335b73221c Unify train_local and train_with_distributed_readers
Summary:
Outline of changes:

- add single-operator support to Caffe2-Flow integration (based on Alisson's suggestions)
- because of the above support, we can move graph construction to the main workflow body and pass the job to the Flow operator that runs it, as in the distributed case
- after that it's easy to unify code even more
- there's some trickery required to make sure model exporting doesn't pollute Cluster info (as TaskGroup.to_task() creates new tasks)

Important: this diff changes train_local behavior by introducing a queue between preprocessing and the trainer (previously everything ran on the trainer thread). It doesn't seem to impact perf much (if anything, slightly positively), so I guess it's fine. It also allows for better unification.

I'll follow up with a separate diff that moves max_examples gating to multi_reader (including train_local) and then we can enable checkpointing.

Reviewed By: xianjiec

Differential Revision: D4526079

fbshipit-source-id: 8c44044f45e7738e9b13e5b3acfbb994bc5a3d72
2017-02-09 20:46:35 -08:00
Alisson Gusatti Azzolini
039ac56a68 Better names for nets, steps and tasks
Summary:
- NetBuilder now honors its name
- When Nets are created in the context of a NetBuilder, they take NetBuilder's name as prefix
- When a NetBuilder is created in the context of a Task, it takes the Task's name.
- pipe() now tries to find a good name based on the name of its processor, output queue, or input queue.
- RPC tries to find a name from its handler's name.
- Better names in DataStream
- net_printer prints the name of Tasks and Steps
- net_printer optionally factors out common prefixes from blob names.

Differential Revision: D4527578

fbshipit-source-id: 5d3d1237c186e9576313c5aa01cc8800a9051217
2017-02-09 16:33:54 -08:00
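
As a rough illustration of the prefix factoring mentioned in the commit above, the sketch below strips a shared prefix from a list of names. It is not the net_printer code: the function name `factor_prefix`, the example blob names, and the assumption that name components are separated by "/" are all hypothetical.

```python
import os


def factor_prefix(names, sep="/"):
    """Split names into (shared prefix, remainders).

    The prefix is cut back to the last separator so that a name
    component is never split in the middle.
    """
    prefix = os.path.commonprefix(names)  # character-wise common prefix
    cut = prefix.rfind(sep) + 1  # 0 when the separator is absent
    prefix = prefix[:cut]
    return prefix, [name[len(prefix):] for name in names]


prefix, short_names = factor_prefix(["trainer/net/fc1_w", "trainer/net/fc1_b"])
```

Cutting at the separator matters because `os.path.commonprefix` compares character by character, so without it the shared `fc1_` fragment would be stripped from the names as well.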
Alisson Gusatti Azzolini
33c0e5619b Add Task.REPORT_NET attribute
Summary: This allows a task-local report net to be defined before the Task is created. It will be used by the global counter (diff coming soon).

Reviewed By: dzhulgakov

Differential Revision: D4497771

fbshipit-source-id: 24ec7c8e95466abbd83fbea79b58717d81201857
2017-02-03 18:44:50 -08:00
Alisson Gusatti Azzolini
1d3834eeb2 Nodes to support resource requirements and outputs
Summary: See distributed.py for an example of usage.

Reviewed By: xianjiec

Differential Revision: D4467723

fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd
2017-01-30 11:29:25 -08:00
Alisson Gusatti Azzolini
6618d7462d Improvements+fixes for NetBuilder
Summary: Title.

Reviewed By: dzhulgakov

Differential Revision: D4358227

fbshipit-source-id: 21afe5107bed27eec2027f16f2c77db62c70c6e8
2017-01-03 16:59:24 -08:00
Yangqing Jia
589398950f fbsync at f5a877
2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update.
2016-11-15 00:00:46 -08:00