19 Commits

Author SHA1 Message Date
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Lei Chen
14950a9082 Support session in distributed realtime trainer
Summary:
Convert from PlanDef ProtoBuf into python Plan object by recursively creating
Nets and ExecutionSteps.

Also support running Plan object directly in Session.

Reviewed By: azzolini

Differential Revision: D5608393

fbshipit-source-id: c0ae3b6da743a759af6db3b614a5a3935fe0b34c
2017-08-16 10:28:55 -07:00
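The recursive conversion described in commit 14950a9082 can be sketched roughly as below. This is a minimal illustration, not the Caffe2 implementation: plain dicts stand in for the PlanDef protobuf, and the `Step` class, the `build_step` helper, and the `network`/`substep` keys are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """Hypothetical stand-in for a Python-side execution step."""
    name: str
    nets: List[str] = field(default_factory=list)
    substeps: List["Step"] = field(default_factory=list)


def build_step(step_def):
    """Recursively turn a nested definition into Step objects (illustrative only)."""
    return Step(
        name=step_def["name"],
        nets=list(step_def.get("network", [])),
        # Each nested definition is converted by the same function.
        substeps=[build_step(s) for s in step_def.get("substep", [])],
    )


# A dict used here in place of a real PlanDef message (assumption).
plan_def = {
    "name": "plan",
    "substep": [
        {"name": "init", "network": ["init_net"]},
        {"name": "train", "substep": [{"name": "epoch", "network": ["train_net"]}]},
    ],
}
plan = build_step(plan_def)
```

The recursion mirrors the nesting of the definition, so arbitrarily deep step hierarchies are handled without special cases.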
Thomas Dudziak
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
Alisson Gusatti Azzolini
7d482742fd Allow tasks/execution_steps to be cloned at runtime
Summary:
Advantages of cloning the tasks/execution_steps at runtime:
- Less complexity on the python side: no need to clone nets and add prefixes to blob names
- Faster start-up: we had cases of complex plans that took up to 30min to be created.
- Better isolation: each task cloned at runtime has its own child workspace, preventing false sharing of blobs.
- Opens up possibility for dynamic scheduling: Number of threads per task can be increased on the fly, at runtime.

Reviewed By: dzhulgakov

Differential Revision: D5100730

fbshipit-source-id: 71b83193b135da4e6eaf2536d8fc266528e1fdcc
2017-06-20 22:32:07 -07:00
Xiaolong Wang
b133c214ce fix potential bug in task.py
Summary: as titled

Differential Revision: D5225166

fbshipit-source-id: 9247fe44922c097752c6996ee9192ec72b7e7d88
2017-06-11 10:40:47 -07:00
Xiaolong Wang
827a0ac2fe Fix comment mistakes in task.py
Summary: as titled

Reviewed By: kennyhorror

Differential Revision: D5225154

fbshipit-source-id: 99a9547e15e0d5a4c81b6339ce75406160a7fc07
2017-06-11 10:17:07 -07:00
Thomas Dudziak
47e921ba49 Remove map() and filter() in favor of comprehensions
Summary: These return lazy iterators in Python 3, so many of the usages currently present in Caffe2 would not do anything. This diff removes (almost) all usages of these two in Caffe2 and subprojects in favor of comprehensions, which are also easier to read and understand.

Reviewed By: akyrola

Differential Revision: D5142049

fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
2017-05-30 15:32:58 -07:00
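The Python 3 behavior behind commit 47e921ba49 can be illustrated with a small sketch. In Python 3, `map()` and `filter()` return lazy iterators, so a call made only for its side effects does nothing until the result is consumed. The names `record` and `seen` are hypothetical and not taken from Caffe2.

```python
# Illustrative only: `record` and `seen` are hypothetical names.
seen = []


def record(x):
    """Append x to `seen` (a side effect) and return its double."""
    seen.append(x)
    return x * 2


# Python 3: map() builds a lazy iterator; record() has not run yet.
lazy = map(record, [1, 2, 3])
calls_before_consumption = len(seen)

# A list comprehension evaluates eagerly, so the side effects happen here.
doubled = [record(x) for x in [1, 2, 3]]
calls_after_comprehension = len(seen)

# filter() is lazy as well; a comprehension with a condition is the eager form.
multiples_of_four = [x for x in doubled if x % 4 == 0]
```

Under Python 2 the `map()` call would have run `record` immediately, which is why code relying on that side effect silently stops working after a port.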
Alisson Gusatti Azzolini
310f505da7 Remove application-specific comment.
Summary: This comment is not relevant for open-source.

Differential Revision: D5070835

fbshipit-source-id: 8e2dadae85566e7f6684d42f921daf7d345dc065
2017-05-16 12:17:03 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Alisson Gusatti Azzolini
59f0454621 Gather perf counters for distributed jobs
Summary: Set up a server node that periodically gathers the values of all nodes' perf counters, so they can be published all at once.

Reviewed By: dzhulgakov

Differential Revision: D4555116

fbshipit-source-id: 8e49ac8353b52b2be82aedf305762478e7fa687a
2017-02-21 22:06:25 -08:00
Alisson Gusatti Azzolini
6ff05fd49d Fix issues pickling jobs
Summary:
We were running into a problem where a Job could not be pickled. It needs to be pickled in order for the master flow operator to execute it using the session.
This creates a concept of "compiled" Job, that pretty much only stores protobufs with the Jobs to be executed, avoiding any issue with pickling.

Reviewed By: dzhulgakov

Differential Revision: D4554799

fbshipit-source-id: 2ee9877ca49a796d51925e5ec917436e3d930984
2017-02-21 20:47:27 -08:00
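The idea in commit 6ff05fd49d, keeping only serialized data in a "compiled" form so that it can be pickled, can be sketched as follows. This is an assumption-laden illustration: the `Job` class, its `to_compiled` method, and the use of encoded strings in place of serialized protobufs are all hypothetical, not the Caffe2 API.

```python
import pickle
import threading


class Job:
    """Hypothetical job holding live runtime state that pickle rejects."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def to_compiled(self):
        # Keep only plain data; bytes stand in for serialized protobufs.
        return {"name": self.name, "steps": [s.encode("utf-8") for s in self.steps]}


job = Job("train", ["init", "epoch"])

# Pickling the live object fails because of the lock it carries.
try:
    pickle.dumps(job)
    job_pickles = True
except (pickle.PicklingError, TypeError, AttributeError):
    job_pickles = False

# The compiled form contains only strings and bytes, so it round-trips.
compiled = job.to_compiled()
restored = pickle.loads(pickle.dumps(compiled))
```

Separating the picklable description from the live object keeps the serialization boundary explicit instead of relying on custom `__getstate__` logic.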
Alisson Gusatti Azzolini
8fa156d082 Improve "reporter net" design
Summary:
Previously we had several limitations for a reporter net:
 - needed to be a net, not an execution step
 - only one allowed per execution step, with a single interval

Now, "reporter nets" become reporter steps, and multiple of them can be specified with different timeouts.

Reviewed By: dzhulgakov

Differential Revision: D4583686

fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d
2017-02-21 20:17:40 -08:00
Dmytro Dzhulgakov
335b73221c Unify train_local and train_with_distributed_readers
Summary:
Outline of changes:

- add single-operator support to Caffe2-Flow integration (based on Alisson's suggestions)
- because of above support we can move graph construction to the main workflow body and pass the job to the Flow operator doing running, similarly to the distributed case
- after that it's easy to unify code even more
- there's some trickery required to make sure model exporting doesn't pollute Cluster info (as TaskGroup.to_task() creates new tasks)

Important: this diff changes train_local behavior by introducing queue between preprocessing and trainer (before we did everything on trainer thread). It doesn't seem to impact perf much (even slightly positive), so I guess it's fine. It also allows for better unification.

I'll follow up with a separate diff that moves max_examples gating to multi_reader (including train_local) and then we can enable checkpointing.

Reviewed By: xianjiec

Differential Revision: D4526079

fbshipit-source-id: 8c44044f45e7738e9b13e5b3acfbb994bc5a3d72
2017-02-09 20:46:35 -08:00
Alisson Gusatti Azzolini
039ac56a68 Better names for nets, steps and tasks
Summary:
- NetBuilder now honors its name
- When Nets are created in the context of a NetBuilder, they take NetBuilder's name as prefix
- When a NetBuilder is created in the context of a Task, it takes the Task's name.
- pipe() now tries to find a good name based on its processor's, output or input queue's name.
- RPC tries to find a name from its handler's name.
- Better names in DataStream
- net_printer prints the name of Tasks and Steps
- net_printer optionally factors out common prefixes from blob names.

Differential Revision: D4527578

fbshipit-source-id: 5d3d1237c186e9576313c5aa01cc8800a9051217
2017-02-09 16:33:54 -08:00
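Factoring a common prefix out of blob names, as mentioned for net_printer in commit 039ac56a68, could look roughly like the sketch below. It does not reproduce net_printer's actual logic: the `factor_prefix` helper and the choice of `/` as the name separator are assumptions for illustration.

```python
import os


def factor_prefix(names, sep="/"):
    """Return (shared_prefix, shortened_names); hypothetical helper."""
    # commonprefix compares character by character, not by path component.
    prefix = os.path.commonprefix(names)
    # Trim back to the last separator so a name component is never split.
    prefix = prefix[: prefix.rfind(sep) + 1]
    return prefix, [n[len(prefix):] for n in names]


blob_names = ["trainer/reader/data", "trainer/reader/label", "trainer/loss"]
prefix, short_names = factor_prefix(blob_names)
```

Trimming to the separator matters for names such as `a/bc` and `a/bd`, where the raw character prefix `a/b` would otherwise cut into the last component.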
Alisson Gusatti Azzolini
33c0e5619b Add Task.REPORT_NET attribute
Summary: This allows a task-local report net to exist before the Task is created. To be used in the global counter (diff soon).

Reviewed By: dzhulgakov

Differential Revision: D4497771

fbshipit-source-id: 24ec7c8e95466abbd83fbea79b58717d81201857
2017-02-03 18:44:50 -08:00
Alisson Gusatti Azzolini
1d3834eeb2 Nodes to support resource requirements and outputs
Summary: See distributed.py for example of usage

Reviewed By: xianjiec

Differential Revision: D4467723

fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd
2017-01-30 11:29:25 -08:00
Alisson Gusatti Azzolini
6618d7462d Improvements+fixes for NetBuilder
Summary: Title.

Reviewed By: dzhulgakov

Differential Revision: D4358227

fbshipit-source-id: 21afe5107bed27eec2027f16f2c77db62c70c6e8
2017-01-03 16:59:24 -08:00
Yangqing Jia
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00