pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Derek Kim	1e425d1a47	A trivial typo fix in caffe2.python (#15907) Summary: blobl -> globl Pull Request resolved: https://github.com/pytorch/pytorch/pull/15907 Differential Revision: D13709586 Pulled By: ezyang fbshipit-source-id: 9d3ad76b7fea76c7934407d3c164417b4157e234	2019-01-17 04:57:34 -08:00
Tristan Rice	e650a84872	caffe2/python/task: added __repr__ methods to all task definitions (#15250) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15250 This adds `__repr__` methods to all of the classes under task.py. This makes the objects much easier to interact with when using them in an interactive manner, such as in a Jupyter notebook. The default `__repr__` method just returns the object ID which is very unhelpful. Reviewed By: hanli0612 Differential Revision: D13475758 fbshipit-source-id: 6e1b166ec35163b9776c797b6a2e0d002560cd29	2018-12-17 16:02:16 -08:00
Hassan Eslami	e392d428b1	Allowing TaskGroups to carry remote nets (#14342) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14342 Sometimes, when we are creating a TaskGroup, we are in fact creating a TaskGroup for a distributed job. In some cases, we may want to register a few nets as "remote" to a TaskGroup. The remote net should have sufficient attributes on where they should be executed later on. This diff adds the remote net attribute to the TaskGroup class. It exposes two minimal functionalities: adding a remote net, and getting all remote nets added to a TaskGroup. Reviewed By: d4l3k Differential Revision: D13188320 fbshipit-source-id: efe947aec30817e9512a5e18be985713b9356bdc	2018-11-27 13:34:11 -08:00
Shihao Xu	b834d9107e	Revert D9566744: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() (#11164) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11164 Revert D9566744 Reviewed By: enosair Differential Revision: D9620272 fbshipit-source-id: 6a78c46929f66bd11969840cb6b107f734be0c02	2018-08-31 22:25:57 -07:00
Shihao Xu	ad1670cf54	Kill the dummy TaskOutput when task.get_step() (#11048) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11048 Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739 I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint. But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan". Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack. The reason for this is that a ZMQ socket can't send an empty blob list. As a result, if the Task on the Worker had no output, the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`. TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces. Instead, we should move the creation of the dummy blob to some deeper layer, and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces. After this change, the workaround becomes totally transparent and has no side effects for users. Reviewed By: mraway Differential Revision: D9566744 fbshipit-source-id: 18292dd64a6d48192c34034200a7c9811d2172af	2018-08-29 20:11:29 -07:00
Zhanibek Datbayev	22e3b2c9c3	Revert D9413150: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() Differential Revision: D9413150 Original commit changeset: 51aaf3201e26 fbshipit-source-id: ac7c4c0960db03f344fe3eb2ad7f0e034db2371a	2018-08-29 14:39:49 -07:00
Shihao Xu	6ca28984c7	Kill the dummy TaskOutput when task.get_step() (#10739) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739 I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint. But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan". Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack. The reason for this is that a ZMQ socket can't send an empty blob list. As a result, if the Task on the Worker had no output, the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`. TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces. Instead, we should move the creation of the dummy blob to some deeper layer, and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces. After this change, the workaround becomes totally transparent and has no side effects for users. Reviewed By: mraway Differential Revision: D9413150 fbshipit-source-id: 51aaf3201e26570b4fcf5738e9b9aa17c58777ac	2018-08-28 20:41:46 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Lei Chen	14950a9082	Support session in distributed realtime trainer Summary: Convert from PlanDef ProtoBuf into python Plan object by recursively creating Nets and ExecutionSteps. Also support running Plan object directly in Session. Reviewed By: azzolini Differential Revision: D5608393 fbshipit-source-id: c0ae3b6da743a759af6db3b614a5a3935fe0b34c	2017-08-16 10:28:55 -07:00
Thomas Dudziak	5355634dac	Dict fixes/improvements and unittest targets for Python 3 in caffe2 core Summary: As title Reviewed By: salexspb Differential Revision: D5316104 fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30	2017-06-29 17:05:41 -07:00
Alisson Gusatti Azzolini	7d482742fd	Allow tasks/execution_steps to be cloned at runtime Summary: Advantages of cloning the tasks/execution_steps at runtime: - Less complexity on the python side: no need to clone nets and add prefixes to blob names - Faster start-up: we had cases of complex plans that took up to 30min to be created. - Better isolation: each task cloned at runtime has its own child workspace, preventing false sharing of blobs. - Opens up possibility for dynamic scheduling: Number of threads per task can be increased on the fly, at runtime. Reviewed By: dzhulgakov Differential Revision: D5100730 fbshipit-source-id: 71b83193b135da4e6eaf2536d8fc266528e1fdcc	2017-06-20 22:32:07 -07:00
Xiaolong Wang	b133c214ce	fix potential bug in task.py Summary: as titled Differential Revision: D5225166 fbshipit-source-id: 9247fe44922c097752c6996ee9192ec72b7e7d88	2017-06-11 10:40:47 -07:00
Xiaolong Wang	827a0ac2fe	Fix comment mistakes in task.py Summary: as titled Reviewed By: kennyhorror Differential Revision: D5225154 fbshipit-source-id: 99a9547e15e0d5a4c81b6339ce75406160a7fc07	2017-06-11 10:17:07 -07:00
Thomas Dudziak	47e921ba49	Remove map() and filter() in favor of comprehensions Summary: These return views in Python 3 which would not do anything in a lot of usages currently present in Caffe2. This diff simply removes (almost) all usages of these two in Caffe2 and sub projects in favor of comprehensions which are also easier to read/understand Reviewed By: akyrola Differential Revision: D5142049 fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc	2017-05-30 15:32:58 -07:00
Alisson Gusatti Azzolini	310f505da7	Remove application-specific comment. Summary: This comment is not relevant for open-source. Differential Revision: D5070835 fbshipit-source-id: 8e2dadae85566e7f6684d42f921daf7d345dc065	2017-05-16 12:17:03 -07:00
Aaron Markham	58f7f2b441	doxygen python block added Summary: Closes https://github.com/caffe2/caffe2/pull/226 Differential Revision: D4793550 Pulled By: JoelMarcey fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e	2017-03-29 06:46:16 -07:00
Alisson Gusatti Azzolini	59f0454621	Gather perf counters for distributed jobs Summary: Set up a server node that periodically gathers values of all nodes' perf counters, allowing to publish them at once. Reviewed By: dzhulgakov Differential Revision: D4555116 fbshipit-source-id: 8e49ac8353b52b2be82aedf305762478e7fa687a	2017-02-21 22:06:25 -08:00
Alisson Gusatti Azzolini	6ff05fd49d	Fix issues pickling jobs Summary: We were running into a problem where a Job could not be pickled. It needs to be pickled in order for the master flow operator to execute it using the session. This creates a concept of "compiled" Job, that pretty much only stores protobufs with the Jobs to be executed, avoiding any issue with pickling. Reviewed By: dzhulgakov Differential Revision: D4554799 fbshipit-source-id: 2ee9877ca49a796d51925e5ec917436e3d930984	2017-02-21 20:47:27 -08:00
Alisson Gusatti Azzolini	8fa156d082	Improve "reporter net" design Summary: Previously we had several limitations for a reporter net: - needed to be a net, not an execution step - only one allowed per execution step, with a single interval Now, "reporter nets" become reporter steps and multiple of them can be specified with different timeouts. Reviewed By: dzhulgakov Differential Revision: D4583686 fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d	2017-02-21 20:17:40 -08:00
Dmytro Dzhulgakov	335b73221c	Unify train_local and train_with_distributed_readers Summary: Outline of changes: - add single-operator support to Caffe2-Flow integration (based on Alisson's suggestions) - because of above support we can move graph construction to the main workflow body and pass the job to the Flow operator doing running, similarly to the distributed case - after that it's easy to unify code even more - there's some trickery required to make sure model exporting doesn't pollute Cluster info (as TaskGroup.to_task() creates new tasks) Important: this diff changes train_local behavior by introducing queue between preprocessing and trainer (before we did everything on trainer thread). It doesn't seem to impact perf much (even slightly positive), so I guess it's fine. It also allows for better unification. I'll follow up with a separate diff that moves max_examples gating to multi_reader (including train_local) and then we can enable checkpointing. Reviewed By: xianjiec Differential Revision: D4526079 fbshipit-source-id: 8c44044f45e7738e9b13e5b3acfbb994bc5a3d72	2017-02-09 20:46:35 -08:00
Alisson Gusatti Azzolini	039ac56a68	Better names for nets, steps and tasks Summary: - NetBuilder now honors its name - When Nets are created in the context of a NetBuilder, they take NetBuilder's name as prefix - When a NetBuilder is created in the context of a Task, it takes the Task's name. - pipe() now tries to find a good name based on its processor's, output or input queue's name. - RPC tries to find a name from its handler's name. - Better names in DataStream - net_printer prints the name of Tasks and Steps - net_printer optionally factors out common prefixes from blob names. Differential Revision: D4527578 fbshipit-source-id: 5d3d1237c186e9576313c5aa01cc8800a9051217	2017-02-09 16:33:54 -08:00
Alisson Gusatti Azzolini	33c0e5619b	Add Task.REPORT_NET attribute Summary: This allows to have a task-local report net before the Task is created. To be used in global counter (diff soon) Reviewed By: dzhulgakov Differential Revision: D4497771 fbshipit-source-id: 24ec7c8e95466abbd83fbea79b58717d81201857	2017-02-03 18:44:50 -08:00
Alisson Gusatti Azzolini	1d3834eeb2	Nodes to support resource requirements and outputs Summary: See distributed.py for example of usage Reviewed By: xianjiec Differential Revision: D4467723 fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd	2017-01-30 11:29:25 -08:00
Alisson Gusatti Azzolini	6618d7462d	Improvements+fixes for NetBuilder Summary: Title. Reviewed By: dzhulgakov Differential Revision: D4358227 fbshipit-source-id: 21afe5107bed27eec2027f16f2c77db62c70c6e8	2017-01-03 16:59:24 -08:00
Yangqing Jia	589398950f	fbsync at f5a877	2016-11-18 15:41:06 -08:00
Yangqing Jia	238ceab825	fbsync. TODO: check if build files need update.	2016-11-15 00:00:46 -08:00
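
Commit e650a84872 above adds `__repr__` methods so that task objects print usefully in an interactive session. The following is only a minimal sketch of that idea: `PlainTask`, `DescribedTask`, and their `name` and `node` fields are hypothetical stand-ins, not the actual classes or attributes in caffe2's task.py.

```python
class PlainTask(object):
    """Hypothetical stand-in for a task class with no custom __repr__."""

    def __init__(self, name, node):
        self.name = name
        self.node = node


class DescribedTask(PlainTask):
    """Same hypothetical class, with a __repr__ that shows its fields."""

    def __repr__(self):
        return "{}(name={!r}, node={!r})".format(
            type(self).__name__, self.name, self.node
        )


plain = PlainTask("reader", "worker_0")
described = DescribedTask("reader", "worker_0")

# The default repr only shows the class and a memory address.
print(repr(plain))
# The custom repr shows the fields that identify the object.
print(repr(described))
```

In a notebook, evaluating `described` on its own line would display the field-based string, which is the usability gain the commit message describes.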

27 Commits
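
Commit 47e921ba49 above replaces `map()` and `filter()` with comprehensions because of their Python 3 behavior. In Python 3 both return lazy iterators (the commit message calls them views), so a call made only for its side effects does nothing until the result is consumed. The sketch below illustrates this with made-up values; it is not code taken from Caffe2.

```python
values = [1, 2, 3, 4]

# In Python 3, map() is lazy: the append calls never run because the
# returned iterator is never consumed.
seen_lazy = []
map(seen_lazy.append, values)

# An explicit loop performs the side effect on both Python 2 and Python 3.
seen_loop = []
for v in values:
    seen_loop.append(v)

# Comprehensions build real lists, so len() and indexing work directly.
doubled = [v * 2 for v in values]          # replaces map(lambda v: v * 2, values)
evens = [v for v in values if v % 2 == 0]  # replaces filter(lambda v: v % 2 == 0, values)

print(seen_lazy, seen_loop, doubled, evens)
```

Wrapping the call as `list(map(...))` would also force evaluation, but the commit chose comprehensions, which it describes as easier to read.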