pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Qinqing Zheng	90586d925f	[DT] [38/n] Rename add_stop_signal to add_stop_condition (#6825) att	2018-04-23 10:39:37 -07:00
Qinqing Zheng	038b66ee07	[caffe2] use dictionary in Printer (#6443)	2018-04-10 10:37:07 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Wei Zhang	1d4e996b87	Separate parameter downloading tasks from training tasks and run them in a different group Summary: At the end of distributed training, trainer needs to download the parameters back from parameter servers for saving the model. Currently, this parameter downloading happens at the end of job's epoch task group, which creates several problems when checkpointing is enabled for distributed training: 1. When checkpointing is enabled, we run multiple training epochs. At the end of each epoch, the model download tasks will run to collect parameters, but we won't save the model until the true end of training, so there is a big waste of resource. 2. After trainer0 downloads the parameters, these parameters take a lot of memory, so trainer0 can easily run out of memory in the next epoch of training. Our solution is to insert a parameter download task group between the job's training epoch_group and the job's exit_group. Reviewed By: azzolini Differential Revision: D6765393 fbshipit-source-id: 5a4f556fc3c1cd7834a7c406a3c0de3fccd50c49	2018-01-22 14:04:12 -08:00
Alisson Gusatti Azzolini	4e3aa25139	Unit test that compares net snippets after parallelization Summary: - This is meant as a set of examples on how parallelize_net works. - Currently, only one example is provided. More to be added. Reviewed By: mraway, xianjiec Differential Revision: D6240160 fbshipit-source-id: 6f6f2d77445825883e050498cb6e06fb74508bbf	2017-11-08 15:55:27 -08:00
Alisson Gusatti Azzolini	45c5ac1415	Print net type arguments in net_printer Summary: This prints the inner net of 'Do' op, for example. Reviewed By: akyrola Differential Revision: D6007278 fbshipit-source-id: 459583fe13191b0449982efb7be733c9c01ecf76	2017-10-08 20:02:55 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Alisson Gusatti Azzolini	1968e03486	net_printer.to_string() accepts NetDef Summary: Title. Reviewed By: kennyhorror Differential Revision: D5531925 fbshipit-source-id: 8f8961e6ab14d49720f74ec01c197ba9cc3e33ce	2017-08-01 10:17:29 -07:00
Dmytro Dzhulgakov	67d2f45e2f	Fix net_printer.py Summary: Fix the unprintable characters fix :) Reviewed By: akyrola Differential Revision: D5398914 fbshipit-source-id: 2c607c497f15e324e863ff1dae7bb16199d4074e	2017-07-11 15:26:52 -07:00
Thomas Dudziak	5355634dac	Dict fixes/improvements and unittest targets for Python 3 in caffe2 core Summary: As title Reviewed By: salexspb Differential Revision: D5316104 fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30	2017-06-29 17:05:41 -07:00
Alisson Gusatti Azzolini	7d482742fd	Allow tasks/execution_steps to be cloned at runtime Summary: Advantages of cloning the tasks/execution_steps at runtime: - Less complexity on the python side: no need to clone nets and add prefixes to blob names - Faster start-up: we had cases of complex plans that took up to 30min to be created. - Better isolation: each task cloned at runtime has its own child workspace, preventing false sharing of blobs. - Opens up possibility for dynamic scheduling: Number of threads per task can be increased on the fly, at runtime. Reviewed By: dzhulgakov Differential Revision: D5100730 fbshipit-source-id: 71b83193b135da4e6eaf2536d8fc266528e1fdcc	2017-06-20 22:32:07 -07:00
Yiming Wu	072f4dbefc	net_printer_quick_fix Summary: To deal with encode failure Reviewed By: azzolini Differential Revision: D5215897 fbshipit-source-id: cf8687706f7e4deaee05b61cd2bfeaff88672fcc	2017-06-08 19:34:50 -07:00
Thomas Dudziak	47e921ba49	Remove map() and filter() in favor of comprehensions Summary: These return views in Python 3 which would not do anything in a lot of usages currently present in Caffe2. This diff simply removes (almost) all usages of these two in Caffe2 and sub projects in favor of comprehensions which are also easier to read/understand Reviewed By: akyrola Differential Revision: D5142049 fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc	2017-05-30 15:32:58 -07:00
Aaron Markham	58f7f2b441	doxygen python block added Summary: Closes https://github.com/caffe2/caffe2/pull/226 Differential Revision: D4793550 Pulled By: JoelMarcey fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e	2017-03-29 06:46:16 -07:00
Dmytro Dzhulgakov	560572910c	Add task outputs and stop signals to net_printer Summary: Useful for debugging of multi_reader. Reviewed By: kennyhorror Differential Revision: D4664954 fbshipit-source-id: ba7a307db444b61a7e520992ee44c35237906068	2017-03-07 01:21:40 -08:00
Alisson Gusatti Azzolini	8fa156d082	Improve "reporter net" design Summary: Previously we had several limitations for a reporter net: - needed to be a net, not an execution step - only one allowed per execution step, with a single interval Now, "reporter nets" become reporter steps and multiple of them can be specified with different timeouts. Reviewed By: dzhulgakov Differential Revision: D4583686 fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d	2017-02-21 20:17:40 -08:00
Alisson Gusatti Azzolini	039ac56a68	Better names for nets, steps and tasks Summary: - NetBuilder now honors its name - When Nets are created in the context of a NetBuilder, they take NetBuilder's name as prefix - When a NetBuilder is created in the context of a Task, it takes the Task's name. - pipe() now tries to find a good name based on its processor's, output or input queue's name. - RPC tries to find a name from its handler's name. - Better names in DataStream - net_printer prints the name of Tasks and Steps - net_printer optionally factors out common prefixes from blob names. Differential Revision: D4527578 fbshipit-source-id: 5d3d1237c186e9576313c5aa01cc8800a9051217	2017-02-09 16:33:54 -08:00
Alisson Gusatti Azzolini	17151ca14f	Debug/Analysis tools for Jobs/ExecutionSteps Summary: Introduces 2 utilities: - ##print_obj##: Prints the whole Job in a nice way -- each op call takes one single line and nets are inlined for much better readability. Loops and parallel steps are easy to read. - ##analyse_obj##: Goes through a Job and checks 2 things: - that there will be no undefined blob errors at execution. - no blob of same name will be created by parallel execution steps Reviewed By: dzhulgakov Differential Revision: D4142381 fbshipit-source-id: 61bf3398c22e9947493e99145ce2bfc2646830a6	2017-02-06 17:31:20 -08:00

18 Commits