Commit Graph

16 Commits

Author SHA1 Message Date
Aapo Kyrola
d38499f727 Optimize BlobIsDefined() + benchmark --> net construction 95 secs to 8.2 secs!
Summary:
I noticed that constructing the Xray model takes quite a while. To measure this, I wrote a benchmark script that creates a ResNet-50 model on 8 GPUs. This takes about 95 secs -- which is kind of annoying when you want to quickly debug stuff.

Profiling (using Python's cProfile), I was able to see that most of the time is spent in net.BlobIsDefined(), which does a linear search over external inputs and operator outputs. Thus it gets slower and slower with large nets. This can be fully optimized by keeping a separate lookup table of operator inputs and outputs (and external inputs and outputs). It is a bit annoying to keep this separate data structure, but I set up the unit tests to ensure things are done correctly over Clones.

After the optimization, the net construction drops from 95 secs to 8.2 secs!
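
A hypothetical sketch of the lookup-table idea (not Caffe2's actual code; SimpleNet and _defined_blobs are made-up names): keep a set in sync with the lists so the membership test is an O(1) lookup instead of a linear scan.

```python
# Hypothetical sketch, not Caffe2's actual code: membership checks become
# O(1) set lookups instead of a linear scan over every op's outputs.

class SimpleNet:
    def __init__(self):
        self._ops = []                # (inputs, outputs) per operator
        self._external_inputs = []
        self._defined_blobs = set()   # kept in sync with the lists above

    def add_external_input(self, blob):
        self._external_inputs.append(blob)
        self._defined_blobs.add(blob)

    def add_op(self, inputs, outputs):
        self._ops.append((inputs, outputs))
        self._defined_blobs.update(outputs)

    def blob_is_defined(self, blob):
        # Previously: scan self._external_inputs and every op's outputs.
        return blob in self._defined_blobs

    def clone(self):
        # The lookup table must be copied (or rebuilt) on Clone, which is
        # what the unit tests mentioned above would guard.
        new = SimpleNet()
        new._ops = list(self._ops)
        new._external_inputs = list(self._external_inputs)
        new._defined_blobs = set(self._defined_blobs)
        return new
```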

Reviewed By: azzolini

Differential Revision: D4288307

fbshipit-source-id: 0bb82c8bde9d86a2702b298f4aa706cba509346e
2016-12-15 12:01:30 -08:00
Dmytro Dzhulgakov
3125e6a821 Hacky fix for cloned model rewriting
Summary:
Disclaimer: this is really hacky

Continues a fix from D4218902. The root problem is that DPER builds the net incrementally and input_record doesn't support that properly. For now I just manipulate the input record directly. Alisson wants to fix it properly later by allowing set_input_record to accept a superset of the current record.

But it should unblock our experimentation.

I'm curious how it's going to look in the dper_example world.

Reviewed By: azzolini

Differential Revision: D4255285

fbshipit-source-id: ff65b6f943d705a9b3399035597e2e8ded2e1ff3
2016-12-05 11:53:26 -08:00
Martin Raison
ea9a0f24bf automatic aggregation of sparse gradients
Summary:
This adds support for automatic aggregation of sparse gradients. We simply concatenate indices and values (no attempt to deduplicate, since this is already done before feeding into the optimizer). This should support various cases (indices and/or values can be generated by one or more gradient ops, or gradient outputs can be directly passed from inputs).

I tried to minimize the code footprint, but I introduced SparseGradGenMeta because GradGenMeta didn't lend itself very well to being used with sparse gradients.
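
A small illustration of the aggregation rule above, with made-up data and plain NumPy rather than the actual Caffe2 ops:

```python
import numpy as np

# Two gradient ops each emit (indices, values) for the same parameter;
# aggregation is concatenation only -- duplicate indices (row 2 here) are
# kept and left for the optimizer to handle.
idx_a, val_a = np.array([0, 2]), np.array([[1., 1.], [2., 2.]])
idx_b, val_b = np.array([2, 5]), np.array([[3., 3.], [4., 4.]])

agg_idx = np.concatenate([idx_a, idx_b])  # [0, 2, 2, 5]
agg_val = np.concatenate([val_a, val_b])  # rows stay paired with agg_idx
```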

Reviewed By: dzhulgakov

Differential Revision: D4219788

fbshipit-source-id: 1d074664cffd82a8764e4b1473ada6bc46e6c51a
2016-12-05 11:53:26 -08:00
Dmytro Dzhulgakov
119b687994 Allow PythonOp to access the workspace
Summary:
DPER has very strange Python ops that play with the Workspace - they are somewhat similar to LoadOp/SaveOp, so I guess the semantics are fine.

Thus it makes sense to allow Python operators to receive a workspace pointer, similarly to regular Operators.

I didn't figure out a better way to implement the optional argument than just checking the number of args the function receives on the Python side.
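
A rough sketch of that arg-count dispatch (call_python_op is a hypothetical stand-in, not the actual binding code): the caller counts the function's parameters and passes the workspace only when the op asks for it.

```python
import inspect

# An op function may take (inputs, outputs) or (inputs, outputs, workspace);
# the caller inspects its parameter count to decide which to pass.

def call_python_op(func, inputs, outputs, workspace):
    n_params = len(inspect.signature(func).parameters)
    if n_params == 3:
        return func(inputs, outputs, workspace)  # workspace-aware op
    return func(inputs, outputs)                 # plain two-arg op
```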

Reviewed By: ajtulloch

Differential Revision: D4242943

fbshipit-source-id: d97d4227815b741c8f884cfe254b06d2b56b5a41
2016-12-05 11:53:26 -08:00
Martin Raison
da72658fa8 sparsehash-based implementation of UniqueOp
Summary:
Faster implementation of UniqueOp using google::dense_hash_map, as suggested by dzhulgakov. I haven't benchmarked it precisely, but early measurements with my workflow show a significant speed bump (this operation went from using 20% of overall CPU time down to 7%).

I gated the implementation using the "engine" feature, to avoid adding sparsehash as a dependency to caffe2.
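
Roughly the idea, sketched in Python with a dict standing in for google::dense_hash_map (the real op is C++): one pass builds a value-to-first-index map, yielding the unique values and a remapping of every input position in O(n), with no sort.

```python
# Python sketch of the hash-map approach; a dict stands in for the
# dense hash map used by the actual C++ implementation.

def unique_with_remapping(values):
    first_index = {}            # value -> position in `unique`
    unique, remapping = [], []
    for v in values:
        if v not in first_index:
            first_index[v] = len(unique)
            unique.append(v)
        remapping.append(first_index[v])
    return unique, remapping

# unique_with_remapping([3, 1, 3, 2]) == ([3, 1, 2], [0, 1, 0, 2])
```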

Reviewed By: dzhulgakov

Differential Revision: D4219768

fbshipit-source-id: 2f142981e772105b42fffa24afb199ef816f8e0c
2016-11-29 15:18:39 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
Yangqing Jia
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00
Yangqing Jia
0a09d09431 fbsync 2016-09-08 17:56:14 -07:00
Yangqing Jia
b23e51d467 chunky sync 2016-09-06 15:55:19 -07:00
Yangqing Jia
05512d1e10 sync 2016-08-10 11:02:15 -07:00
Yangqing Jia
c15e45c9bb chunky sync again 2016-08-01 20:58:46 -07:00
Yangqing Jia
bcea409c82 sync 2016-07-28 15:06:43 -07:00
Yangqing Jia
6463eebc7b chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
Yangqing Jia
559053d3a8 chunky sync 2016-05-13 14:43:48 -07:00
Yangqing Jia
cf7ca23fc1 make caffe2.python build 2016-03-08 16:48:19 -08:00
Yangqing Jia
9ae880bb6f move pycaffe2 to caffe2.python 2016-03-08 15:45:30 -08:00