Commit Graph

200 Commits

Author | SHA1 | Message | Date
Yangqing Jia
809d54ee50 convnet benchmark minor change 2016-01-05 09:55:22 -08:00
Yangqing Jia
8c1bbaa2ab some fill ops that are not tested. 2016-01-05 09:55:22 -08:00
Yangqing Jia
6cb2072422 cudnn conv op backward compatibility back to v2 2016-01-05 09:55:21 -08:00
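For context, a minimal sketch of how this kind of version fallback is commonly gated at compile time (an assumed pattern, not the actual Caffe2 source; newer cudnn.h headers define a CUDNN_VERSION macro, while very old ones may not, hence the guard):

```cpp
#include <cudnn.h>

// Assumed pattern: newer cudnn.h headers define CUDNN_VERSION as
// major*1000 + minor*100 + patch; guard it for very old headers.
#ifndef CUDNN_VERSION
#define CUDNN_VERSION 2000  // treat an undefined version as R2
#endif

#if CUDNN_VERSION >= 3000
// v3+ code path, e.g. algorithm selection and workspace queries.
#else
// R2 fallback path using the older fixed-function interface.
#endif
```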
Yangqing Jia
778a1f6956 speed benchmark 2016-01-05 09:55:21 -08:00
Yangqing Jia
05eda208a5 Last commit for the day. With all the previous changes, this should give an exact reference speed that TensorFlow with cuDNN v3 should achieve in the end. 2016-01-05 09:55:21 -08:00
Yangqing Jia
896e8e5274 pooling backward cudnn, and constant for kOne and kZero. 2016-01-05 09:55:21 -08:00
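The kOne/kZero constants here are the alpha/beta blending scalars that cuDNN compute calls take; a minimal sketch of the pattern (the wrapper name and signature are illustrative, not Caffe2's actual operator):

```cpp
#include <cudnn.h>

namespace {
const float kOne = 1.0f;   // alpha: keep the op's result as-is
const float kZero = 0.0f;  // beta: discard prior contents of y
}

// Hypothetical wrapper: computes y = kOne * pool(x) + kZero * y.
cudnnStatus_t PoolForward(cudnnHandle_t handle,
                          cudnnPoolingDescriptor_t pool_desc,
                          cudnnTensorDescriptor_t x_desc, const float* x,
                          cudnnTensorDescriptor_t y_desc, float* y) {
  return cudnnPoolingForward(handle, pool_desc,
                             &kOne, x_desc, x,
                             &kZero, y_desc, y);
}
```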
Yangqing Jia
f8585bbf62 cudnn pool op. 2016-01-05 09:55:21 -08:00
Yangqing Jia
664bdf83d7 Pooling refactor so we can do a proper cudnn benchmark. 2016-01-05 09:55:21 -08:00
Yangqing Jia
288f350899 math_gpu.cu bugfix 2016-01-05 09:55:21 -08:00
Yangqing Jia
ebd6c9fab8 muji bugfix with ngpu=4 2016-01-05 09:55:21 -08:00
Yangqing Jia
55cced894d Some untested half float stuff for benchmarking. 2016-01-05 09:49:55 -08:00
Yangqing Jia
8d4683434b convnet benchmark: make it consistent with TF's model. 2015-12-17 11:25:51 -08:00
Yangqing Jia
b7c3b48469 copy matrix can be done with cudaMemcpy. 2015-12-17 10:22:02 -08:00
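The idea, sketched under the assumption of row-major matrices with arbitrary leading dimensions (the helper name is illustrative): a strided matrix copy maps directly onto a single cudaMemcpy2D call, so no hand-written copy kernel is needed.

```cpp
#include <cuda_runtime.h>

// Illustrative helper: copy an M x N row-major float matrix whose rows
// are ld_src / ld_dst elements apart. cudaMemcpy2D takes pitches and the
// row width in bytes, plus the number of rows.
cudaError_t CopyMatrix(int M, int N,
                       const float* src, int ld_src,
                       float* dst, int ld_dst) {
  return cudaMemcpy2D(dst, ld_dst * sizeof(float),
                      src, ld_src * sizeof(float),
                      N * sizeof(float),  // bytes actually copied per row
                      M,                  // number of rows
                      cudaMemcpyDeviceToDevice);
}
```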
Yangqing Jia
b10ee24fc3 conv op: backward exhaustive mode too. This does not seem to help much, suggesting that cudnnGetConvolution*Algorithm is already doing a very good job. Verified with GoogLeNet. 2015-12-17 10:21:16 -08:00
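This commit and the exhaustive-search one below toggle between cuDNN's two algorithm-selection paths; a minimal forward-direction sketch (descriptor setup assumed done elsewhere): the Get call picks heuristically, while the Find call actually benchmarks each candidate.

```cpp
#include <cudnn.h>

// Illustrative chooser; descriptors are assumed configured elsewhere.
cudnnConvolutionFwdAlgo_t ChooseAlgo(cudnnHandle_t handle,
                                     cudnnTensorDescriptor_t x_desc,
                                     cudnnFilterDescriptor_t w_desc,
                                     cudnnConvolutionDescriptor_t conv_desc,
                                     cudnnTensorDescriptor_t y_desc,
                                     bool exhaustive) {
  cudnnConvolutionFwdAlgo_t algo;
  if (!exhaustive) {
    // Heuristic: cuDNN picks an algorithm without running anything.
    cudnnGetConvolutionForwardAlgorithm(
        handle, x_desc, w_desc, conv_desc, y_desc,
        CUDNN_CONVOLUTION_FWD_PREFER_FASTEST, /*memoryLimitInBytes=*/0,
        &algo);
  } else {
    // Exhaustive: cuDNN times each candidate and returns them sorted
    // by measured execution time, fastest first.
    cudnnConvolutionFwdAlgoPerf_t perf[8];
    int returned = 0;
    cudnnFindConvolutionForwardAlgorithm(
        handle, x_desc, w_desc, conv_desc, y_desc, 8, &returned, perf);
    algo = perf[0].algo;
  }
  return algo;
}
```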
Yangqing Jia
d79cfb4ae7 exhaustive search for cudnn 2015-12-15 22:21:11 -08:00
Yangqing Jia
61c114971b fast path for copymatrix 2015-12-15 22:21:11 -08:00
Yangqing Jia
05e3207e26 fast path for copymatrix 2015-12-15 21:25:53 -08:00
Yangqing Jia
cc9323793e add relu cudnn code 2015-12-15 20:43:34 -08:00
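A sketch of a cuDNN ReLU forward pass, written against the activation-descriptor API of cuDNN v5+ for clarity; the 2015-era code would have passed a cudnnActivationMode_t enum directly. The wrapper name and signature are illustrative.

```cpp
#include <cudnn.h>

// Illustrative wrapper: y = relu(x), with x and y sharing one descriptor.
cudnnStatus_t ReluForward(cudnnHandle_t handle,
                          cudnnTensorDescriptor_t desc,
                          const float* x, float* y) {
  const float kOne = 1.0f, kZero = 0.0f;
  cudnnActivationDescriptor_t act;
  cudnnCreateActivationDescriptor(&act);
  cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_RELU,
                               CUDNN_NOT_PROPAGATE_NAN, /*coef=*/0.0);
  cudnnStatus_t status = cudnnActivationForward(
      handle, act, &kOne, desc, x, &kZero, desc, y);
  cudnnDestroyActivationDescriptor(act);
  return status;
}
```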
Yangqing Jia
4f2530d8ce expose benchmark code to python 2015-12-15 20:42:54 -08:00
Yangqing Jia
6b27cabf17 net benchmark code 2015-12-15 20:42:22 -08:00
Yangqing Jia
cf8ffe215f minor tuning 2015-12-15 20:41:58 -08:00
Yangqing Jia
20ccca5b67 RTTI set to true by default for the main model. 2015-12-15 11:01:09 -08:00
Yangqing Jia
f714ad0a70 number of blocks now makes more sense. 2015-12-15 10:46:50 -08:00
Yangqing Jia
3b0cc79465 context gpu: better error catching 2015-12-14 13:59:28 -08:00
Yangqing Jia
73f3daf736 minor bugfix for workspace 2015-12-13 08:37:36 -08:00
Yangqing Jia
bfae070de1 minor bugfix for net 2015-12-13 08:37:01 -08:00
Yangqing Jia
359f7685f8 halfway into timing test. 2015-12-11 11:01:40 -08:00
Yangqing Jia
03c777db72 boolean for has_gpu_support 2015-12-10 15:06:57 -08:00
Yangqing Jia
7bdc8a6c19 Pycaffe2: removed the clunky GPU support hack.
Now, when pycaffe2 is built and CUDA is present, it is always built
with GPU support.
2015-12-10 15:06:57 -08:00
Yangqing Jia
becf9e85c1 remove no longer needed build_env_android.py. 2015-12-10 15:06:57 -08:00
Yangqing Jia
82696ebc5d Merge pull request #9 from Yangqing/master
a script to test zeromq db throughput.
2015-12-09 15:36:39 -08:00
Yangqing Jia
ae1ebd0f19 a script to test zeromq db throughput. 2015-12-09 15:15:06 -08:00
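The referenced script itself is not shown here; a hypothetical C++ analogue using the libzmq C API (the endpoint, message count, and payload size are arbitrary) illustrates the shape of such a throughput probe:

```cpp
#include <zmq.h>
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical probe: push fixed-size messages through an in-process
// PUSH/PULL pair and report throughput in MB/s.
int main() {
  void* ctx = zmq_ctx_new();
  void* pull = zmq_socket(ctx, ZMQ_PULL);
  void* push = zmq_socket(ctx, ZMQ_PUSH);
  zmq_bind(pull, "inproc://bench");     // inproc: bind before connect
  zmq_connect(push, "inproc://bench");

  const int kMessages = 10000;
  const size_t kSize = 64 * 1024;       // 64 KB payload per message
  std::vector<char> buf(kSize, 'x');

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kMessages; ++i) {
    zmq_send(push, buf.data(), kSize, 0);
    zmq_recv(pull, buf.data(), kSize, 0);
  }
  double secs = std::chrono::duration<double>(
      std::chrono::steady_clock::now() - start).count();
  std::printf("%.1f MB/s\n", kMessages * kSize / secs / 1e6);

  zmq_close(push);
  zmq_close(pull);
  zmq_ctx_destroy(ctx);
  return 0;
}
```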
Yangqing Jia
77541ffe14 flags relaxation, or tightening? 2015-12-07 20:48:57 -08:00
Yangqing Jia
ceb4cde74a average pooling format change to fit the cudnn interface 2015-12-06 15:56:29 -08:00
Yangqing Jia
6bfb30047e deprecate legacy pooling 2015-12-06 11:28:00 -08:00
Yangqing Jia
20dbbbbb28 android: use full proto by default 2015-12-06 11:26:30 -08:00
Yangqing Jia
9022e4f499 pull protobuf to master 2015-12-05 18:34:48 -08:00
Yangqing Jia
05465783c6 optionally use protobuf lite 2015-12-05 16:15:00 -08:00
Yangqing Jia
3d7cb201a3 misc changes to reduce binary size. 2015-12-04 21:31:23 -08:00
Yangqing Jia
4eb486bd34 misc update to reduce binary size. Removed zmq.hpp 2015-12-03 21:28:55 -08:00
Yangqing Jia
ff04fe8b1b merge 2015-12-02 21:41:56 -08:00
Yangqing Jia
1a4ea7c8fc misc updates 2015-12-02 21:01:55 -08:00
Yangqing Jia
b64429bbc6 Merge branch 'dev' of https://github.com/Yangqing/caffe2 into dev
Conflicts:
	caffe2/operators/spatial_batch_norm_op_cudnn.cc
2015-12-02 20:57:36 -08:00
Yangqing Jia
25647f8c47 more test for tf benchmark purposes. 2015-12-02 16:55:51 -08:00
Yangqing Jia
01b45fd052 backward support to cudnn R2 for TensorFlow benchmark references 2015-12-02 15:12:04 -08:00
Yangqing Jia
acc16645d3 temp hack. Will rewrite the build script later. 2015-12-02 10:06:15 -08:00
Yangqing Jia
3a4d4285f2 Added more benchmarks. 2015-12-02 10:04:00 -08:00
Yangqing Jia
7d87fe788f alexnet benchmark code using cudnn: this should give a reference speed that TensorFlow should achieve after tuning. With R4 we currently get 29.5 ms fwd / 93.4 ms bwd. 2015-12-01 17:17:22 -08:00
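Per-pass numbers like these are conventionally measured by bracketing the GPU work with CUDA events; a hypothetical helper (run_pass is a stand-in for the benchmarked net, not a Caffe2 API):

```cpp
#include <cuda_runtime.h>
#include <functional>

// Hypothetical timing helper: record events around the enqueued GPU work
// and let the device report the elapsed milliseconds.
float TimeMs(const std::function<void()>& run_pass) {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  run_pass();  // enqueue the forward (or backward) pass on the same stream
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);  // block until the GPU reaches the stop event
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}
```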
Yangqing Jia
1499b87e56 cudnn spatial bn: optional compilation instead of throwing error 2015-12-01 14:20:28 -08:00
Yangqing Jia
5ba54180f5 various updates 2015-11-28 13:12:43 -08:00