Author | Commit | Message | Date
Yangqing Jia | 809d54ee50 | convnet benchmark minor change | 2016-01-05 09:55:22 -08:00
Yangqing Jia | 8c1bbaa2ab | some fill ops that are not tested. | 2016-01-05 09:55:22 -08:00
Yangqing Jia | 6cb2072422 | cudnn conv op backward compatibility back to v2 | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 778a1f6956 | speed benchmark | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 05eda208a5 | Last commit for the day. With all the previous changes this should give an exact reference speed that TensorFlow with CuDNN3 should achieve in the end. | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 896e8e5274 | pooling backward cudnn, and constant for kOne and kZero. | 2016-01-05 09:55:21 -08:00
Yangqing Jia | f8585bbf62 | cudnn pool op. | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 664bdf83d7 | Pooling refactor so we can do a proper cudnn benchmark. | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 288f350899 | math_gpu.cu bugfix | 2016-01-05 09:55:21 -08:00
Yangqing Jia | ebd6c9fab8 | muji bugfix with ngpu=4 | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 55cced894d | Some untested half float stuff for benchmarking. | 2016-01-05 09:49:55 -08:00
Yangqing Jia | 8d4683434b | convnet benchmark: make it consistent with TF's model. | 2015-12-17 11:25:51 -08:00
Yangqing Jia | b7c3b48469 | copy matrix can be done with cudamemcpy. | 2015-12-17 10:22:02 -08:00
Yangqing Jia | b10ee24fc3 | conv op: backward exhaustive mode too. This does not seem to help much, suggesting that cudaGetConvolution*Algo is already doing a very good job. Verified with googlenet. | 2015-12-17 10:21:16 -08:00
Yangqing Jia | d79cfb4ae7 | exhaustive search for cudnn | 2015-12-15 22:21:11 -08:00
Yangqing Jia | 61c114971b | fast path for copymatrix | 2015-12-15 22:21:11 -08:00
Yangqing Jia | 05e3207e26 | fast path for copymatrix | 2015-12-15 21:25:53 -08:00
Yangqing Jia | cc9323793e | add relu cudnn code | 2015-12-15 20:43:34 -08:00
Yangqing Jia | 4f2530d8ce | expose benchmark code to python | 2015-12-15 20:42:54 -08:00
Yangqing Jia | 6b27cabf17 | net benchmark code | 2015-12-15 20:42:22 -08:00
Yangqing Jia | cf8ffe215f | minor tuning | 2015-12-15 20:41:58 -08:00
Yangqing Jia | 20ccca5b67 | RTTI to true in default for the main model. | 2015-12-15 11:01:09 -08:00
Yangqing Jia | f714ad0a70 | number of blocks now makes more sense. | 2015-12-15 10:46:50 -08:00
Yangqing Jia | 3b0cc79465 | context gpu: better error catching | 2015-12-14 13:59:28 -08:00
Yangqing Jia | 73f3daf736 | minor bugfix for workspace | 2015-12-13 08:37:36 -08:00
Yangqing Jia | bfae070de1 | minor bugfix for net | 2015-12-13 08:37:01 -08:00
Yangqing Jia | 359f7685f8 | halfway into timing test. | 2015-12-11 11:01:40 -08:00
Yangqing Jia | 03c777db72 | boolean for has_gpu_support | 2015-12-10 15:06:57 -08:00
Yangqing Jia | 7bdc8a6c19 | Pycaffe2: removed the clunky gpu support hack. Now, when one builds pycaffe2, if cuda is present, we will always build pycaffe2 with gpu support. | 2015-12-10 15:06:57 -08:00
Yangqing Jia | becf9e85c1 | remove no longer needed build_env_android.py. | 2015-12-10 15:06:57 -08:00
Yangqing Jia | 82696ebc5d | Merge pull request #9 from Yangqing/master: a script to test zeromq db throughput. | 2015-12-09 15:36:39 -08:00
Yangqing Jia | ae1ebd0f19 | a script to test zeromq db throughput. | 2015-12-09 15:15:06 -08:00
Yangqing Jia | 77541ffe14 | flags relaxation, or tightening? | 2015-12-07 20:48:57 -08:00
Yangqing Jia | ceb4cde74a | average pooling format change to fit the cudnn interface | 2015-12-06 15:56:29 -08:00
Yangqing Jia | 6bfb30047e | deprecate legacy pooling | 2015-12-06 11:28:00 -08:00
Yangqing Jia | 20dbbbbb28 | android: use full proto in default | 2015-12-06 11:26:30 -08:00
Yangqing Jia | 9022e4f499 | pull protobuf to master | 2015-12-05 18:34:48 -08:00
Yangqing Jia | 05465783c6 | optionally use protobuf lite | 2015-12-05 16:15:00 -08:00
Yangqing Jia | 3d7cb201a3 | misc changes to reduce binary size. | 2015-12-04 21:31:23 -08:00
Yangqing Jia | 4eb486bd34 | misc update to reduce binary size. Removed zmq.hpp | 2015-12-03 21:28:55 -08:00
Yangqing Jia | ff04fe8b1b | merge | 2015-12-02 21:41:56 -08:00
Yangqing Jia | 1a4ea7c8fc | misc updates | 2015-12-02 21:01:55 -08:00
Yangqing Jia | b64429bbc6 | Merge branch 'dev' of https://github.com/Yangqing/caffe2 into dev (Conflicts: caffe2/operators/spatial_batch_norm_op_cudnn.cc) | 2015-12-02 20:57:36 -08:00
Yangqing Jia | 25647f8c47 | more test for tf benchmark purposes. | 2015-12-02 16:55:51 -08:00
Yangqing Jia | 01b45fd052 | backward support to cudnn R2 for TensorFlow benchmark references | 2015-12-02 15:12:04 -08:00
Yangqing Jia | acc16645d3 | temp hack. Will rewrite the build script later. | 2015-12-02 10:06:15 -08:00
Yangqing Jia | 3a4d4285f2 | Added more benchmarks. | 2015-12-02 10:04:00 -08:00
Yangqing Jia | 7d87fe788f | alexnet benchmark code using cudnn: this should give a reference speed that TensorFlow should achieve after tuning. With R4 currently we have 29.5ms fwd / 93.4ms bwd. | 2015-12-01 17:17:22 -08:00
Yangqing Jia | 1499b87e56 | cudnn spatial bn: optional compilation instead of throwing error | 2015-12-01 14:20:28 -08:00
Yangqing Jia | 5ba54180f5 | various updates | 2015-11-28 13:12:43 -08:00