Author | Commit | Message | Date
Yangqing Jia | 809d54ee50 | convnet benchmark minor change | 2016-01-05 09:55:22 -08:00
Yangqing Jia | 8c1bbaa2ab | some fill ops that are not tested. | 2016-01-05 09:55:22 -08:00
Yangqing Jia | 6cb2072422 | cudnn conv op backward compatibility back to v2 | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 778a1f6956 | speed benchmark | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 05eda208a5 | Last commit for the day. With all the previous changes this should give an exact reference speed that TensorFlow with CuDNN3 should achieve in the end. | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 896e8e5274 | pooling backward cudnn, and constant for kOne and kZero. | 2016-01-05 09:55:21 -08:00
Yangqing Jia | f8585bbf62 | cudnn pool op. | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 664bdf83d7 | Pooling refactor so we can do a proper cudnn benchmark. | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 288f350899 | math_gpu.cu bugfix | 2016-01-05 09:55:21 -08:00
Yangqing Jia | ebd6c9fab8 | muji bugfix with ngpu=4 | 2016-01-05 09:55:21 -08:00
Yangqing Jia | 55cced894d | Some untested half float stuff for benchmarking. | 2016-01-05 09:49:55 -08:00
Yangqing Jia | 8d4683434b | convnet benchmark: make it consistent with TF's model. | 2015-12-17 11:25:51 -08:00
Yangqing Jia | b7c3b48469 | copy matrix can be done with cudamemcpy. | 2015-12-17 10:22:02 -08:00
Yangqing Jia | b10ee24fc3 | conv op: backward exhaustive mode too. This does not seem to help much, suggesting that cudaGetConvolution*Algo is already doing a very good job. Verified with googlenet. | 2015-12-17 10:21:16 -08:00
Yangqing Jia | d79cfb4ae7 | exhaustive search for cudnn | 2015-12-15 22:21:11 -08:00
Yangqing Jia | 61c114971b | fast path for copymatrix | 2015-12-15 22:21:11 -08:00
Yangqing Jia | 05e3207e26 | fast path for copymatrix | 2015-12-15 21:25:53 -08:00
Yangqing Jia | cc9323793e | add relu cudnn code | 2015-12-15 20:43:34 -08:00
Yangqing Jia | 4f2530d8ce | expose benchmark code to python | 2015-12-15 20:42:54 -08:00
Yangqing Jia | 6b27cabf17 | net benchmark code | 2015-12-15 20:42:22 -08:00
Yangqing Jia | cf8ffe215f | minor tuning | 2015-12-15 20:41:58 -08:00
Yangqing Jia | 20ccca5b67 | RTTI to true in default for the main model. | 2015-12-15 11:01:09 -08:00
Yangqing Jia | f714ad0a70 | number of blocks now makes more sense. | 2015-12-15 10:46:50 -08:00
Yangqing Jia | 3b0cc79465 | context gpu: better error catching | 2015-12-14 13:59:28 -08:00
Yangqing Jia | 73f3daf736 | minor bugfix for workspace | 2015-12-13 08:37:36 -08:00
Yangqing Jia | bfae070de1 | minor bugfix for net | 2015-12-13 08:37:01 -08:00
Yangqing Jia | 359f7685f8 | halfway into timing test. | 2015-12-11 11:01:40 -08:00
Yangqing Jia | 03c777db72 | boolean for has_gpu_support | 2015-12-10 15:06:57 -08:00
Yangqing Jia | 7bdc8a6c19 | Pycaffe2: removed the clunky gpu support hack. Now, when one builds pycaffe2, if cuda is present, we will always build pycaffe2 with gpu support. | 2015-12-10 15:06:57 -08:00
Yangqing Jia | becf9e85c1 | remove no longer needed build_env_android.py. | 2015-12-10 15:06:57 -08:00
Yangqing Jia | 82696ebc5d | Merge pull request #9 from Yangqing/master: a script to test zeromq db throughput. | 2015-12-09 15:36:39 -08:00
Yangqing Jia | ae1ebd0f19 | a script to test zeromq db throughput. | 2015-12-09 15:15:06 -08:00
Yangqing Jia | 77541ffe14 | flags relaxation, or tightening? | 2015-12-07 20:48:57 -08:00
Yangqing Jia | ceb4cde74a | average pooling format change to fit the cudnn interface | 2015-12-06 15:56:29 -08:00
Yangqing Jia | 6bfb30047e | deprecate legacy pooling | 2015-12-06 11:28:00 -08:00
Yangqing Jia | 20dbbbbb28 | android: use full proto in default | 2015-12-06 11:26:30 -08:00
Yangqing Jia | 9022e4f499 | pull protobuf to master | 2015-12-05 18:34:48 -08:00
Yangqing Jia | 05465783c6 | optionally use protobuf lite | 2015-12-05 16:15:00 -08:00
Yangqing Jia | 3d7cb201a3 | misc changes to reduce binary size. | 2015-12-04 21:31:23 -08:00
Yangqing Jia | 4eb486bd34 | misc update to reduce binary size. Removed zmq.hpp | 2015-12-03 21:28:55 -08:00
Yangqing Jia | ff04fe8b1b | merge | 2015-12-02 21:41:56 -08:00
Yangqing Jia | 1a4ea7c8fc | misc updates | 2015-12-02 21:01:55 -08:00
Yangqing Jia | b64429bbc6 | Merge branch 'dev' of https://github.com/Yangqing/caffe2 into dev (Conflicts: caffe2/operators/spatial_batch_norm_op_cudnn.cc) | 2015-12-02 20:57:36 -08:00
Yangqing Jia | 25647f8c47 | more test for tf benchmark purposes. | 2015-12-02 16:55:51 -08:00
Yangqing Jia | 01b45fd052 | backward support to cudnn R2 for TensorFlow benchmark references | 2015-12-02 15:12:04 -08:00
Yangqing Jia | acc16645d3 | temp hack. Will rewrite the build script later. | 2015-12-02 10:06:15 -08:00
Yangqing Jia | 3a4d4285f2 | Added more benchmarks. | 2015-12-02 10:04:00 -08:00
Yangqing Jia | 7d87fe788f | alexnet benchmark code using cudnn: this should give a reference speed that TensorFlow should achieve after tuning. With R4 currently we have 29.5ms fwd / 93.4ms bwd. | 2015-12-01 17:17:22 -08:00
Yangqing Jia | 1499b87e56 | cudnn spatial bn: optional compilation instead of throwing error | 2015-12-01 14:20:28 -08:00
Yangqing Jia | 5ba54180f5 | various updates | 2015-11-28 13:12:43 -08:00