5 Commits

Author SHA1 Message Date
Orion Reblitz-Richardson
9ec0a2aef4 fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af 2018-06-27 04:50:56 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Junjie Bai
4fddc04054 Use the same schema of switching to device reduce sum for SumSqrElements
Summary: Based on the benchmark script at `caffe2/experiments/python/device_reduce_sum_bench.py`, device reduce sum is slower for N <= 10000, so SumElements switches to device reduce only for large N. This diff applies the same schema to SumSqrElements.

Reviewed By: jamesr66a

Differential Revision: D5369868

fbshipit-source-id: ae13a611aff9d3464d1c4950ee155c740a2da339
2017-07-05 10:52:17 -07:00
Junjie Bai
f3a59aedff Use cub::DeviceReduce for faster math::Sum CUDA version
Summary: Port SumElements and softmax_ops.cu to use device reduce sum

Reviewed By: akyrola

Differential Revision: D5351881

fbshipit-source-id: ca9604186c261ffcb1480da2a17baab8a4809372
2017-06-30 15:04:06 -07:00
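
Note on the size-based switch described in 4fddc04054: the idea can be sketched as a simple dispatch on N. This is a host-only illustration, not the Caffe2 implementation. The names `kDeviceReduceThreshold`, `ReducePath`, and `ChooseReducePath` are hypothetical, the cutoff of 10000 is taken from the commit message, and both branches run on the CPU as stand-ins for the simple kernel and the cub::DeviceReduce path.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed cutoff, taken from the commit message: device reduce is slower for N <= 10000.
constexpr std::size_t kDeviceReduceThreshold = 10000;

// Hypothetical labels for the two code paths.
enum class ReducePath { kSimpleKernel, kDeviceReduce };

// Pick the reduction path from the element count alone.
ReducePath ChooseReducePath(std::size_t n) {
  return n > kDeviceReduceThreshold ? ReducePath::kDeviceReduce
                                    : ReducePath::kSimpleKernel;
}

// Sum of squares with the same dispatch. Both branches are CPU stand-ins:
// the real code would launch a CUDA kernel or call into cub instead.
float SumSqrElements(const std::vector<float>& x, ReducePath* path_used = nullptr) {
  const ReducePath path = ChooseReducePath(x.size());
  if (path_used != nullptr) {
    *path_used = path;
  }
  float result = 0.0f;
  if (path == ReducePath::kDeviceReduce) {
    // Stand-in for the device-wide reduction used for large N.
    result = std::inner_product(x.begin(), x.end(), x.begin(), 0.0f);
  } else {
    // Stand-in for the simple kernel kept for small N.
    for (float v : x) {
      result += v * v;
    }
  }
  return result;
}
```

With this sketch, an input of 10000 elements stays on the simple path and an input of 10001 elements takes the device reduce path. The right cutoff on real hardware would need to be measured, as the benchmark script referenced in the commit does.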