pytorch/caffe2/utils
Tristan Rice 0c9787c758 caffe2: use at::mt19937 instead of std::mt19937 (10x speedup) (#43987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987

This replaces the caffe2 CPU random number generator (std::mt19937) with at::mt19937, the one currently used in PyTorch. The ATen RNG is 10x faster than the std one and appears to be more robust, given bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb).

For large embedding tables (10GB+), UniformFillOp takes upwards of 10 minutes because it is bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that to 10% of the current time.
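
To illustrate why the fill is bound by the RNG, here is a minimal sketch of a single-threaded uniform fill. It is not the caffe2 implementation: uniform_fill is a hypothetical helper, and the engine is a template parameter so std::mt19937 can stand in for at::mt19937, which is assumed here to expose a similar operator() returning a 32-bit value.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not part of caffe2): fills `out` with floats in
// [lo, hi) by drawing one 32-bit value per element from `engine`.
// `Engine` only needs an operator() returning a 32-bit unsigned value.
// std::mt19937 provides one; at::mt19937 is assumed to as well.
template <typename Engine>
void uniform_fill(std::vector<float>& out, float lo, float hi, Engine& engine) {
  // Keep 24 random bits so the value fits exactly in a float mantissa.
  constexpr std::uint32_t kMantissaMask = (1u << 24) - 1;
  // Scaling by 2^-24 maps the masked bits into [0, 1).
  constexpr float kScale = 1.0f / (1u << 24);
  // Every draw advances the shared engine state, so this loop is inherently
  // sequential and its cost is dominated by the speed of engine().
  for (std::size_t i = 0; i < out.size(); ++i) {
    const std::uint32_t bits =
        static_cast<std::uint32_t>(engine()) & kMantissaMask;
    out[i] = lo + (hi - lo) * (static_cast<float>(bits) * kScale);
  }
}
```

Calling uniform_fill(values, -1.0f, 1.0f, engine) with a seeded std::mt19937 fills the vector reproducibly. Because the loop does little besides call the engine, a faster engine with the same interface shortens the fill roughly in proportion.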

Test Plan: Ran all relevant tests and CI. This doesn't introduce new features (and is a core change), so existing tests and CI should be sufficient to catch regressions.

Reviewed By: dzhulgakov

Differential Revision: D23219710

fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
2020-10-16 16:08:35 -07:00
hip Change hip filename extension to .hip (#14036) 2018-11-16 11:55:59 -08:00
math Optimize Scale function (#44913) 2020-09-18 14:31:33 -07:00
threadpool Re-apply PyTorch pthreadpool changes 2020-06-23 19:26:21 -07:00
bench_utils.cc wipe cache with writes (#12279) 2018-10-03 17:12:23 -07:00
bench_utils.h Lightweight at-most-once logging for API usage (#20745) 2019-05-23 23:17:59 -07:00
cast_test.cc Update from facebook (#7855) 2018-05-29 11:38:02 -07:00
cast.h Update from facebook (#7855) 2018-05-29 11:38:02 -07:00
cblas.h Fix more MKL build issues 2017-08-25 14:01:01 -07:00
CMakeLists.txt Fix BUILD_CAFFE2 if FBGEMM and NNPACK are not built (#45610) 2020-10-01 14:58:55 -07:00
conversions.h [caffe2] use Clang identification macro in various places (#33574) 2020-02-20 15:16:11 -08:00
cpu_neon.h [caffe2] Use both __ARM_NEON__ and __ARM_NEON macros (#6697) 2018-04-18 17:45:47 -04:00
cpuid_test.cc Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
cpuid.cc Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
cpuid.h [caffe2] Use cpuinfo in perfkernels to simplify build dependency (#36371) 2020-04-10 13:26:34 -07:00
eigen_utils.h Export PyTorch erf to ONNX Erf and add Caffe2 Erf operator 2019-01-17 09:18:08 -08:00
fatal_signal_asan_no_sig_test.cc Windows shared build (#13550) 2018-11-16 12:16:28 -08:00
filler.h Delete Tensor::swap(), replace with pointer swap (#12730) 2019-01-25 08:25:07 -08:00
fixed_divisor_test.cc Enable ROCm multi-gpu with Gloo 2019-05-07 09:55:47 -07:00
fixed_divisor.h [caffe2] use Clang identification macro in various places (#33574) 2020-02-20 15:16:11 -08:00
GpuBitonicSort.cuh Manually applying cudnn5 pull request. 2018-01-02 15:31:33 -08:00
GpuDefs.cuh CUDA RTX30 series support (#45489) 2020-09-29 18:19:23 -07:00
GpuScanUtils.cuh RIP CUDA <9.2: circleci, aten, and caffe2 (#36846) 2020-05-18 13:41:05 -07:00
map_utils.h Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
math_cpu.cc caffe2: use at::mt19937 instead of std::mt19937 (10x speedup) (#43987) 2020-10-16 16:08:35 -07:00
math_gpu_test.cc Update math::Transpose to support tensor with size > 2G (#17670) 2019-03-20 18:22:21 -07:00
math_gpu.cu RIP CUDA <9.2: circleci, aten, and caffe2 (#36846) 2020-05-18 13:41:05 -07:00
math_test.cc Update math::Transpose to support tensor with size > 2G (#17670) 2019-03-20 18:22:21 -07:00
math-detail.h Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
math.h Move math::Axpy function to elementwise lib (#18316) 2019-03-26 12:19:19 -07:00
murmur_hash3.cc Remove core and util warnings (#8239) 2018-06-07 09:10:33 -07:00
murmur_hash3.h
proto_convert.cc New serialization format (#12384) 2018-10-16 16:36:58 -07:00
proto_convert.h New serialization format (#12384) 2018-10-16 16:36:58 -07:00
proto_utils_test.cc caffe2 - Util to cleanup external inputs and outputs from a NetDef (#18194) 2019-03-22 11:23:03 -07:00
proto_utils.cc [Onnxifi] Don't throw exception when we cannot write out debug files (#45979) 2020-10-08 00:18:24 -07:00
proto_utils.h [Onnxifi] Don't throw exception when we cannot write out debug files (#45979) 2020-10-08 00:18:24 -07:00
proto_wrap.cc New Serialization Proto 2018-09-11 10:55:43 -07:00
proto_wrap.h build changes to make cpu unified build working. (#10504) 2018-08-15 17:22:36 -07:00
signal_handler.cc preprocessor cleanup (#33957) 2020-03-02 13:37:19 -08:00
signal_handler.h Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
simple_queue_test.cc Remove Apache headers from source. 2018-03-27 13:10:18 -07:00
simple_queue.h Fix issues under caffe round 1 2019-01-23 19:04:59 -08:00
smart_tensor_printer_test.cc Kill more weird constructors on Tensor 2018-11-04 16:54:49 -08:00
smart_tensor_printer.cc preprocessor cleanup (#33957) 2020-03-02 13:37:19 -08:00
smart_tensor_printer.h More changes for hidden visibility (#10692) 2018-08-21 13:39:57 -07:00
string_utils.cc BlackBoxPredictor OSS part 5: glow transforms 2019-07-23 16:39:23 -07:00
string_utils.h Fix out-of-boundary access in caffe2::StartsWith (#36672) 2020-04-15 20:40:59 -07:00
zmq_helper.h Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00