Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00
Summary:
Code generator for high-performance embedding look-up kernels, supporting Sum, WeightedSum, and Mean reducers. Achieves at least 1.5x speedup on float and over 2x speedup on float16, compared to the existing code.

These are results on Broadwell, using the sparse_lengths_sum_benchmark.par benchmark.

Old
==============

    [root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par --iteration 10000
    Preparing lookup table. 2017-08-08 00:10:23.101848
    Preparation finished. 2017-08-08 00:10:27.955680
    I0808 00:10:27.955732 30700 net.cc:177] Starting benchmark.
    I0808 00:10:27.955759 30700 net.cc:178] Running warmup runs.
    I0808 00:10:27.956367 30700 net.cc:188] Main runs.
    I0808 00:10:31.839035 30700 net.cc:199] Main run finished. Milliseconds per iter: 0.388264. Iters per second: 2575.56
    I0808 00:10:35.704169 30700 net.cc:233] Operator #0 (indices, Python) 0.0583264 ms/iter
    I0808 00:10:35.704210 30700 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.327694 ms/iter
    I0808 00:10:35.704213 30700 net.cc:237] Time per operator type:
    I0808 00:10:35.704217 30700 net.cc:246] 0.327694 SparseLengthsSum
    I0808 00:10:35.704221 30700 net.cc:246] 0.0583264 Python

    [root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par --iteration 10000 --dtype float16
    Preparing lookup table. 2017-08-08 00:10:59.047159
    Preparation finished. 2017-08-08 00:11:05.140565
    I0808 00:11:05.140612 31725 net.cc:177] Starting benchmark.
    I0808 00:11:05.140635 31725 net.cc:178] Running warmup runs.
    I0808 00:11:05.141104 31725 net.cc:188] Main runs.
    I0808 00:11:08.371510 31725 net.cc:199] Main run finished. Milliseconds per iter: 0.323039. Iters per second: 3095.6
    I0808 00:11:11.671450 31725 net.cc:233] Operator #0 (indices, Python) 0.0609876 ms/iter
    I0808 00:11:11.671489 31725 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.26856 ms/iter
    I0808 00:11:11.671494 31725 net.cc:237] Time per operator type:
    I0808 00:11:11.671497 31725 net.cc:246] 0.26856 SparseLengthsSum
    I0808 00:11:11.671500 31725 net.cc:246] 0.0609876 Python

New (Misha's)
==============

    [root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par --iteration 10000
    Preparing lookup table. 2017-08-07 23:44:55.897748
    Preparation finished. 2017-08-07 23:45:00.708896
    I0807 23:45:00.708945 4178361 net.cc:177] Starting benchmark.
    I0807 23:45:00.708971 4178361 net.cc:178] Running warmup runs.
    I0807 23:45:00.709444 4178361 net.cc:188] Main runs.
    I0807 23:45:03.608551 4178361 net.cc:199] Main run finished. Milliseconds per iter: 0.289909. Iters per second: 3449.36
    I0807 23:45:06.536182 4178361 net.cc:233] Operator #0 (indices, Python) 0.0572399 ms/iter
    I0807 23:45:06.536224 4178361 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.23512 ms/iter
    I0807 23:45:06.536228 4178361 net.cc:237] Time per operator type:
    I0807 23:45:06.536232 4178361 net.cc:246] 0.23512 SparseLengthsSum
    I0807 23:45:06.536236 4178361 net.cc:246] 0.0572399 Python

    [root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par --iteration 10000 --dtype float16
    Preparing lookup table. 2017-08-07 23:45:17.191579
    Preparation finished. 2017-08-07 23:45:23.173668
    I0807 23:45:23.173715 4179316 net.cc:177] Starting benchmark.
    I0807 23:45:23.173743 4179316 net.cc:178] Running warmup runs.
    I0807 23:45:23.174090 4179316 net.cc:188] Main runs.
    I0807 23:45:24.939749 4179316 net.cc:199] Main run finished. Milliseconds per iter: 0.176564. Iters per second: 5663.67
    I0807 23:45:26.698885 4179316 net.cc:233] Operator #0 (indices, Python) 0.0557303 ms/iter
    I0807 23:45:26.698923 4179316 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.119794 ms/iter
    I0807 23:45:26.698927 4179316 net.cc:237] Time per operator type:
    I0807 23:45:26.698931 4179316 net.cc:246] 0.119794 SparseLengthsSum
    I0807 23:45:26.698935 4179316 net.cc:246] 0.0557303 Python

Reviewed By: salexspb

Differential Revision: D5582172

fbshipit-source-id: d71f5a55580b734a51b8f30852b75f379acfdaf2

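
As a quick check on the figures in the logs above, the old/new ratios can be computed directly from the reported per-iteration times. The sketch below only restates numbers copied from those logs; the names `TIMES_MS`, `speedup`, and `summarize` are illustrative and not part of the benchmark.

```python
# Per-iteration times in milliseconds, copied from the benchmark logs above.
# "net" is the whole-net "Milliseconds per iter" figure; "SparseLengthsSum"
# is the per-operator figure for Operator #1.
TIMES_MS = {
    "float": {
        "old": {"net": 0.388264, "SparseLengthsSum": 0.327694},
        "new": {"net": 0.289909, "SparseLengthsSum": 0.23512},
    },
    "float16": {
        "old": {"net": 0.323039, "SparseLengthsSum": 0.26856},
        "new": {"net": 0.176564, "SparseLengthsSum": 0.119794},
    },
}


def speedup(old_ms, new_ms):
    """Return how many times faster the new run is than the old one."""
    return old_ms / new_ms


def summarize(times=TIMES_MS):
    """Compute old/new ratios per dtype and per measurement, rounded to 2 places."""
    result = {}
    for dtype, runs in times.items():
        result[dtype] = {
            key: round(speedup(runs["old"][key], runs["new"][key]), 2)
            for key in runs["old"]
        }
    return result


if __name__ == "__main__":
    for dtype, ratios in summarize().items():
        print(dtype, ratios)
```

On these particular runs the SparseLengthsSum operator ratio works out to roughly 1.39x for float and 2.24x for float16. The whole-net ratios (about 1.34x and 1.83x) are lower because they include the Python operator that produces the indices.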
| | | |
|---|---|---|
| .. | ||
| docs | ||
| examples | ||
| helpers | ||
| layers | ||
| mint | ||
| mkl | ||
| modeling | ||
| models | ||
| operator_test | ||
| predictor | ||
| rnn | ||
| _import_c_extension.py | ||
| allcompare_test.py | ||
| attention.py | ||
| binarysize.py | ||
| brew_test.py | ||
| brew.py | ||
| build.py | ||
| caffe_translator_test.py | ||
| caffe_translator.py | ||
| checkpoint_test.py | ||
| checkpoint.py | ||
| CMakeLists.txt | ||
| cnn.py | ||
| context_test.py | ||
| context.py | ||
| control_ops_util.py | ||
| control_test.py | ||
| control.py | ||
| convnet_benchmarks_test.py | ||
| convnet_benchmarks.py | ||
| core_gradients_test.py | ||
| core_test.py | ||
| core.py | ||
| crf.py | ||
| data_parallel_model_test.py | ||
| data_parallel_model.py | ||
| data_workers_test.py | ||
| data_workers.py | ||
| dataio_test.py | ||
| dataio.py | ||
| dataset.py | ||
| db_test.py | ||
| device_checker.py | ||
| dyndep.py | ||
| embedding_generation_benchmark.py | ||
| empty.so | ||
| experiment_util.py | ||
| extension_loader.py | ||
| gradient_check_test.py | ||
| gradient_checker.py | ||
| gru_cell.py | ||
| hsm_util.py | ||
| hypothesis_test_util.py | ||
| hypothesis_test.py | ||
| layer_model_helper.py | ||
| layer_model_instantiator.py | ||
| layer_parameter_sharing_test.py | ||
| layer_test_util.py | ||
| layers_test.py | ||
| load_save_test.py | ||
| lstm_benchmark.py | ||
| memonger_test.py | ||
| memonger.py | ||
| mkl_test_util.py | ||
| model_device_test.py | ||
| model_helper.py | ||
| mpi_python.cc | ||
| muji_test.py | ||
| muji.py | ||
| net_builder_test.py | ||
| net_builder.py | ||
| net_drawer.py | ||
| net_printer_test.py | ||
| net_printer.py | ||
| optimizer_context.py | ||
| optimizer_test_util.py | ||
| optimizer_test.py | ||
| optimizer.py | ||
| parallel_workers_test.py | ||
| parallel_workers.py | ||
| parallelize_gpu_bmuf_distributed_test.py | ||
| pipeline.py | ||
| predictor_constants.py | ||
| pybind_state_gpu.cc | ||
| pybind_state_mkl.cc | ||
| pybind_state.cc | ||
| pybind_state.h | ||
| python_op_test.py | ||
| queue_util.py | ||
| record_queue.py | ||
| recurrent.py | ||
| rnn_cell.py | ||
| schema_test.py | ||
| schema.py | ||
| scope_test.py | ||
| scope.py | ||
| session_test.py | ||
| session.py | ||
| sparse_to_dense_mask_test.py | ||
| task.py | ||
| test_util.py | ||
| text_file_reader.py | ||
| timeout_guard.py | ||
| toy_regression_test.py | ||
| tt_core_test.py | ||
| tt_core.py | ||
| utils.py | ||
| visualize.py | ||
| workspace_test.py | ||
| workspace.py | ||
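
For readers unfamiliar with the reducers named in the commit summary above, here is a minimal pure-Python sketch of what a lengths-segmented embedding look-up computes. It assumes the usual Caffe2 convention that `lengths` partitions `indices` into consecutive segments, one output row per segment. The function `sparse_lengths_reduce` and its `weights` and `mean` parameters are hypothetical names used only for illustration. This is a reference for the semantics, not the generated kernel.

```python
def sparse_lengths_reduce(table, indices, lengths, weights=None, mean=False):
    """Reduce rows of `table` selected by `indices`, segment by segment.

    table:   list of equal-length rows (the embedding table)
    indices: row ids to look up, concatenated over all segments
    lengths: number of indices belonging to each output segment
    weights: optional per-index scale factors (WeightedSum-style)
    mean:    if True, divide each segment's sum by its length (Mean-style)
    """
    if sum(lengths) != len(indices):
        raise ValueError("lengths must sum to len(indices)")
    if weights is not None and len(weights) != len(indices):
        raise ValueError("weights must have one entry per index")

    dim = len(table[0])
    output = []
    pos = 0  # cursor into indices / weights
    for seg_len in lengths:
        acc = [0.0] * dim
        for j in range(pos, pos + seg_len):
            w = 1.0 if weights is None else weights[j]
            row = table[indices[j]]
            for d in range(dim):
                acc[d] += w * row[d]
        if mean and seg_len > 0:
            acc = [v / seg_len for v in acc]
        output.append(acc)
        pos += seg_len
    return output


if __name__ == "__main__":
    table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    indices = [0, 2, 1]
    lengths = [2, 1]
    print(sparse_lengths_reduce(table, indices, lengths))
    print(sparse_lengths_reduce(table, indices, lengths, mean=True))
    print(sparse_lengths_reduce(table, indices, lengths, weights=[0.5, 2.0, 1.0]))
```

The inner loop over `d` is the part an optimized kernel would typically vectorize and unroll for each embedding width. The sketch keeps it scalar so the segment bookkeeping stays easy to follow.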