pytorch/caffe2/python
Aapo Kyrola 8421bf7c60 Faster softmaxWithLoss rowMaxKernel
Summary:
We did not parallelize over D, which can be very large, especially in RNN models. Parallelizing over D speeds the kernel up significantly: in my quick test with lstm_benchmark and nvprof, the total time of RowMaxKernel dropped from 1.2s to 0.28s.

+ added softmaxwithloss to the lstm_benchmark
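
To illustrate what parallelizing a row-max over D can look like, here is a minimal pure-Python simulation. It is a sketch under assumptions, not the actual RowMaxKernel: the function name `row_max_strided` and the `num_threads` parameter are hypothetical, and the "threads" are ordinary loops. Each simulated thread scans a strided slice of the row, and the per-thread partial maxima are then combined with a tree reduction, which is a common pattern for this kind of GPU reduction.

```python
def row_max_strided(rows, num_threads=128):
    """Simulate a per-row max where num_threads workers share one row.

    Hypothetical illustration only; not the Caffe2 kernel.
    num_threads must be a power of two so the tree reduction halves cleanly.
    """
    assert num_threads > 0 and num_threads & (num_threads - 1) == 0
    result = []
    for row in rows:
        # Step 1: thread t handles columns t, t + num_threads, t + 2*num_threads, ...
        partial = [float("-inf")] * num_threads
        for t in range(num_threads):
            for j in range(t, len(row), num_threads):
                if row[j] > partial[t]:
                    partial[t] = row[j]
        # Step 2: tree reduction over the partial maxima
        # (on a GPU this would typically run in shared memory).
        stride = num_threads // 2
        while stride > 0:
            for t in range(stride):
                if partial[t + stride] > partial[t]:
                    partial[t] = partial[t + stride]
            stride //= 2
        result.append(partial[0])
    return result
```

With one worker per row, the scan takes D sequential steps; in this scheme each worker scans roughly D / num_threads columns, followed by log2(num_threads) reduction steps, which is why a large D benefits most.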

Reviewed By: jamesr66a

Differential Revision: D4800629

fbshipit-source-id: 3400ea1064b1eb2793bc403df2c1b68801d545e5
2017-03-30 15:49:46 -07:00
..
docs doxygen python block added 2017-03-29 06:46:16 -07:00
examples doxygen python block added 2017-03-29 06:46:16 -07:00
layers support multilabel in generic preprocessor 2017-03-29 15:20:54 -07:00
mint doxygen python block added 2017-03-29 06:46:16 -07:00
models doxygen python block added 2017-03-29 06:46:16 -07:00
operator_test Faster softmaxWithLoss rowMaxKernel 2017-03-30 15:49:46 -07:00
_import_c_extension.py doxygen python block added 2017-03-29 06:46:16 -07:00
attention.py doxygen python block added 2017-03-29 06:46:16 -07:00
caffe_translator_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
caffe_translator.py doxygen python block added 2017-03-29 06:46:16 -07:00
checkpoint_test.py Fix flaky test 2017-03-29 16:48:20 -07:00
checkpoint.py Fix flaky test 2017-03-29 16:48:20 -07:00
CMakeLists.txt CMake completions work 2017-01-11 16:59:22 -08:00
cnn.py doxygen python block added 2017-03-29 06:46:16 -07:00
context_test.py Make ContextManager thread-safe 2017-02-13 19:45:35 -08:00
context.py doxygen python block added 2017-03-29 06:46:16 -07:00
control_test.py fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
control.py doxygen python block added 2017-03-29 06:46:16 -07:00
convnet_benchmarks_test.py chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
convnet_benchmarks.py doxygen python block added 2017-03-29 06:46:16 -07:00
core_gradients_test.py add inference for gradient ops + a couple of missing shape inference functions + fix to scalars 2017-02-28 23:33:32 -08:00
core_test.py NextScopedBlob with well-defined behavior and respect namescope 2017-02-16 17:16:36 -08:00
core.py doxygen python block added 2017-03-29 06:46:16 -07:00
crf.py doxygen python block added 2017-03-29 06:46:16 -07:00
data_parallel_model_test.py fixes to make data parallel model work for RecurrentNet + test case 2017-03-14 15:48:07 -07:00
data_parallel_model.py doxygen python block added 2017-03-29 06:46:16 -07:00
data_workers_test.py close blobs queues when stopping + test 2017-02-27 10:07:57 -08:00
data_workers.py doxygen python block added 2017-03-29 06:46:16 -07:00
dataio_test.py Stop multi_reader if we run out of data before max_examples 2017-03-10 18:03:57 -08:00
dataio.py doxygen python block added 2017-03-29 06:46:16 -07:00
dataset.py doxygen python block added 2017-03-29 06:46:16 -07:00
db_test.py Fix db_test under tsan 2016-11-29 15:18:37 -08:00
device_checker.py doxygen python block added 2017-03-29 06:46:16 -07:00
dyndep.py doxygen python block added 2017-03-29 06:46:16 -07:00
experiment_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
extension_loader.py Make extension loader properly handle visibility. 2017-03-30 14:38:38 -07:00
gradient_check_test.py gradient checker for nets 2017-03-28 13:03:14 -07:00
gradient_checker.py doxygen python block added 2017-03-29 06:46:16 -07:00
hsm_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
hypothesis_test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
hypothesis_test.py Fixes for ops without a CUDA backend 2017-03-29 14:36:09 -07:00
layer_model_helper.py doxygen python block added 2017-03-29 06:46:16 -07:00
layer_model_instantiator.py doxygen python block added 2017-03-29 06:46:16 -07:00
layer_test_util.py uniform_sampling layer 2017-03-29 14:36:12 -07:00
layers_test.py uniform_sampling layer 2017-03-29 14:36:12 -07:00
load_save_test.py Improve error message from LogFileDB on missing file 2017-03-10 23:31:28 -08:00
lstm_benchmark.py Faster softmaxWithLoss rowMaxKernel 2017-03-30 15:49:46 -07:00
memonger_test.py Gradient Input memory sharing using memonger blob sharing 2017-01-09 19:44:23 -08:00
memonger.py doxygen python block added 2017-03-29 06:46:16 -07:00
mkl_test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
model_device_test.py Comment out NHWC Alexnet test for now 2017-01-23 13:59:29 -08:00
model_helper.py check for ExtractPredictorNet for is_test arguments 2017-03-29 12:48:54 -07:00
mpi_python.cc Move mpi_python.cc to the python folder to be more consistent about source file locations. 2017-01-09 10:59:39 -08:00
muji_test.py chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
muji.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_builder_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
net_builder.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_drawer.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_printer_test.py Debug/Analysis tools for Jobs/ExecutionSteps 2017-02-06 17:31:20 -08:00
net_printer.py doxygen python block added 2017-03-29 06:46:16 -07:00
optimizer_test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
optimizer_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
optimizer.py doxygen python block added 2017-03-29 06:46:16 -07:00
pipeline.py doxygen python block added 2017-03-29 06:46:16 -07:00
pybind_state_gpu.cc Cudnn v6 2017-02-28 17:46:33 -08:00
pybind_state_mkl.cc Expose MKLMemory to the Python Feed and Fetch interface, and misc changes 2016-11-29 15:18:36 -08:00
pybind_state.cc Protobuf is binary string. Use bytes instead. 2017-03-28 19:03:23 -07:00
pybind_state.h bugfix for Windows, esp. VS 2017 2017-03-21 05:17:59 -07:00
python_op_test.py Allow PythonOp to access the workspace 2016-12-05 11:53:26 -08:00
queue_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
record_queue.py doxygen python block added 2017-03-29 06:46:16 -07:00
recurrent.py doxygen python block added 2017-03-29 06:46:16 -07:00
schema_test.py Struct nested field name lookup supports List 2017-03-24 18:17:19 -07:00
schema.py doxygen python block added 2017-03-29 06:46:16 -07:00
scope_test.py fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
scope.py doxygen python block added 2017-03-29 06:46:16 -07:00
session_test.py NextScopedBlob with well-defined behavior and respect namescope 2017-02-16 17:16:36 -08:00
session.py doxygen python block added 2017-03-29 06:46:16 -07:00
sparse_to_dense_mask_test.py Fix few more operators to handle empty batches correctly. 2016-11-29 15:18:37 -08:00
task.py doxygen python block added 2017-03-29 06:46:16 -07:00
test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
text_file_reader.py doxygen python block added 2017-03-29 06:46:16 -07:00
timeout_guard.py doxygen python block added 2017-03-29 06:46:16 -07:00
toy_regression_test.py sync 2016-08-10 11:02:15 -07:00
tt_core_test.py sync 2016-08-10 11:02:15 -07:00
tt_core.py doxygen python block added 2017-03-29 06:46:16 -07:00
utils.py doxygen python block added 2017-03-29 06:46:16 -07:00
visualize.py doxygen python block added 2017-03-29 06:46:16 -07:00
workspace_test.py Added predictor bindings to python interface 2017-03-15 11:17:54 -07:00
workspace.py doxygen python block added 2017-03-29 06:46:16 -07:00