pytorch/caffe2/perfkernels
haozhe.zhu 7cd6e6acad add bf16 in fp32 out fast path for embedingbag in caffe2 perfkernel (#89198)
Add a BF16-in, FP32-out kernel to the Caffe2 embedding perfkernels, and update the Python code-gen files to generate it.
The unit test will be covered by the next PR in this stack (#89199), which tests the kernel through nn.EmbeddingBag with the BF16 data type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89198
Approved by: https://github.com/jgong5, https://github.com/kit1980
2022-11-30 13:06:13 +00:00
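
As a rough illustration of what "BF16 in, FP32 out" means for an embedding-bag lookup, here is a minimal pure-Python sketch. It is not the generated AVX2 kernel and not the code in hp_emblookup_codegen.py; the function names (`bf16_bits_to_fp32`, `fp32_to_bf16_bits`, `embedding_bag_sum`), the sum-pooling mode, and the offsets-based bag layout are assumptions chosen for the example. The one fact it relies on is that a BF16 value is the upper 16 bits of an IEEE-754 FP32 value, so widening is a 16-bit left shift.

```python
import struct


def bf16_bits_to_fp32(bits: int) -> float:
    """Widen a BF16 bit pattern (held in an int) to a float via FP32."""
    # BF16 keeps the sign, exponent, and top 7 mantissa bits of FP32,
    # so placing the 16 bits in the upper half of a 32-bit word is exact.
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def fp32_to_bf16_bits(value: float) -> int:
    """Truncate an FP32 value to BF16 bits (no rounding; test data only)."""
    return struct.unpack("<I", struct.pack("<f", value))[0] >> 16


def embedding_bag_sum(table, indices, offsets):
    """Sum-pool BF16 table rows into float output rows (hypothetical helper).

    table:   list of rows, each a list of BF16 bit patterns
    indices: flat list of row ids for all bags
    offsets: start position of each bag within `indices`
    """
    dim = len(table[0])
    bounds = list(offsets) + [len(indices)]
    out = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Python floats are double precision; a real kernel would
        # accumulate in FP32 registers instead.
        acc = [0.0] * dim
        for idx in indices[start:end]:
            row = table[idx]
            for j in range(dim):
                acc[j] += bf16_bits_to_fp32(row[j])
        out.append(acc)
    return out


if __name__ == "__main__":
    rows = [[1.0, 2.0], [0.5, -1.0], [3.0, 4.0]]
    table = [[fp32_to_bf16_bits(v) for v in row] for row in rows]
    # Two bags: rows {0, 1} and row {2}.
    print(embedding_bag_sum(table, indices=[0, 1, 2], offsets=[0, 2]))
```

The sketch keeps the table in 16-bit form and widens each element only when it is accumulated, which is the general idea behind storing embeddings in BF16 while producing FP32 output.
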
__init__.py
adagrad_avx2.cc
adagrad.cc
adagrad.h
batch_box_cox_avx2.cc [tourch] BatchBoxCox - fix numerical issue in vectorized code (#88875) 2022-11-11 21:58:23 +00:00
batch_box_cox.cc [torch] Unify batch_box_cox implementations into perfkernels folder (#86569) 2022-10-23 19:29:25 +00:00
batch_box_cox.h [torch] Unify batch_box_cox implementations into perfkernels folder (#86569) 2022-10-23 19:29:25 +00:00
CMakeLists.txt Remove caffe2 mobile (#84338) 2022-09-08 01:49:55 +00:00
common_avx.cc
common_avx2.cc
common_avx512.cc
common.h [torch] Unify batch_box_cox implementations into perfkernels folder (#86569) 2022-10-23 19:29:25 +00:00
cvtsh_ss_bugfix.h
embedding_lookup_avx2.cc
embedding_lookup_fused_8bit_rowwise_avx2.cc
embedding_lookup_fused_8bit_rowwise_idx_avx2.cc
embedding_lookup_idx_avx2.cc add bf16 in fp32 out fast path for embedingbag in caffe2 perfkernel (#89198) 2022-11-30 13:06:13 +00:00
embedding_lookup_idx.cc add bf16 in fp32 out fast path for embedingbag in caffe2 perfkernel (#89198) 2022-11-30 13:06:13 +00:00
embedding_lookup_idx.h
embedding_lookup.cc
embedding_lookup.h
fused_8bit_rowwise_embedding_lookup_idx.cc
fused_8bit_rowwise_embedding_lookup_idx.h
fused_8bit_rowwise_embedding_lookup.cc
fused_8bit_rowwise_embedding_lookup.h
fused_nbit_rowwise_conversion.cc
fused_nbit_rowwise_conversion.h
hp_emblookup_codegen.py add bf16 in fp32 out fast path for embedingbag in caffe2 perfkernel (#89198) 2022-11-30 13:06:13 +00:00
lstm_unit_cpu_avx2.cc
lstm_unit_cpu_common.cc
lstm_unit_cpu_common.h
lstm_unit_cpu-impl.h [caffe2][tourch] Optimize BatchBoxCox (#87585) 2022-11-10 06:11:05 +00:00
lstm_unit_cpu.h
math_cpu_avx2.cc
math_cpu_base.cc
math.h
typed_axpy_avx.cc
typed_axpy_avx2.cc
typed_axpy.cc
typed_axpy.h
vectorizer.h [caffe2][tourch] Optimize BatchBoxCox (#87585) 2022-11-10 06:11:05 +00:00