pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

jjsjann123 99e0a87bbb [nvFuser] Latency improvements for pointwise + reduction fusion (#45218)	2020-09-24 23:17:20 -07:00

Summary: This update contains many changes. Some highlights:
- Added Doxygen config file
- Split the fusion IR (higher-level, TE-like IR) from the kernel IR (lower-level, CUDA-like IR)
- Improved latency with dynamic shape handling for the fusion logic
- Prevented recompilation for pointwise + reduction fusions when not needed
- Improved inner dimension reduction performance
- Added input -> kernel + kernel launch parameters cache, added eviction policy
- Added reduction fusions with multiple outputs (still single reduction stage)
- Fixed code generation bugs for symbolic tiled GEMM example
- Added thread predicates to prevent shared memory from being loaded multiple times
- Improved sync threads placement with shared memory and removed a read-before-write race
- Fixed FP16 reduction fusions where output would come back as FP32

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45218

Reviewed By: ezyang

Differential Revision: D23905183

Pulled By: soumith

fbshipit-source-id: 12f5ad4cbe03e9a25043bccb89e372f8579e2a79
images	[nvFuser] Latency improvements for pointwise + reduction fusion (#45218)	2020-09-24 23:17:20 -07:00
.gitignore	[nvFuser] Latency improvements for pointwise + reduction fusion (#45218)	2020-09-24 23:17:20 -07:00
documentation.h	[nvFuser] Latency improvements for pointwise + reduction fusion (#45218)	2020-09-24 23:17:20 -07:00
fuser.doxygen	[nvFuser] Latency improvements for pointwise + reduction fusion (#45218)	2020-09-24 23:17:20 -07:00
main_page.md	[nvFuser] Latency improvements for pointwise + reduction fusion (#45218)	2020-09-24 23:17:20 -07:00