Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964

Things added in this PR that require review:

1. cuLaunchCooperativeKernel driver API added
   aten/src/ATen/cuda/detail/LazyNVRTC.cpp
   aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h

nvfuser code update:

1. perf tuning on codegen scheduler that improves performance.
2. permutation support has been extended beyond contiguous/channels-last. (The improvements could be observed on PW benchmark)

Things reverted from local changes:

1. aten::gelu with approximation
2. local changes that are upstreamed in PR https://github.com/pytorch/pytorch/issues/68804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428

Reviewed By: ngimel

Differential Revision: D33073817

Pulled By: wconstab

fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb

| Name |
|---|
| bf16_support.cu |
| block_reduction.cu |
| block_sync_atomic.cu |
| block_sync_default.cu |
| broadcast.cu |
| fp16_support.cu |
| grid_broadcast.cu |
| grid_reduction.cu |
| grid_sync.cu |
| helpers.cu |
| index_utils.cu |
| random_numbers.cu |
| tensor.cu |
| warp.cu |
| welford.cu |