Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135
Separate affine_channel from math and optimize it
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13727606
fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
Summary:
This PR contains changes for:
1. Adding the HIP top_k operator to Caffe2
2. Adding HIP-equivalent definitions of GPUDefs and GPUScanUtils
3. Removing the top_k operator test from the ROCm test ignore list
4. Fixing bugs in related code in THC/THCAsmUtils.cuh
Differential Revision: D12986451
Pulled By: bddppq
fbshipit-source-id: 6d5241fb674eaeb7cde42166426ac88043b83504
Summary:
Adds support for the CUDA 9 toolkit.
Includes new fp16 data type fixes and changes to warp-synchronous programming. Also updates the CUB third-party repo for CUDA 9 support.
Closes https://github.com/caffe2/caffe2/pull/853
Differential Revision: D5548507
Pulled By: Yangqing
fbshipit-source-id: c7fd2edb623f2aa8c67b9a1000efc8f71e6832ab
Summary:
This is a real implementation (not GPUFallbackOp) of the TopKOp for GPU.
There are two algorithm implementations:
- For k <= 512, it maps to a warp-wide min-heap implementation, which requires only a single scan of the input data.
- For k > 512, it maps to a multi-pass radix selection algorithm that I originally wrote in cutorch. I took the recent cutorch code and removed cutorch-specific parts where it made sense.
Also added several utility files that one or the other implementation uses, some from the Faiss library and some from the cutorch library.
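For reference, a minimal CPU-side sketch of the two strategies in Python. This is not the CUDA code in this diff: the function names, the 64-bit key transform, and the 4-bit radix width are assumptions made for illustration, and the heap here is a plain sequential heap rather than a warp-wide one. Only the k <= 512 cutoff comes from the description above.

```python
import heapq
import struct

RADIX_BITS = 4                      # assumed digit width, for illustration only
RADIX_MASK = (1 << RADIX_BITS) - 1


def topk_heap(values, k):
    """Single scan: keep the k largest values seen so far in a min-heap."""
    heap = []  # entries are (value, index); heap[0] is the smallest kept value
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            heapq.heapreplace(heap, (v, i))
    return sorted(heap, key=lambda p: (-p[0], p[1]))


def _ordered_key(v):
    """Map a float to a 64-bit unsigned key with the same ordering (NaN not handled)."""
    bits = struct.unpack("<Q", struct.pack("<d", v))[0]
    if bits >> 63:                      # negative: flip every bit
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)             # non-negative: set the top bit


def _kth_largest_key(keys, k):
    """Multi-pass radix select: fix one digit of the k-th largest key per pass."""
    prefix, mask = 0, 0
    for shift in range(64 - RADIX_BITS, -1, -RADIX_BITS):
        # Histogram the current digit over keys that match the prefix so far.
        counts = [0] * (1 << RADIX_BITS)
        for key in keys:
            if key & mask == prefix:
                counts[(key >> shift) & RADIX_MASK] += 1
        # Walk buckets from the largest digit down to the one holding rank k.
        for digit in range(RADIX_MASK, -1, -1):
            if counts[digit] >= k:
                prefix |= digit << shift
                mask |= RADIX_MASK << shift
                break
            k -= counts[digit]
    return prefix


def topk_radix(values, k):
    """Find the k-th largest key, then gather everything above it plus ties."""
    keys = [_ordered_key(v) for v in values]
    pivot = _kth_largest_key(keys, k)
    above = [(v, i) for i, v in enumerate(values) if keys[i] > pivot]
    ties = [(v, i) for i, v in enumerate(values) if keys[i] == pivot]
    picked = above + ties[: k - len(above)]
    return sorted(picked, key=lambda p: (-p[0], p[1]))


def topk(values, k, heap_limit=512):
    """Dispatch on k, mirroring the k <= 512 / k > 512 split described above."""
    k = min(k, len(values))
    if k <= 0:
        return []
    return topk_heap(values, k) if k <= heap_limit else topk_radix(values, k)
```

Both paths return (value, index) pairs sorted by descending value. When several inputs tie at the k-th value, the two paths may pick different indices among the tied elements, so callers should not rely on which tied index is returned.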
Reviewed By: jamesr66a
Differential Revision: D5248206
fbshipit-source-id: ae5fa3451473264293516c2838f1f40688781cf3