Commit Graph

6 Commits

Author SHA1 Message Date
Xiang Gao
b3fac8af6b Initial support for building on Ampere GPU, CUDA 11, cuDNN 8 (#39277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277

This PR contains initial changes that make PyTorch build with Ampere GPUs, CUDA 11, and cuDNN 8.
TF32-related features are not included in this PR.

Test Plan: Imported from OSS

Differential Revision: D21832814

Pulled By: malfet

fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06
2020-06-02 10:03:42 -07:00
Igor Sugak
108fc78395 [caffe2] fix invalid % escape in inline assembly strings (#33554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33554

NVCC and GCC accept the existing syntax, but Clang does not; it requires a proper escape. Here `%laneid` is one of the many special registers that CUDA's inline PTX assembly provides [1]. Using the extra `%` does not change the semantics: PTX still sees `%laneid` once the string is processed by the assembler, as illustrated below.

1. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
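A minimal sketch of the escape in question (a generic example, not the PR's actual code): in GCC/Clang-style extended asm templates, a single `%` introduces an operand such as `%0`, so a literal PTX register like `%laneid` has to be written as `%%laneid`.

```lang=cpp
// Generic illustration: read the PTX %laneid special register from inline asm.
// The doubled %% is required because a single % introduces an operand (%0)
// in the asm template string; PTX itself still receives "%laneid".
__device__ unsigned int getLaneId() {
  unsigned int laneId;
  asm("mov.u32 %0, %%laneid;" : "=r"(laneId));
  return laneId;
}
```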

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```

Reviewed By: bddppq

Differential Revision: D20003621

fbshipit-source-id: 8e550e55a3455925e7bd92c6df3e504b5d38c2dc
2020-02-20 14:31:52 -08:00
Ashish
5ae3b44255 Added HIP top_k operator (#13747)
Summary:
This PR contains changes for:
1. Adding the HIP top_k operator in Caffe2
2. Adding HIP equivalents of GPUDefs and GPUScanUtils
3. Removing the top_k operator test from the ROCm test ignore list
4. Fixing bugs in related code in THC/THCAsmUtils.cuh

Differential Revision: D12986451

Pulled By: bddppq

fbshipit-source-id: 6d5241fb674eaeb7cde42166426ac88043b83504
2018-11-08 20:14:53 -08:00
Syed Tousif Ahmed
ffbac7d0bb Miscellaneous updates for CUDA 10 (#12017)
Summary:
This PR has some updates related to CUDA 10.

- c2195e9864 ensures that the repo successfully builds on CUDA 10. Addresses https://github.com/pytorch/pytorch/issues/11888
- 423d8d3524 follows up on the cuFFT max plan number bug: https://github.com/pytorch/pytorch/issues/11089, which has been fixed in CUDA 10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12017

Differential Revision: D10013405

Pulled By: soumith

fbshipit-source-id: 5bc6d7f71d5133f7821b407b1ac6c51bef0f6fa8
2018-09-24 11:58:32 -07:00
Simon Layton
e97c04118e CUDA 9 support
Summary:
Adds support for the CUDA 9 toolkit.

Includes fixes for the new fp16 data type and changes to warp-synchronous programming. Also updates the third-party CUB repo for CUDA 9 support.
Closes https://github.com/caffe2/caffe2/pull/853
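As a generic illustration of the warp-synchronous programming change (not code from this PR): CUDA 9 deprecates the implicitly synchronous shuffle intrinsics in favor of `*_sync` variants that take an explicit lane-participation mask.

```lang=cpp
// Sketch of the CUDA 9 warp-synchronous style: the deprecated __shfl_down
// becomes __shfl_down_sync with an explicit mask of participating lanes.
__device__ float warpReduceSum(float val) {
  const unsigned int fullMask = 0xffffffffu;  // all 32 lanes participate
  for (int offset = 16; offset > 0; offset >>= 1) {
    val += __shfl_down_sync(fullMask, val, offset);
  }
  return val;  // lane 0 ends up holding the warp-wide sum
}
```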

Differential Revision: D5548507

Pulled By: Yangqing

fbshipit-source-id: c7fd2edb623f2aa8c67b9a1000efc8f71e6832ab
2017-08-06 11:50:17 -07:00
Jeff Johnson
3f860af050 Implement TopKOp for GPU
Summary:
This is a real implementation (not GPUFallbackOp) of the TopKOp for GPU.

There are two algorithm implementations:

- For k <= 512, it maps to a warp-wide min-heap implementation, which requires only a single scan of the input data.
- For k > 512, it maps to a multi-pass radix selection algorithm that I originally wrote in cutorch. I took the recent cutorch code and removed some cutorch-specific pieces where it made sense.

Also adds several utility files that one or the other implementation uses, some from the Faiss library and some from the cutorch library.
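A rough sketch of the k-threshold dispatch described above; `warpHeapTopK`, `radixSelectTopK`, and `runTopK` are hypothetical names used for illustration, not the actual Caffe2 kernels.

```lang=cpp
#include <cuda_runtime.h>

// Hypothetical stand-ins for the two code paths described above; the real
// kernels implement a warp-wide min-heap and a multi-pass radix selection.
template <typename T>
__global__ void warpHeapTopK(const T* in, int n, int k, T* vals, int* idx) {
  // single scan of `in` using a per-warp min-heap of size k
}

template <typename T>
__global__ void radixSelectTopK(const T* in, int n, int k, T* vals, int* idx) {
  // multi-pass radix selection over the bit representation of T
}

// Host-side dispatch mirroring the k <= 512 threshold from the summary.
template <typename T>
void runTopK(const T* in, int n, int k, T* vals, int* idx, cudaStream_t stream) {
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  if (k <= 512) {
    warpHeapTopK<T><<<blocks, threads, 0, stream>>>(in, n, k, vals, idx);
  } else {
    radixSelectTopK<T><<<blocks, threads, 0, stream>>>(in, n, k, vals, idx);
  }
}
```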

Reviewed By: jamesr66a

Differential Revision: D5248206

fbshipit-source-id: ae5fa3451473264293516c2838f1f40688781cf3
2017-06-17 08:47:38 -07:00