pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xiang Gao	b3fac8af6b	Initial support for building on Ampere GPU, CUDA 11, cuDNN 8 (#39277) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277 This PR contains initial changes that make PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8. TF32-related features are not included in this PR. Test Plan: Imported from OSS Differential Revision: D21832814 Pulled By: malfet fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06	2020-06-02 10:03:42 -07:00
Igor Sugak	108fc78395	[caffe2] fix invalid % escape in inline assembly strings (#33554) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33554 NVCC/GCC accept the existing syntax, but Clang does not and requires a proper escape. Here `%laneid` is one of the many registers that CUDA's pseudo-asm provides [1]. Using the extra `%` doesn't change the semantics, as PTX expects the `%laneid` value after it's processed by the asm tool. 1. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html Test Plan: `buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow`; `buck build mode/opt //fblearner/flow/projects/dper:workflow` Reviewed By: bddppq Differential Revision: D20003621 fbshipit-source-id: 8e550e55a3455925e7bd92c6df3e504b5d38c2dc	2020-02-20 14:31:52 -08:00
Ashish	5ae3b44255	Added HIP top_k operator (#13747) Summary: This PR contains changes for: 1. Adding a HIP top_k operator in Caffe2 2. Adding HIP equivalent definitions of GPUDefs and GPUScanUtils 3. Removing the top_k operator test from the ROCm test ignore list 4. Fixing bugs in related code in THC/THCAsmUtils.cuh Differential Revision: D12986451 Pulled By: bddppq fbshipit-source-id: 6d5241fb674eaeb7cde42166426ac88043b83504	2018-11-08 20:14:53 -08:00
Syed Tousif Ahmed	ffbac7d0bb	Miscellaneous updates for CUDA 10 (#12017) Summary: This PR has some updates related to CUDA 10. - `c2195e9864` ensures that the repo successfully builds on CUDA 10. Addresses https://github.com/pytorch/pytorch/issues/11888 - `423d8d3524` follows up on the cufft max plan number bug: https://github.com/pytorch/pytorch/issues/11089, which has been fixed in CUDA 10. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12017 Differential Revision: D10013405 Pulled By: soumith fbshipit-source-id: 5bc6d7f71d5133f7821b407b1ac6c51bef0f6fa8	2018-09-24 11:58:32 -07:00
Simon Layton	e97c04118e	CUDA 9 support Summary: Adds support for the CUDA 9 toolkit. Includes new fp16 data type fixes, and changes to warp-synchronous programming. Also updates CUB third-party repo for CUDA 9 support. Closes https://github.com/caffe2/caffe2/pull/853 Differential Revision: D5548507 Pulled By: Yangqing fbshipit-source-id: c7fd2edb623f2aa8c67b9a1000efc8f71e6832ab	2017-08-06 11:50:17 -07:00
Jeff Johnson	3f860af050	Implement TopKOp for GPU Summary: This is a real implementation (not GPUFallbackOp) of the TopKOp for GPU. There are two algorithm implementations: - for k <= 512, it maps to a warp-wide min-heap implementation, which requires only a single scan of the input data. - for k > 512, it maps to a multi-pass radix selection algorithm that I originally wrote in cutorch. I took the recent cutorch code and removed some cutorch-specific things where it made sense. Also added several utility files that one or the other implementation uses, some from the Faiss library and some from the cutorch library. Reviewed By: jamesr66a Differential Revision: D5248206 fbshipit-source-id: ae5fa3451473264293516c2838f1f40688781cf3	2017-06-17 08:47:38 -07:00
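
The multi-pass radix selection mentioned in commit 3f860af050 (the k > 512 path) can be illustrated with a small sequential sketch. This is not the cutorch or Caffe2 CUDA code: the function names `radix_select_kth_largest` and `top_k`, the 4-bit digit width, and the restriction to non-negative 32-bit integers are all assumptions chosen for illustration. Each pass histograms one digit of the values that still match the prefix found so far, then narrows the prefix to the bucket containing the k-th largest element.

```python
def radix_select_kth_largest(values, k, total_bits=32, radix_bits=4):
    """Return the k-th largest value (1-indexed) using multi-pass radix selection.

    Illustrative sketch only: assumes non-negative integers below 2**total_bits.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be in [1, len(values)]")
    if total_bits % radix_bits != 0:
        raise ValueError("total_bits must be a multiple of radix_bits")
    if any(v < 0 or v >> total_bits for v in values):
        raise ValueError("values must be non-negative and fit in total_bits")

    radix_mask = (1 << radix_bits) - 1
    prefix = 0       # bits of the answer determined so far
    prefix_mask = 0  # bit positions of `prefix` that are already fixed
    remaining = k    # rank of the answer among values matching the prefix

    # One pass per digit, from the most significant digit down.
    for shift in range(total_bits - radix_bits, -1, -radix_bits):
        # Histogram the current digit over values that match the prefix.
        counts = [0] * (1 << radix_bits)
        for v in values:
            if (v & prefix_mask) == prefix:
                counts[(v >> shift) & radix_mask] += 1

        # Walk buckets from the largest digit until the rank falls inside one.
        for digit in range(radix_mask, -1, -1):
            if counts[digit] >= remaining:
                prefix |= digit << shift
                prefix_mask |= radix_mask << shift
                break
            remaining -= counts[digit]

    return prefix


def top_k(values, k):
    """Return the k largest values in descending order (hypothetical helper)."""
    threshold = radix_select_kth_largest(values, k)
    greater = [v for v in values if v > threshold]
    ties = [v for v in values if v == threshold]
    # Everything above the threshold is kept; ties fill the remaining slots.
    result = greater + ties[: k - len(greater)]
    return sorted(result, reverse=True)
```

For example, `top_k([5, 1, 9, 3, 7, 9], 3)` selects the threshold 7 and returns `[9, 9, 7]`. A GPU version would compute each pass's histogram in parallel across threads; the sketch only shows the pass structure.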

6 Commits