pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Eddie Yan	bac33ea8b6	[CUDA] Drop CUDA 10 support (#89582 ) CC @ptrblck @ngimel @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/89582 Approved by: https://github.com/malfet, https://github.com/ngimel	2023-01-05 05:11:53 +00:00
Pruthvi Madugundu	085e2f7bdd	[ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610 - Replace HIP_PLATFORM_HCC with USE_ROCM - Dont rely on CUDA_VERSION or HIP_VERSION and use USE_ROCM and ROCM_VERSION. - In the next PR - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify. - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc. cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd Reviewed By: jbschlosser Differential Revision: D30909053 Pulled By: ezyang fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06	2021-09-29 09:55:43 -07:00
Jane Xu	533cb9530e	Introducing TORCH_CUDA_CPP_API and TORCH_CUDA_CU_API to the code (#50627 ) Summary: Sub-step of my attempt to split up the torch_cuda library, as it is huge. Please look at https://github.com/pytorch/pytorch/issues/49050 for details on the split and which files are in which target. This PR introduces two new macros for Windows DLL purposes, TORCH_CUDA_CPP_API and TORCH_CUDA_CU_API. Both are defined as TORCH_CUDA_API for the time being. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50627 Reviewed By: mruberry Differential Revision: D25955441 Pulled By: janeyx99 fbshipit-source-id: ff226026833b8fb2fb7c77df6f2d6c824f006869	2021-01-21 19:09:11 -08:00
Richard Barnes	c44300884e	Clarify timing of GetDeviceProperty() (#46715 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46715 Test Plan: N/A Reviewed By: ezyang Differential Revision: D24455538 fbshipit-source-id: 1770807d178f618ef6338e28f669f09e4cbd2009	2020-10-22 11:29:31 -07:00
Natalia Gimelshein	9c19a12965	fix asserts in cuda code (#39047 ) Summary: Gets rid of some in-kernel asserts where they can be replaced with static_asserts Replaces bare in-kernel `assert` in one case with `CUDA_KERNEL_ASSERT` where necessary replaces host code `assert`s with `TORCH_INTERNAL_ASSERT` Another group of asserts is in fractional max pooling kernels which should be fixed regardless https://github.com/pytorch/pytorch/issues/39044, the problems there are not just asserts. I've audited remaining cases of in-kernel asserts, and they are more like `TORCH_INTERNAL_ASSERT`, so they should not happen with invalid user data. I think it's ok to leave them as is. Pull Request resolved: https://github.com/pytorch/pytorch/pull/39047 Differential Revision: D21750392 Pulled By: ngimel fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719	2020-05-28 15:51:38 -07:00
Natalia Gimelshein	ba14a701dc	restore proper cuda assert behavior with DNDEBUG (#38943 ) Summary: Per title. https://github.com/pytorch/pytorch/issues/32719 essentially disabled asserts in cuda kernels in release build. Asserts in cuda kernels are typically used to prevent invalid reads/writes, so without asserts invalid read/writes are silent errors in most cases (sometimes they would still cause "illegal memory access" errors, but because of caching allocator this usually won't happen). We don't need 2 macros, CUDA_ALWAYS_ASSERT and CUDA_KERNEL_ASSERT because all current asserts in cuda kernels are important to prevent illegal memory accesses, and they should never be disabled. This PR removes macro CUDA_ALWAYS_ASSERT and instead makes CUDA_KERNEL_ASSERT (that is commonly used in the kernels) an assertion both in release and debug builds. Fixes https://github.com/pytorch/pytorch/issues/38771 Pull Request resolved: https://github.com/pytorch/pytorch/pull/38943 Differential Revision: D21723767 Pulled By: ngimel fbshipit-source-id: d88d8aa1b047b476d5340e69311e65aff4da5074	2020-05-26 18:11:00 -07:00
Xiang Gao	5e2d8745c8	RIP CUDA <9.2: circleci, aten, and caffe2 (#36846 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36846 Test Plan: Imported from OSS Differential Revision: D21620850 Pulled By: ngimel fbshipit-source-id: 7ad1676a12f86250f301095ffc6f365a3b370f34	2020-05-18 13:41:05 -07:00
Edward Yang	38986e1dea	Split libtorch.so back into libtorch_{cpu,cuda,hip} (#30315 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315 The new structure is that libtorch_cpu contains the bulk of our code, and libtorch depends on libtorch_cpu and libtorch_cuda. This is a reland of https://github.com/pytorch/pytorch/pull/29731 but I've extracted all of the prep work into separate PRs which can be landed before this one. Some things of note: * torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library) * The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774 In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO" * A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly * I had to torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archived linked into torch when you statically link. And I had to do this in an exported fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. However, I am not too sure why the old code did it in this way in the first place; however, it doesn't seem to have broken anything to switch it this way. * There's some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. 
This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706 Fixes #27215 (as our libraries are smaller), and executes on part of the plan in #29235. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D18790941 Pulled By: ezyang fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7	2019-12-04 08:04:57 -08:00
Junjie Bai	352731bd6e	Revert D18632773: Split libtorch.so back into libtorch_{cpu,cuda,hip} Test Plan: revert-hammer Differential Revision: D18632773 Original commit changeset: ea717c81e0d7 fbshipit-source-id: 18601439f9f81c9f389020e5a0e4e04adb21772d	2019-11-21 15:01:09 -08:00
Edward Yang	ec30d9028a	Split libtorch.so back into libtorch_{cpu,cuda,hip} (#29731 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731 The new structure is that libtorch_cpu contains the bulk of our code, and libtorch depends on libtorch_cpu and libtorch_cuda. Some subtleties about the patch: - There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file. - DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases. - torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library) - The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774 - In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO" - A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions. - There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this. 
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases. Fixes #27215 (as our libraries are smaller), and executes on part of the plan in #29235. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D18632773 Pulled By: ezyang fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82	2019-11-21 11:27:33 -08:00
Karl Ostmo	49481d576d	Torch rename (#20774 ) Summary: This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR. The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774 Differential Revision: D15769965 Pulled By: kostmo fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821	2019-06-12 20:12:34 -07:00
Edward Yang	1e6acc676f	Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623 Despite its generic sounding name, caffe2::DeviceGuard actually only worked on CUDA devices. Rename it to something that more clearly spells out its applicability. I'm not sure if it's the right call, but in this patch I added 'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more in-line with how the Caffe2 codebase is currently written. More idiomatic c10 namespace style would be to say cuda::CUDAGuard. Willing to change this if people shout. This is a respin of D13156470 (#14284) Reviewed By: dzhulgakov Differential Revision: D14285504 fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d	2019-03-06 10:48:15 -08:00
Dmytro Dzhulgakov	96ea2594d8	Don't call cudaStreamDestroy at destruction time (#15692 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15692 It was leading to occasional crashes with dynamically linked CUDA because runtime was already destroyed. Also, unique_ptr<T[]> is more suitable than deque<T> for the purpose. Reviewed By: Yangqing Differential Revision: D13571988 fbshipit-source-id: 37eb26dfbe361c49160367b53f87bd037c6c0e46	2019-01-11 12:36:41 -08:00
hbraun@nvidia.com	3fdf567752	Adding CUDA version for C2 operators generate proposals and nms (#13694 ) Summary: Related to issue #13684 Pull Request resolved: https://github.com/pytorch/pytorch/pull/13694 Reviewed By: wat3rBro Differential Revision: D13017791 Pulled By: newstzpz fbshipit-source-id: 4bdc58e474d8e1f6cd73a02bf51f91542a2b9d0b	2018-12-20 14:39:09 -08:00
rohithkrn	7e2b074219	Integrate rocBLAS fp16 api into Caffe2 (#14882 ) Summary: This PR integrates rocBLAS half and mixed precision APIs in to Caffe2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14882 Differential Revision: D13407840 Pulled By: bddppq fbshipit-source-id: 75cb0d74da066776fa66575f1d255e879d36121e	2018-12-10 17:54:06 -08:00
Junjie Bai	0d7a986da1	Change hip filename extension to .hip (#14036 ) Summary: xw285cornell - To make hip files to have unique filename extension we change hip files from _hip.cc to .hip (it's the only blessing option other than .cu in hipcc `3d51a1fb01/bin/hipcc (L552)`). - Change to use host compiler to compile .cc\|.cpp files. Previously we use hcc to compile them which is unnecessary - Change the hipify script to not replace "gpu" with "hip" in the filename of the generated hipified files. Previously we do this because hcc has a bug when linking files that have same filename. We have now changed to use host linker to do linking so this is unnecessary anymore. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036 Reviewed By: xw285cornell Differential Revision: D13091813 Pulled By: bddppq fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0	2018-11-16 11:55:59 -08:00
Edward Yang	fed8d8975a	Various improvements to hipify_python.py (#13973 ) Summary: - Speed up hipify_python.py by blacklisting useless (and quite large) directory trees that it would otherwise recurse into - Pass around relative paths instead of absolute paths. This makes it easier to do filename matches based on the root of the tree. - Redo the streaming output to contain more useful information - Make it handle c10/cuda correctly, rewrite c10::cuda to c10::hip, and the header name from CUDAMathCompat.h to CUDAHIPCompat.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/13973 Differential Revision: D13062374 Pulled By: ezyang fbshipit-source-id: f0858dd18c94d449ff5dbadc22534c695dc0f8fb	2018-11-14 17:11:24 -08:00
Junjie Bai	95ca66763d	Add math functions overloaded over different numeric types for cuda and hip (#13602 ) Summary: petrex ashishfarmer rohithkrn iotamudelta Pull Request resolved: https://github.com/pytorch/pytorch/pull/13602 Reviewed By: dzhulgakov Differential Revision: D12935797 Pulled By: bddppq fbshipit-source-id: a49ec66fb60bfd947c63dd2133d431884df62235	2018-11-06 01:40:31 -08:00
Xiaodong Wang	e6b6cc06ee	caffe2/core hipify (#13457 ) Summary: Small edits to caffe2/core hipify to make it compile in fbcode. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13457 Reviewed By: bddppq Differential Revision: D12883472 Pulled By: xw285cornell fbshipit-source-id: 1da231d721311d105892db13ed726240398ba49e	2018-11-01 15:49:56 -07:00
Xiaomeng Yang	017b91f861	Optimize channel_shuffle_op on GPU (#13066 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13066 Optimize channel_shuffle_op on GPU Reviewed By: houseroad Differential Revision: D10639281 fbshipit-source-id: 394b937403e5d4e9df93548bbf87285bffaa55a9	2018-10-30 12:18:27 -07:00
Sergei Nikolaev	1c7832c854	CUDA 10 warnings fixed (#12442 ) Summary: Deprecation warning against `cudaPointerAttributes`, where `memoryType` field has been deprecated in favor of `type`, see https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#contents-end for details Pull Request resolved: https://github.com/pytorch/pytorch/pull/12442 Differential Revision: D10251239 Pulled By: zou3519 fbshipit-source-id: 500f1e02aa8e11c510475953ef5244d5fb13bf9e	2018-10-11 00:25:22 -07:00
Mingzhe Li	aa8cd7319a	Enable build_test on windows (#11802 ) Summary: This PR enables BUILD_TEST for Caffe2 on windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11802 Reviewed By: orionr Differential Revision: D9951223 Pulled By: mingzhe09088 fbshipit-source-id: 7cdc1626b999daadeae482bd569eebdbd53eb6d4	2018-09-19 20:17:49 -07:00
Mingzhe Li	a7cbcb1bb9	Enable build_python on windows (#11385 ) Summary: The PR aims to resolve issues related to BUILD_PYTHON and BUILD_TEST after FULL_CAFFE2 is removed on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11385 Reviewed By: orionr Differential Revision: D9884906 Pulled By: mingzhe09088 fbshipit-source-id: fc114c0cbff6223f1ec261161e4caecc1fef5dd6	2018-09-17 21:40:03 -07:00
Yangqing Jia	68613cf5a2	Windows DLL build with Caffe2 code (#11266 ) Summary: This is an experimental build on top of what orionr and mingzhe09088 built. Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266 Reviewed By: orionr Differential Revision: D9682942 Pulled By: Yangqing fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3	2018-09-06 15:12:20 -07:00
Lukasz Wesolowski	e2ecf3914a	Change default CUDA block size from 512 to 128 (#10090 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10090 Decreasing the block size improves GPU utilization for use cases with small input sizes (e.g. 10000) Reviewed By: pjh5 Differential Revision: D9093573 fbshipit-source-id: c8f995b773a00b1bea3a3809c0f6557133efd9dd	2018-08-02 15:40:13 -07:00
Xiaomeng Yang	9243b64bff	[Caffe2] Update elementwise ops to support numpy style broadcast (#8070 ) * Update elementwise ops to support numpy style broadcast Update elementwise ops to support numpy style broadcast * Fix sqrt_op * Fix compare ops * Fix gradient test * Fix optimizer legacy broadcast * Fix legacy broadcast for elementwise ops * Skip flaky test * Fix eigen simple binary op * Fix attention test * Fix rnn test * Fix LSTM test * Fix tan grad * Fix schema check	2018-06-05 15:49:16 -07:00
Xiaomeng Yang	a61d4a3374	[Caffe2] Refactor reduce ops to take flexible input types (#7164 ) * Refactor reduce ops to take flexible input types * Add DISPATCH_FUNCTION macros in common_gpu.h * Use macros to reduce switch case in dispatching cuda functions	2018-05-02 12:08:38 -07:00
Xiaomeng Yang	2ef23b6241	[caffe2] Update transpose with compile time dimension (#6614 ) * Update transpose with compile time dimension * Change return to break	2018-04-15 19:20:39 -07:00
Xiaomeng Yang	cd2112717c	[caffe2] Update math functions with params on host. (#6602 ) * Update ReduceMean Add reduce mean to math Add reduce mean to math * sync reduce_ops_test * Update math_gpu.cu	2018-04-14 21:41:41 -07:00
Xianjie Chen	6ed9a0c3f2	fix cuda elementwise ops for empty batch CUDA will fail to launch empty kernel	2018-03-27 18:10:39 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Yangqing Jia	ced2c7e2b2	Remove Set/GetDefaultGPUID and move to use current gpu id instead. Summary: Reason for this change: (1) Setting/Getting default gpu id doesn't seem to be used at all. (2) It actually is confusing compared to the CUDA_VISIBLE_DEVICES options etc. (3) When setting cuda_gpu_id=-1 in the CUDAContext arg, it used to use the default gpu id but probably we should use the current gpu - so that the caller will be able to control the device placement. One use case is for TensorRT - if we have a custom callback layer, then it would be easier for TRT or whatever caller to set the running device. Reviewed By: dzhulgakov Differential Revision: D6740357 fbshipit-source-id: 2ea710e434b10220d5a198e31c93847304636863	2018-01-19 18:03:21 -08:00
Xianjie Chen	d1c73eb407	use size_t for rand fill functions in math Summary: The number of elements in the caffe2 blob can be larger than int32. Use size_t to prevent overflow. Reviewed By: ajtulloch Differential Revision: D6278363 fbshipit-source-id: 356e294c667a53360d8a65b56a63a39d5ce3384e	2017-11-09 18:44:46 -08:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Yangqing Jia	cf769a7b6f	Avoid race condition in get device properties. Summary: TSIA Reviewed By: salexspb Differential Revision: D5898125 fbshipit-source-id: 1822ef2a017719442045fa446321d007b9d544b8	2017-09-23 16:01:23 -07:00
Yangqing Jia	26f0943130	Do CaffeCudaSetDevice and CaffeCudaGetDevice Summary: These are wrapper functions so that if we run in a Caffe2-only mode, we can turn the flag on and get some small speedup on cuda device switches. The purpose of the diff is to allow us to quickly assess the overhead of cuda device switch functions. Ideally, the caching behavior shall live in the cuda driver, which is the only safe place to ensure correctness. If other code is running aside Caffe2 and does not properly do device guard, this functionality will fail as separate cudaSetDevice() calls will not update Caffe2's thread local device id. As a result, the functionality is only enabled when/if one explicitly sets the flag. This might not be safe, so use with caution. - cudaGetDevice can go from 90ns to 2ns - when setting the same device, we can go from 100ns to 2 ns - when setting a different device, things are the same (1ns overhead on top of 143ns) Reviewed By: azzolini Differential Revision: D5709398 fbshipit-source-id: 6255f17a3d41f59a30327436383f306a2287896e	2017-08-25 18:20:14 -07:00
Yangqing Jia	93e12e75df	Allow caffe2 to detect if cuda lib has been linked, and also fix oss build error. Summary: Closes https://github.com/caffe2/caffe2/pull/1114 Reviewed By: pietern Differential Revision: D5686557 Pulled By: Yangqing fbshipit-source-id: 6b7245ebbe4eeb025ce9d0fe8fda427a0c3d9770	2017-08-23 18:41:15 -07:00
Yangqing Jia	5d24a4eeef	Early design for a general Event abstraction cross-devices. Summary: There are ad-hoc efforts on avoiding excessive device synchronizations, such as async_dag, singlethread_async, etc. This diff aims to provide an early design for a general Event class, that can achieve the following: (1) It is device agnostic, essentially using a vtable to do cross device record, wait and synchronization. (2) Created new functions WaitEvent and Record in the Context class for interacting with Events. (3) Exposed the corresponding WaitEvent and Record functions in the OperatorBase class as well. An example use case is that, after potential future refactoring, one can achieve a real async execution per operator by running op.WaitEvent(previous_event); op.RunAsync(); op.RecordEvent(this_op_event); and the next op can do next_op.WaitEvent(this_op_event); Right now, I changed async_dag net implementation so that it uses the general event design. The old Event class is assimilated to the general Event class and the old Stream class is now essentially taken over by the Context class itself. Reviewed By: harouwu Differential Revision: D5648463 fbshipit-source-id: 58bd84d06e4a9977b0b835110ddb2f18be3b7cbc	2017-08-18 15:46:51 -07:00
Pieter Noordhuis	692f4e4e3b	Disable -Wstrict-aliasing when including cuda_fp16.h Summary: The cuda_fp16.h header in CUDA 9 RC triggers this diagnostic. It is included by cusparse.h as well, so guarding the inclusion of only cuda_fp16.h is not enough. Reviewed By: Yangqing Differential Revision: D5651995 fbshipit-source-id: 4778a8a793761e7a1dbebf3792b85b33a3e26219	2017-08-17 14:15:32 -07:00
Simon Layton	85788a0f65	Add TensorCore support Summary: Add support for TensorCore convolution and gemm on Volta hardware. Currently built on top of #1055 Closes https://github.com/caffe2/caffe2/pull/1056 Differential Revision: D5604068 Pulled By: Yangqing fbshipit-source-id: 100f67e26ed5fabb1dbb31dcd77f7ecb84de4ee7	2017-08-10 20:16:48 -07:00
Yangqing Jia	475eff5281	Allow peer access only in groups of 8 Summary: This is the hardware limit set by NVidia. Basically, on Amazon P2 machines that have 16 gpus, the previous setting will trigger an error. This fixes the issue but is pending verification from Amazon. Differential Revision: D4888402 fbshipit-source-id: 8d26a24d6e0476f895b9afdb979144eb8e6b9321	2017-04-14 09:47:48 -07:00
Yangqing Jia	81d5461973	cuda check -> enforce Summary: In the past we have moved most of the CHECKs to CAFFE_ENFORCE (exceptions). However, we kept the name "_CHECK" for cuda calls, and that caused some confusion especially in the destructor calls: while our destructors are not written to handle exceptions, these CUDA_CHECKs could actually throw some exceptions. As a result, this diff (1) Renames all cuda related "_CHECK" to "*_ENFORCE" (2) Explicitly marked the destructor of core Caffe2 classes as noexcept (3) Added proper, really-CHECK cuda check macros, and used those in the corresponding destructors. This should not change any of existing functionality. Reviewed By: dzhulgakov Differential Revision: D4656368 fbshipit-source-id: 32e3056e66c0400156c5ca0187b6151cf3d52404	2017-03-05 22:46:22 -08:00
Yangqing Jia	589398950f	fbsync at f5a877	2016-11-18 15:41:06 -08:00
Yangqing Jia	238ceab825	fbsync. TODO: check if build files need update.	2016-11-15 00:00:46 -08:00
Yangqing Jia	6463eebc7b	chunky sync - build scripts to be written	2016-07-21 10:16:42 -07:00
Yangqing Jia	559053d3a8	chunky sync	2016-05-13 14:43:48 -07:00
Yangqing Jia	137b880aac	cuda initialization. This makes it callable multiple times but the actual code only runs once. TODO: make it thread safe. I am too lazy for now.	2016-03-15 12:52:05 -07:00
Yangqing Jia	78aa266770	Fix	2016-01-19 14:49:48 -08:00
Yangqing Jia	d84545c5fb	fp16: allow one to override.	2016-01-19 14:39:26 -08:00
Yangqing Jia	fa59b90c72	misc updates	2016-01-13 21:00:56 -08:00
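
The block-size change recorded in e2ecf3914a (#10090) can be illustrated with a little launch arithmetic. The sketch below is host-only C++ and is not taken from the Caffe2 sources: `kOldThreadsPerBlock`, `kNewThreadsPerBlock`, and `blocks_for` are hypothetical names, and it assumes a plain ceiling-division launch with one thread per element.

```cpp
#include <cstdint>

// Hypothetical constants mirroring the old and new defaults named in the commit.
constexpr std::int64_t kOldThreadsPerBlock = 512;
constexpr std::int64_t kNewThreadsPerBlock = 128;

// Number of blocks needed to cover n elements with one thread per element
// (ceiling division). Assumes n >= 0 and threads_per_block > 0.
constexpr std::int64_t blocks_for(std::int64_t n, std::int64_t threads_per_block) {
  return (n + threads_per_block - 1) / threads_per_block;
}

// Compile-time checks for the 10000-element case mentioned in the commit.
static_assert(blocks_for(10000, kOldThreadsPerBlock) == 20, "512 threads per block");
static_assert(blocks_for(10000, kNewThreadsPerBlock) == 79, "128 threads per block");
```

Under these assumptions a 10000-element input goes from 20 blocks to 79, which gives the hardware more independent blocks to distribute. Whether that translates into better utilization on a particular GPU depends on the kernel and the device.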


60 Commits