pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jeff Daily	5379b5f927	[ROCm] use hipblas instead of rocblas (#105881 ) - BatchLinearAlgebraLib.cpp is now split into one additional file - BatchLinearAlgebraLib.cpp uses only cusolver APIs - BatchLinearAlgebraLibBlas.cpp uses only cublas APIs - hipify operates at the file level and cannot mix cusolver and cublas APIs within the same file - cmake changes to link against hipblas instead of rocblas - hipify mappings changes to map cublas -> hipblas instead of rocblas Pull Request resolved: https://github.com/pytorch/pytorch/pull/105881 Approved by: https://github.com/albanD	2023-07-31 20:42:55 +00:00
Xiaodong Wang	025cd69a86	[AMD] Fix some legacy hipify script (#70594 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70594 Pull Request resolved: https://github.com/facebookincubator/gloo/pull/315 Fix some out-dated hipify script: * python -> python3 (fb internal) * rocblas return code * gloo makefile for hip clang Test Plan: Sandcastle + OSS build Reviewed By: malfet, shintaro-iwasaki Differential Revision: D33402839 fbshipit-source-id: 5893039451bcf77bbbb1b88d2e46ae3e39caa154	2022-01-05 11:34:25 -08:00
Pruthvi Madugundu	085e2f7bdd	[ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610 - Replace HIP_PLATFORM_HCC with USE_ROCM - Dont rely on CUDA_VERSION or HIP_VERSION and use USE_ROCM and ROCM_VERSION. - In the next PR - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify. - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc. cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd Reviewed By: jbschlosser Differential Revision: D30909053 Pulled By: ezyang fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06	2021-09-29 09:55:43 -07:00
Dmytro Dzhulgakov	06d978a9ad	[c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249 Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths. Basic logic: \| Case \| Call to device_count() \| init_cuda, e.g. allocating tensor \| \| -- \| -- \| -- \| \| all good \| non-zero \| just works \| \| no gpus \| 0, no warning \| throw exception with good message \| \| driver issues \| 0, produce warning \| throw exception with good message \| \| out of memory with ASAN \| 0, produce warning\| throw exception with ASAN message \| Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs. Other clean up changes: * cache device_count() always in a static variable * move all asan macros in c10 Test Plan: Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=): ``` print('before import') import torch print('after import') print('devices: ', torch.cuda.device_count()) x = torch.tensor([1,2,3]) print('tensor creation') x = x.cuda() print('moved to cuda') ``` Reviewed By: ngimel Differential Revision: D22824329 fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5	2020-08-05 11:39:31 -07:00
Xiang Gao	5e2d8745c8	RIP CUDA <9.2: circleci, aten, and caffe2 (#36846 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36846 Test Plan: Imported from OSS Differential Revision: D21620850 Pulled By: ngimel fbshipit-source-id: 7ad1676a12f86250f301095ffc6f365a3b370f34	2020-05-18 13:41:05 -07:00
iotamudelta	4fe857187c	switch to rocThrust for thrust/cub APIs (#25620 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620 Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602 Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option. Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header. Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust. Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable. Skip four tests that fail with the new rocThrust for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864 Reviewed By: xw285cornell Differential Revision: D16940768 Pulled By: bddppq fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5	2019-09-03 22:16:30 -07:00
Andrew Naguib	3cba9e8aaa	Error Message Paraphrasing (#22369 ) Summary: Saying `I` in an err msg is too subjective to be used in a framework. Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369 Differential Revision: D16067712 Pulled By: soumith fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6	2019-06-30 00:13:02 -07:00
Edward Yang	1a9602d5db	Delete caffe2_cuda_full_device_control (#14283 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14283 According to Yangqing, this code was only used by us to do some end-to-end performance experiments on the impact of cudaSetDevice and cudaGetDevice. Now that the frameworks are merged, there are a lot of bare calls to those functions which are not covered by this flag. It doesn't seem like a priority to restore this functionality, so I am going to delete it for now. If you want to bring it back, you'll have to make all get/set calls go through this particular interfaces. Reviewed By: dzhulgakov Differential Revision: D13156472 fbshipit-source-id: 4c6d2cc89ab5ae13f7c816f43729b577e1bd985c	2018-11-29 18:33:22 -08:00
Junjie Bai	0d7a986da1	Change hip filename extension to .hip (#14036 ) Summary: xw285cornell - To make hip files to have unique filename extension we change hip files from _hip.cc to .hip (it's the only blessing option other than .cu in hipcc `3d51a1fb01/bin/hipcc (L552)`). - Change to use host compiler to compile .cc\|.cpp files. Previously we use hcc to compile them which is unnecessary - Change the hipify script to not replace "gpu" with "hip" in the filename of the generated hipified files. Previously we do this because hcc has a bug when linking files that have same filename. We have now changed to use host linker to do linking so this is unnecessary anymore. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036 Reviewed By: xw285cornell Differential Revision: D13091813 Pulled By: bddppq fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0	2018-11-16 11:55:59 -08:00
Xiaodong Wang	e6b6cc06ee	caffe2/core hipify (#13457 ) Summary: Small edits to caffe2/core hipify to make it compile in fbcode. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13457 Reviewed By: bddppq Differential Revision: D12883472 Pulled By: xw285cornell fbshipit-source-id: 1da231d721311d105892db13ed726240398ba49e	2018-11-01 15:49:56 -07:00
Junjie Bai	883da952be	Hipify caffe2/core (#13148 ) Summary: petrex ashishfarmer iotamudelta Pull Request resolved: https://github.com/pytorch/pytorch/pull/13148 Reviewed By: xw285cornell Differential Revision: D10862276 Pulled By: bddppq fbshipit-source-id: 1754834ec50f7dd2f752780e20b2a9cf19d03fc4	2018-10-26 15:27:32 -07:00
Yangqing Jia	7d5f7ed270	Using c10 namespace across caffe2. (#12714 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714 This is a short change to enable c10 namespace in caffe2. We did not enable it before due to gflags global variable confusion, but it should have been mostly cleaned now. Right now, the plan on record is that namespace caffe2 and namespace aten will fully be supersets of namespace c10. Most of the diff is codemod, and only two places of non-codemod is in caffe2/core/common.h, where ``` using namespace c10; ``` is added, and in Flags.h, where instead of creating aliasing variables in c10 namespace, we directly put it in the global namespace to match gflags (and same behavior if gflags is not being built with). Reviewed By: dzhulgakov Differential Revision: D10390486 fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b	2018-10-17 12:57:19 -07:00
Sergei Nikolaev	1c7832c854	CUDA 10 warnings fixed (#12442 ) Summary: Deprecation warning against `cudaPointerAttributes`, where `memoryType` field has been deprecated in favor of `type`, see https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#contents-end for details Pull Request resolved: https://github.com/pytorch/pytorch/pull/12442 Differential Revision: D10251239 Pulled By: zou3519 fbshipit-source-id: 500f1e02aa8e11c510475953ef5244d5fb13bf9e	2018-10-11 00:25:22 -07:00
Yangqing Jia	38f3d1fc40	move flags to c10 (#12144 ) Summary: still influx. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144 Reviewed By: smessmer Differential Revision: D10140176 Pulled By: Yangqing fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c	2018-10-04 02:09:56 -07:00
Yangqing Jia	9c49bb9ddf	Move registry fully to c10 (#12077 ) Summary: This does 6 things: - add c10/util/Registry.h as the unified registry util - cleaned up some APIs such as export condition - fully remove aten/core/registry.h - fully remove caffe2/core/registry.h - remove a bogus aten/registry.h - unifying all macros - set up registry testing in c10 Also, an important note that we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077 Reviewed By: ezyang Differential Revision: D10050771 Pulled By: Yangqing fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf	2018-09-27 03:09:54 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Yangqing Jia	91d76f5dbd	Reapply Windows fix Summary: Last fix was uncommitted due to a bug in internal build (CAFFE2_API causing error). This one re-applies it as well as a few more, especially enabling gtest. Earlier commit message: Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work other than gpu shared_lib, which willyd kindly pointed out a symbol limit problem. A few highlights: (1) Updated newest protobuf. (2) use protoc dllexport command to ensure proper symbol export for windows. (3) various code updates to make sure that C2 symbols are properly shown (4) cmake file changes to make build proper (5) option to choose static runtime and shared runtime similar to protobuf (6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together. (7) enabled gtest and fixed testing bugs. Earlier PR is #1793 Closes https://github.com/caffe2/caffe2/pull/1827 Differential Revision: D6832086 Pulled By: Yangqing fbshipit-source-id: 85f86e9a992ee5c53c70b484b761c9d6aed721df	2018-01-29 10:03:28 -08:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Yangqing Jia	cf769a7b6f	Avoid race condition in get device properties. Summary: TSIA Reviewed By: salexspb Differential Revision: D5898125 fbshipit-source-id: 1822ef2a017719442045fa446321d007b9d544b8	2017-09-23 16:01:23 -07:00
Yangqing Jia	26f0943130	Do CaffeCudaSetDevice and CaffeCudaGetDevice Summary: These are wrapper functions so that if we run in a Caffe2-only mode, we can turn the flag on and get some small speedup on cuda device switches. The purpose of the diff is to allow us to quickly assess the overhead of cuda device switch functions. Ideally, the caching behavior shall live in the cuda driver, which is the only safe place to ensure correctness. If other code is running aside Caffe2 and does not properly do device guard, this functionality will fail as separate cudaSetDevice() calls will not update Caffe2's thread local device id. As a result, the functionality is only enabled when/if one explicitly sets the flag. This might not be safe, so use with caution. - cudaGetDevice can go from 90ns to 2ns - when setting the same device, we can go from 100ns to 2 ns - when setting a different device, things are the same (1ns overhead on top of 143ns) Reviewed By: azzolini Differential Revision: D5709398 fbshipit-source-id: 6255f17a3d41f59a30327436383f306a2287896e	2017-08-25 18:20:14 -07:00
Yangqing Jia	93e12e75df	Allow caffe2 to detect if cuda lib has been linked, and also fix oss build error. Summary: Closes https://github.com/caffe2/caffe2/pull/1114 Reviewed By: pietern Differential Revision: D5686557 Pulled By: Yangqing fbshipit-source-id: 6b7245ebbe4eeb025ce9d0fe8fda427a0c3d9770	2017-08-23 18:41:15 -07:00
Simon Layton	85788a0f65	Add TensorCore support Summary: Add support for TensorCore convolution and gemm on Volta hardware. Currently built on top of #1055 Closes https://github.com/caffe2/caffe2/pull/1056 Differential Revision: D5604068 Pulled By: Yangqing fbshipit-source-id: 100f67e26ed5fabb1dbb31dcd77f7ecb84de4ee7	2017-08-10 20:16:48 -07:00
Jeff Johnson	3f860af050	Implement TopKOp for GPU Summary: This is a real implementation (not GPUFallbackOp) of the TopKOp for GPU. There are two algorithm implementations: -for k <= 512, it maps to a warp-wide min-heap implementation, which requires only a single scan of the input data. -for k > 512, it maps to a multi-pass radix selection algorithm that I originally wrote in cutorch. I took the recent cutorch code and removed some cutorch-specific things as it made sense. Also added several utility files that one or the other implementations use, some from the Faiss library and some from the cutorch library. Reviewed By: jamesr66a Differential Revision: D5248206 fbshipit-source-id: ae5fa3451473264293516c2838f1f40688781cf3	2017-06-17 08:47:38 -07:00
Yangqing Jia	81d5461973	cuda check -> enforce Summary: In the past we have moved most of the CHECKs to CAFFE_ENFORCE (exceptions). However, we kept the name "_CHECK" for cuda calls, and that caused some confusion especially in the destructor calls: while our destructors are not written to handle exceptions, these CUDA_CHECKs could actually throw some exceptions. As a result, this diff (1) Renames all cuda related "_CHECK" to "*_ENFORCE" (2) Explicitly marked the destructor of core Caffe2 classes as noexcept (3) Added proper, really-CHECK cuda check macros, and used those in the corresponding destructors. This should not change any of existing functionality. Reviewed By: dzhulgakov Differential Revision: D4656368 fbshipit-source-id: 32e3056e66c0400156c5ca0187b6151cf3d52404	2017-03-05 22:46:22 -08:00
Yangqing Jia	107966b059	add error message for asan Summary: This makes sure that we have useful CUDA error message in asan mode. Also made a fb specific task pass by explicitly marking it not asan-able. Reviewed By: dzhulgakov Differential Revision: D4243471 fbshipit-source-id: 2ce303b97b3b4728c05575a8e7e21eb5960ecbc7	2016-11-29 15:18:39 -08:00
Yangqing Jia	589398950f	fbsync at f5a877	2016-11-18 15:41:06 -08:00
Yangqing Jia	238ceab825	fbsync. TODO: check if build files need update.	2016-11-15 00:00:46 -08:00
Yangqing Jia	6463eebc7b	chunky sync - build scripts to be written	2016-07-21 10:16:42 -07:00
Yangqing Jia	559053d3a8	chunky sync	2016-05-13 14:43:48 -07:00
Yangqing Jia	137b880aac	cuda initialization. This makes it callable multiple times but the actual code only runs once. TODO: make it thread safe. I am too lazy for now.	2016-03-15 12:52:05 -07:00
Yangqing Jia	fa59b90c72	misc updates	2016-01-13 21:00:56 -08:00
Yangqing Jia	05eda208a5	Last commit for the day. With all the previous changes this should give an exact reference speed that TensorFlow with CuDNN3 should achieve in the end.	2016-01-05 09:55:21 -08:00
Yangqing Jia	648d1b101a	A consolidation of a couple random weekend work. (1) various bugfixes. (2) Tensor is now a class independent from its data type. This allows us to write easier type-independent operators. (3) code convention changes a bit: dtype -> T, Tensor<Context> -> Tensor alias. (4) ParallelNet -> DAGNet to be more consistent with what it does. (5) Caffe's own flags library instead of gflags. (6) Caffe's own logging library instead of glog, but glog can be chosen with compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros like CHECK, DCHECK now have prefix CAFFE_, and LOG() now becomes CAFFE_LOG_. (7) an optional protobuf inclusion, which can be chosen with USE_SYSTEM_PROTOBUF in build_env.py.	2015-10-11 23:14:06 -07:00
Yangqing Jia	5b9584c227	carpet bombing	2015-09-15 21:30:23 -07:00
Yangqing Jia	d72cfcebaf	fixes to allow more consistent build tests	2015-09-06 22:34:22 +00:00
Yangqing Jia	d2ff13d332	put a peer access pattern function to caffe2.	2015-09-06 08:59:04 -07:00
Yangqing Jia	ec069cb3ea	Use a global init function: it seems that with the multiple components optionally linked in, it is best to just enable a registering mechanism for inits.	2015-09-06 08:59:03 -07:00
Yangqing Jia	a12a471b2d	suppress compiler warning.	2015-08-28 14:02:53 -07:00
Yangqing Jia	2ed1077a83	A clean init for Caffe2, removing my earlier hacky commits.	2015-06-25 16:26:01 -07:00

39 Commits