pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
cyy	ce94b212c7	[Environment Variable][Rebase] Use thread-safe getenv functions (#140200 ) Use our thread-safe getenv wrappers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140200 Approved by: https://github.com/kwen2501, https://github.com/eqy	2025-05-02 00:41:49 +00:00
Ethan Wee	6cbf97ede8	[ROCm] enable HIPMallocAsyncAllocator (#149145 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/izaitsevfb Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-03-19 23:42:35 +00:00
PyTorch MergeBot	e1d143cb7b	Revert "[ROCm] enable HIPMallocAsyncAllocator (#149145 )" This reverts commit `ee1a2b7810`. Reverted https://github.com/pytorch/pytorch/pull/149145 on behalf of https://github.com/izaitsevfb due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/149145#issuecomment-2738115728))	2025-03-19 21:12:13 +00:00
Ethan Wee	ee1a2b7810	[ROCm] enable HIPMallocAsyncAllocator (#149145 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-03-19 03:59:55 +00:00
PyTorch MergeBot	9d37b501db	Revert "[ROCm] enable HIPMallocAsyncAllocator (#149145 )" This reverts commit `2e02c07a5d`. Reverted https://github.com/pytorch/pytorch/pull/149145 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. @albanD, might you be able to help get this PR landed? See D71214814 for more details on the failure. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/149145#issuecomment-2730104736))	2025-03-17 16:17:02 +00:00
Ethan Wee	2e02c07a5d	[ROCm] enable HIPMallocAsyncAllocator (#149145 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/jeffdaily	2025-03-14 18:21:27 +00:00
cyy	203dd18c5c	Bump Clang-tidy to 19.1.4 (#148648 ) Because Clang-tidy 19 has more powerful clang-analyzer checks to detect subtle bugs. New checks such as misc-use-internal-linkage can help identify potential static variables or functions, thus reducing binary sizes. Some new checks are disabled temporarily for later enabling. Additional warnings have been fixed or suppressed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148648 Approved by: https://github.com/Skylion007	2025-03-10 17:32:30 +00:00
cyy	dca443835e	Enable more readability-redundant checks (#143963 ) They help simplify code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143963 Approved by: https://github.com/albanD	2024-12-30 14:49:33 +00:00
Banit Agrawal	a575ce0dc6	[PyTorch Pinned Allocator] Add support of background thread to process events (#135524 ) Summary: Currently we process events in the regular allocation path, calling cudaEventQuery to check on the events, and this path can take some locks in the libcuda driver. It is not strictly necessary to process events in the allocation path; we could move this to a background thread that processes events regularly and puts the freed blocks on the free list. Differential Revision: D62396585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135524 Approved by: https://github.com/zyan0	2024-09-17 21:08:10 +00:00
Banit Agrawal	48d18fbd4c	[PyTorch CUDA Allocator] Allow reuse of non-split blocks with better rounding (#136174 ) Summary: This diff adds an option to round the non-split blocks in the caching allocator so that they can be reused without causing lots of fragmentation for large memory segments. For example, if we specify the max_split memory size as 400MB, then all allocations larger than 400MB will not be split. Let's say we allocated some 1024MB blocks and these are cached in the allocator. If we request a new 500MB block, we round it to the nearest power-2-division, that's 512MB, and we add the default kLargeBuffer of 20MB, which gives 532MB. Since 532MB is less than the existing 1024MB block, the 1024MB block will not be used for this allocation; instead a new 512MB block will be created. In this diff, we make the kLargeBuffer used for rounding a configurable option, so the limit becomes 512MB + max_non_split_rounding_size, and if that's greater than 1024MB, we will use the 1024MB block and we won't create a new 512MB block using cudaMalloc. This option is added so that we can pre-allocate some large blocks, reuse them as much as possible, and not stall on calling cudaMalloc. Differential Revision: D62758758 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136174 Approved by: https://github.com/zyan0	2024-09-17 19:08:44 +00:00
Shiyan Deng	db0b74bbc5	[CUDA Caching Allocator] Allow division of 0 (#126833 ) Summary: Division of 0 means disabling roundup. Test Plan: CI Differential Revision: D57651410 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126833 Approved by: https://github.com/banitag1	2024-05-22 17:40:39 +00:00
PyTorch MergeBot	277ab8a4c0	Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 )" This reverts commit `a56e057814`. Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/jeanschmidt due to Broken internal signals, @albanD please help get this sorted :) ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2069716129))	2024-04-22 14:44:44 +00:00
cyy	a56e057814	[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 ) This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449 Approved by: https://github.com/malfet, https://github.com/albanD	2024-04-19 13:39:41 +00:00
PyTorch MergeBot	61bc188f42	Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 )" This reverts commit `b51f66c195`. Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/malfet due to Broke gcc9 builds ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2064936414))	2024-04-18 18:53:59 +00:00
cyy	b51f66c195	[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 ) This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449 Approved by: https://github.com/albanD	2024-04-18 13:35:48 +00:00
PyTorch MergeBot	f5049de242	Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 )" This reverts commit `5bef127c2e`. Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/PaliC due to your using TORCH_INTERNAL_ASSERT incorrectly ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2062696010))	2024-04-17 23:44:00 +00:00
cyy	5bef127c2e	[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 ) This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449 Approved by: https://github.com/albanD	2024-04-16 04:39:20 +00:00
cyy	fb10e13000	[Clang-tidy header][24/N] Fix clang-tidy warnings on c10/cuda/*.{cpp,h} (#120781 ) This PR begins to clean clang-tidy warnings of code in c10/cuda. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120781 Approved by: https://github.com/ezyang	2024-03-15 05:03:22 +00:00
Aaron Enye Shi	7973ac586d	[Memory Snapshot] Add CUDAAllocatorConfig details into snapshot metadata (#119404 ) Summary: Include the CUDAAllocatorConfig at the time of snapshot into the snapshot file. These include adding variables: ``` double garbage_collection_threshold; size_t max_split_size; size_t pinned_num_register_threads; bool expandable_segments; bool release_lock_on_cudamalloc; bool pinned_use_cuda_host_register; std::string last_allocator_settings; std::vector<size_t> roundup_power2_divisions; ``` Test Plan: `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True ` produces ``` {'PYTORCH_CUDA_ALLOC_CONF': 'expandable_segments:True', 'max_split_size': -1, 'garbage_collection_threshold': 0.0, 'expandable_segments': True, 'pinned_num_register_threads': 1, 'release_lock_on_cudamalloc': False, 'pinned_use_cuda_host_register': False, 'roundup_power2_divisions': {'1': 0, '2': 0, '4': 0, '8': 0, '16': 0, '32': 0, '64': 0, '128': 0, '256': 0, '512': 0, '1024': 0, '2048': 0, '4096': 0, '8192': 0, '16384': 0, '32768': 0}} ``` `PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:2000,roundup_power2_divisions:[256:1,512:2,1024:4,>:8]"` produces ``` {'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:2000,roundup_power2_divisions:[256:1,512:2,1024:4,>:8]', 'max_split_size': 2097152000, 'garbage_collection_threshold': 0.0, 'expandable_segments': False, 'pinned_num_register_threads': 1, 'release_lock_on_cudamalloc': False, 'pinned_use_cuda_host_register': False, 'roundup_power2_divisions': {'1': 1, '2': 1, '4': 1, '8': 1, '16': 1, '32': 1, '64': 1, '128': 1, '256': 1, '512': 2, '1024': 8, '2048': 8, '4096': 8, '8192': 8, '16384': 8, '32768': 8} } ``` Differential Revision: D53536199 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/119404 Approved by: https://github.com/zdevito	2024-02-17 01:16:37 +00:00
cyy	4a019047ad	Enable nested namespace check in clang-tidy (#118506 ) It is time to enable nested namespaces in the code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118506 Approved by: https://github.com/albanD	2024-01-31 00:32:35 +00:00
cyy	b72ddbab60	[Clang-tidy header][15/N] Enable clang-tidy on headers in c10/cuda and c10/mobile (#116602 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116602 Approved by: https://github.com/ezyang	2024-01-18 08:15:50 +00:00
Jeff Daily	59592ce9f2	[CUDA Host Allocator][ROCm] fixes (#110715 ) Follow up to #110123, removing the CUDA_VERSION check for ROCm because HIP already has hipMallocAsync() and doesn't need the version check there. Follow up to #108488, fixing the failing unit tests by accepting either a "cuda" or "hip" attribute for the caching allocator options. This is aligned to the masquerading strategy for ROCm/HIP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110715 Approved by: https://github.com/ezyang	2023-10-06 21:42:24 +00:00
Banit Agrawal	64583c4d04	[CUDA Host Allocator] Add support of CudaHostRegister (#108488 ) Summary: This diff adds another option to create cuda pinned memory using cudaHostRegister. Differential Revision: D45843715 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108488 Approved by: https://github.com/zdevito	2023-10-06 04:13:02 +00:00
Banit Agrawal	30c4c6ff9b	[PyTorch CCA] Refactor caching allocator config code (#110123 ) Summary: This diff refactors the code by moving CUDAAllocatorConfig into the header file. This config refactoring is done so that we can use the same config code for CUDA pinned memory as well. Test Plan: sandcastle Differential Revision: D49653265 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110123 Approved by: https://github.com/zdevito	2023-10-04 14:58:23 +00:00

24 Commits
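
The rounding arithmetic described in commit 48d18fbd4c (#136174) can be traced with a small Python sketch. It is an illustration under stated assumptions, not the allocator's C++ code: the helper names `round_up_power2_division` and `can_reuse_cached_block`, the choice of 4 divisions, and the exact comparison used for reuse are hypothetical and only mirror the numbers in the commit message. Treating a division of 0 as "no roundup" follows the summary of db0b74bbc5 (#126833).

```python
MB = 1024 * 1024


def round_up_power2_division(size: int, divisions: int) -> int:
    """Round size up to a multiple of (largest power of two <= size) / divisions.

    Hypothetical helper for illustration; divisions == 0 disables roundup.
    """
    if divisions == 0 or size <= 0:
        return size
    power2_floor = 1 << (size.bit_length() - 1)  # largest power of two <= size
    if size == power2_floor:
        return size
    step = power2_floor // divisions
    if step == 0:
        return size
    return -(-size // step) * step  # ceiling division, then scale back up


def can_reuse_cached_block(block_size: int, rounded_request: int, rounding_slack: int) -> bool:
    """Assumed rule: a cached non-split block is reused only if it is large
    enough and smaller than rounded_request + rounding_slack."""
    return rounded_request <= block_size < rounded_request + rounding_slack


request = round_up_power2_division(500 * MB, divisions=4)  # 500MB rounds to 512MB

# Default slack of 20MB (kLargeBuffer in the commit message): limit is 532MB,
# so the cached 1024MB block is skipped and a new 512MB block would be created.
default_reuse = can_reuse_cached_block(1024 * MB, request, 20 * MB)

# A larger, hypothetical max_non_split_rounding_size of 600MB: limit is 1112MB,
# so the cached 1024MB block can be reused.
tuned_reuse = can_reuse_cached_block(1024 * MB, request, 600 * MB)
```

With these values, `default_reuse` is false and `tuned_reuse` is true, matching the 532MB versus 1024MB comparison in the commit message.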