pytorch/torch/csrc/cuda
Marko Radmilac c65ee728f0 Initial implementation of host memory stats (#147660)
This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics.

This change tries hard not to disrupt the allocator's original design, and wherever possible it uses the existing locking mechanism to gather statistics "for free". The only deviation is on the "slow path", where we incur CUDA calls anyway, so taking a short lock should not hurt performance much, especially in the steady state, where most allocations come from the cache.

As mentioned above, this is the first PR; it introduces the concept so we can see whether it fits the right paradigm. We can always add more later.

Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in the CUDA caching allocator, in order to maintain symmetry.
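
To illustrate the idea described above, here is a minimal, self-contained sketch of gathering statistics under a lock that is already held on the cache-hit path, and taking a short lock only on the slow path. It is not the code from this PR. `ToyHostAllocator`, its members, and the local `Stat` struct are hypothetical names invented for the illustration. The sketch tracks block sizes only, and a comment stands in for the real pinned-memory CUDA call.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical counter with current/peak/allocated/freed fields,
// written out locally so the sketch compiles on its own.
struct Stat {
  int64_t current = 0;
  int64_t peak = 0;
  int64_t allocated = 0;
  int64_t freed = 0;

  void increase(int64_t amount) {
    current += amount;
    peak = std::max(peak, current);
    allocated += amount;
  }

  void decrease(int64_t amount) {
    current -= amount;
    freed += amount;
  }
};

// Hypothetical toy allocator: it stores sizes instead of real memory.
class ToyHostAllocator {
 public:
  // Returns true when the request was served from the cache.
  bool allocate(int64_t size) {
    {
      // Fast path: this lock is needed for the cache lookup anyway,
      // so updating the counters here adds no extra locking.
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = std::find(free_sizes_.begin(), free_sizes_.end(), size);
      if (it != free_sizes_.end()) {
        free_sizes_.erase(it);
        active_.increase(size);
        return true;
      }
    }
    // Slow path: a real allocator would make an expensive CUDA call
    // here (assumption: pinned host allocation), outside the lock.
    // A short lock afterwards is cheap by comparison.
    std::lock_guard<std::mutex> guard(mutex_);
    reserved_.increase(size);
    active_.increase(size);
    return false;
  }

  // Returns a block of the given size to the cache.
  void release(int64_t size) {
    std::lock_guard<std::mutex> guard(mutex_);
    free_sizes_.push_back(size);
    active_.decrease(size);
  }

  // Snapshot accessors copy the counters under the lock.
  Stat active() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return active_;
  }

  Stat reserved() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return reserved_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<int64_t> free_sizes_;  // sizes of cached (free) blocks
  Stat active_;                      // bytes currently handed out
  Stat reserved_;                    // bytes obtained via the slow path
};
```

In this sketch, a first `allocate(1024)` takes the slow path, and a `release(1024)` followed by another `allocate(1024)` is a cache hit that only touches the counters inside the lock it already holds.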

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660
Approved by: https://github.com/ngimel
2025-03-05 16:13:19 +00:00
shared [codemod] Fix unused-value issue in caffe2/aten/src/ATen/cuda/detail/CUDAHooks.cpp +4 (#147555) 2025-03-01 19:46:13 +00:00
comm.cpp Turn static inline into static function (#139843) 2024-11-07 23:58:18 +00:00
comm.h
CUDAPluggableAllocator.cpp [19/N] Fix extra warnings brought by clang-tidy-17 (#144448) 2025-01-09 15:58:05 +00:00
CUDAPluggableAllocator.h [19/N] Fix extra warnings brought by clang-tidy-17 (#144448) 2025-01-09 15:58:05 +00:00
device_set.h
Event.cpp
Event.h
GdsFile.cpp Add and use thread-safe strerror (#140472) 2024-11-19 04:24:17 +00:00
GdsFile.h
Graph.cpp Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)" 2025-02-13 18:04:26 +00:00
memory_snapshot.cpp [2/N] Remove unnecessary once flag usage (#145057) 2025-01-23 09:48:46 +00:00
memory_snapshot.h
MemPool.cpp
Module.cpp Initial implementation of host memory stats (#147660) 2025-03-05 16:13:19 +00:00
Module.h
nccl.cpp Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211)" 2025-02-03 22:04:28 +00:00
nccl.h
python_comm.cpp
python_comm.h
python_nccl.cpp Fix minor typo in python_nccl (#148088) 2025-02-28 00:47:09 +00:00
python_nccl.h
Stream.cpp
Stream.h
Tensor.cpp
THCP.h
utils.cpp [BE] Add missing throw of std::runtime_error in scrc/cuda/utils.cpp (#144962) 2025-01-16 17:35:39 +00:00