Summary:
The current use case is to continuously measure the total allocated and reserved CUDA memory sizes from CUDACachingAllocator and export their distribution (min, max, p90, etc.) over time as a timeseries.
The existing callback-based API does not work for this, because the backend decides when a measurement is taken: values that occur between two measurements are never recorded, so the resulting distribution (e.g. the max) is not accurate.
Otherwise, the new API closely follows the design of the existing WaitCounter API.
This is not quite a synchronous version of DynamicCounter, since summing multiple data points does not make sense for this use case.
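For illustration only, a self-contained sketch of why caller-driven (synchronous) recording preserves transient peaks that a backend-driven callback can miss; none of these types are the PyTorch API, they just model the two measurement schemes:

```cpp
// Illustrative only: contrasts caller-driven (push) recording with
// backend-driven (poll) sampling. These types are not the PyTorch API.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Caller-driven: every value is recorded at the moment it changes.
struct SyncTimeseries {
  std::vector<int64_t> points;
  void record(int64_t v) { points.push_back(v); }
  int64_t max() const {
    return *std::max_element(points.begin(), points.end());
  }
};

int main() {
  // Allocated-bytes values as the allocator's state changes over time.
  const std::vector<int64_t> allocated = {100, 900, 200, 150};

  SyncTimeseries ts;
  for (int64_t v : allocated) {
    ts.record(v);  // push model: the transient peak (900) is captured
  }

  // Poll model: the backend samples only at times it chooses (here, the
  // latest state), so the peak between two samples is lost.
  int64_t polled_max = allocated.back();

  std::cout << "sync max = " << ts.max() << "\n";      // 900
  std::cout << "polled max = " << polled_max << "\n";  // 150
}
```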
Test Plan: CI
Differential Revision: D61837528
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134883
Approved by: https://github.com/c-p-i-o