MemPool is a separate pool of memory handled by the caching allocator. This PR adds the option to let the caching allocator try to use this pool as a last resort instead of OOMing, by associating a use_on_oom bool with each MemPool.
Usage:
Users can optionally specify a ``use_on_oom`` bool (which is False by default) during MemPool creation. If true, then the CUDACachingAllocator will be able to use memory in this pool as a last resort instead of OOMing.
```python
pool = torch.cuda.MemPool(allocator, use_on_oom=True)
with torch.cuda.use_mem_pool(pool):
    a = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
del a
# at the memory limit, this will succeed by using pool's memory in order to avoid the oom
b = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
```
Testing:
```
python test/test_cuda.py -k test_mempool_limited_memory_with_allocator
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151487
Approved by: https://github.com/eqy, https://github.com/syed-ahmed, https://github.com/ngimel
Summary:
Oftentimes, users complain that a bunch of extra events are prepended to their desired GPU snapshot. This is because they usually have an OOM logger attached without knowing it, and when they go to collect the actual snapshot, it includes all of the OOM logger's contents. Since the OOM logger and the regular snapshot use the same backend, we currently don't have the infra in place to split these snapshots.
As a solution, we add a flag to the snapshot frontend that clears out the history when auto-trace memory history recording starts.
A more thorough solution would be to have a user pass in a handle and to keep snapshots per handle to separate the events. However, this would likely be complicated and more work than it is worth, as we would have to change the callbacks in the caching allocator and pass these objects between Python and C++.
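For illustration, a minimal sketch of the intended workflow; the flag name used here (`clear_history`) is an assumption made for the example and may differ from what the frontend finally exposes:
```python
import torch

# An OOM logger may already have been recording into the same history buffer.
# The new flag (name assumed here) drops previously recorded events before the
# auto-trace recording starts, so the snapshot only contains this run's events.
torch.cuda.memory._record_memory_history(max_entries=100000, clear_history=True)

# ... run the workload of interest ...

snapshot = torch.cuda.memory._snapshot()
```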
Test Plan:
See diff below
Differential Revision: D71159720
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149352
Approved by: https://github.com/eqy, https://github.com/aaronenyeshi
This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics.
This change tries very hard not to disrupt the initial design of the allocator, and it uses the existing locking mechanisms, whenever possible, to gather statistics "for free". The only deviation from that is on the "slow path", where we incur CUDA calls anyway, so taking a short lock is not going to hurt performance much, especially in the steady state where most allocations will come from the cache.
As mentioned before, this is the first PR; it introduces the concept so we can see whether it fits the right paradigm. We can always add more later.
Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in the CUDA caching allocator, in order to maintain symmetry.
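As a rough illustration of how such statistics might be consumed from Python; the accessor name `torch.cuda.host_memory_stats` and the key layout are assumptions for this sketch, mirroring the CUDA caching allocator's Stat naming:
```python
import torch

# Pinned host allocation routed through CachingHostAllocator.
x = torch.empty(64 * 1024 * 1024, dtype=torch.uint8, pin_memory=True)

# Assumed accessor and key names, following the {current, peak, allocated, freed}
# Stat convention of the CUDA caching allocator; .get() avoids a hard dependency
# on the exact key layout.
stats = torch.cuda.host_memory_stats()
print(stats.get("allocated_bytes.current"), stats.get("allocated_bytes.peak"))
```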
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660
Approved by: https://github.com/ngimel
Certain `cpp_wrapper`-enabled tests were OOM-ing in the CI pipeline, with error messages suggesting that sufficient memory was accessible. This ultimately resulted from an internal memory limitation that was not queryable in the API. This PR adds querying for that limit.
Additionally, the failing tests had incorrect memory availability checks, and are updated with measured memory requirements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140620
Approved by: https://github.com/malfet, https://github.com/eqy
ghstack dependencies: #141367
Pylance infers the type of the first argument (`enabled`) to `_record_memory_history` as `str` even though the function accepts `Literal[None, "state", "all"]`.
This raises an issue when passing `None`, even though it is a legitimate argument.
This PR addresses the issue by adding the type annotation in the doc string.
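For reference, both of the following calls are valid even though Pylance infers `str` for the first argument:
```python
import torch

# Enable recording of allocation history.
torch.cuda.memory._record_memory_history("all")

# None is a legitimate value here; it disables recording.
torch.cuda.memory._record_memory_history(None)
```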
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140545
Approved by: https://github.com/Skylion007
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Canonically, the snapshot API returns the entire memory state of the CUDACachingAllocator (using `get_all_blocks`). There is no API that returns only the memory state of a given pool.
In this PR, we extend the snapshot API so that it can return only the memory addresses of an active pool. When the snapshot API is called under a MemPoolContext, we return only the blocks that correspond to the pool id of the active pool.
Part of https://github.com/pytorch/pytorch/issues/124807.
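A hedged sketch of the behavior from the Python side, assuming the `torch.cuda.MemPool` and `use_mem_pool` APIs referenced elsewhere in these notes activate a MemPoolContext for the duration of the block:
```python
import torch

pool = torch.cuda.MemPool()
with torch.cuda.use_mem_pool(pool):
    a = torch.randn(1024, device="cuda")
    # With this change, a snapshot taken while the pool is active reports only
    # blocks whose pool id matches the active pool.
    pool_only_snapshot = torch.cuda.memory._snapshot()

# Outside the pool context, the snapshot covers the whole allocator as before.
full_snapshot = torch.cuda.memory._snapshot()
```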
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133601
Approved by: https://github.com/ezyang
This PR refactors some ref-counting functionality out of `beginAllocateToPool` and `releasePool`. The ref-counting logic is then used in construction and destruction of `torch.cuda.MemPool`.
The `use_count` variable in the CUDACachingAllocator is essentially a refcount of how many context managers are using the pool. Since we are now lifting the MemPool abstraction up to the user, the MemPool object itself needs to hold an extra reference as well.
Part of https://github.com/pytorch/pytorch/issues/124807.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133600
Approved by: https://github.com/eqy, https://github.com/ezyang
`torch.cuda.memory.mem_get_info` allows device strings given the current type hints. However, `device = torch.device('cuda')` leads to `device.index = None`, which results in downstream problems. Setting `optional=True` will insert the default device index in such cases.
Fixes #132583
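For illustration, both forms below should now resolve to the current device's index instead of propagating `None` downstream:
```python
import torch

# A bare device string / a torch.device without an index previously produced
# device.index = None; with optional=True the default device index is filled in.
free_a, total_a = torch.cuda.mem_get_info("cuda")
free_b, total_b = torch.cuda.mem_get_info(torch.device("cuda"))
assert free_a <= total_a and free_b <= total_b
```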
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132616
Approved by: https://github.com/soulitzer
Summary:
It is a long-known pain point that if other users are running things, the call to `torch.cuda.memory.list_gpu_processes()` will error out:
```
torch.cuda.memory.list_gpu_processes()
File "torch/cuda/memory.py", line 647, in list_gpu_processes
procs = amdsmi.amdsmi_get_gpu_process_list(handle) # type: ignore[attr-defined]
File "amdsmi/py_interface/amdsmi_interface.py", line 1946, in amdsmi_get_gpu_process_list
_check_res(
File "amdsmi/py_interface/amdsmi_interface.py", line 510, in _check_res
raise AmdSmiLibraryException(ret_code)
amdsmi.py_interface.amdsmi_exception.AmdSmiLibraryException: Error code:
10 | AMDSMI_STATUS_NO_PERM - Permission Denied
```
So just catch this error.
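A sketch of the kind of guard this adds; the exception type re-exported by the top-level `amdsmi` package is assumed here from the traceback above:
```python
import amdsmi  # AMD's SMI Python bindings

def safe_gpu_process_list(handle):
    """Return the per-GPU process list, or None if the SMI library denies access."""
    try:
        return amdsmi.amdsmi_get_gpu_process_list(handle)
    except amdsmi.AmdSmiException:  # assumed base class; AMDSMI_STATUS_NO_PERM lands here
        return None
```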
Test Plan: torch.cuda.memory.list_gpu_processes() no longer fails
Differential Revision: D59901053
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131018
Approved by: https://github.com/eqy, https://github.com/clee2000
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
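For illustration, the two patterns described above look roughly like this (function names are made up for the example):
```python
import warnings
from typing_extensions import deprecated

@deprecated("old_api() is deprecated, use new_api() instead", category=FutureWarning)
def old_api() -> None:
    ...

def another_old_api() -> None:
    # Where the decorator doesn't fit, the warning gets an explicit category.
    warnings.warn(
        "another_old_api() is deprecated, use new_api() instead",
        category=FutureWarning,
        stacklevel=2,
    )
```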
Resolves #126888
- #126888
This PR is split from PR #126898.
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.
Resolves #126888
- #126888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
Update ruff to 0.4.1.
This version fixes a lot of false negatives/false positives, is 20-40% faster, and has various other bug fixes.
Below is a before-and-after table showing the execution time of ruff lint and ruff format in milliseconds, courtesy of https://astral.sh/blog/ruff-v0.4.0
| Repository | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7 | 251.8 | 351.1 | 274.9 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
This is very confusing when checking memory usage while allocations are happening only through the C API. We should change it to a warning/error or just init CUDA. Codepaths that run on non-CUDA environments shouldn't call into these functions in the first place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121698
Approved by: https://github.com/jansel
Fixes #112590
Fixed docstring errors in `torch/cuda/memory.py` and `torch/cuda/nvtx.py`.
memory.py
Before
```
torch/cuda/memory.py:1 at module level:
D100: Missing docstring in public module
torch/cuda/memory.py:67 in public function `caching_allocator_alloc`:
D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
torch/cuda/memory.py:103 in public function `caching_allocator_delete`:
D401: First line should be in imperative mood (perhaps 'Delete', not 'Deletes')
torch/cuda/memory.py:122 in public function `set_per_process_memory_fraction`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:148 in public function `empty_cache`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:148 in public function `empty_cache`:
D400: First line should end with a period (not 'g')
torch/cuda/memory.py:163 in public function `memory_stats`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:163 in public function `memory_stats`:
D400: First line should end with a period (not 'a')
torch/cuda/memory.py:163 in public function `memory_stats`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:264 in public function `memory_stats_as_nested_dict`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:272 in public function `reset_accumulated_memory_stats`:
D401: First line should be in imperative mood (perhaps 'Reset', not 'Resets')
torch/cuda/memory.py:292 in public function `reset_peak_memory_stats`:
D401: First line should be in imperative mood (perhaps 'Reset', not 'Resets')
torch/cuda/memory.py:311 in public function `reset_max_memory_allocated`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:311 in public function `reset_max_memory_allocated`:
D400: First line should end with a period (not 'y')
torch/cuda/memory.py:311 in public function `reset_max_memory_allocated`:
D401: First line should be in imperative mood (perhaps 'Reset', not 'Resets')
torch/cuda/memory.py:338 in public function `reset_max_memory_cached`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:338 in public function `reset_max_memory_cached`:
D400: First line should end with a period (not 'e')
torch/cuda/memory.py:338 in public function `reset_max_memory_cached`:
D401: First line should be in imperative mood (perhaps 'Reset', not 'Resets')
torch/cuda/memory.py:365 in public function `memory_allocated`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:365 in public function `memory_allocated`:
D400: First line should end with a period (not 'n')
torch/cuda/memory.py:365 in public function `memory_allocated`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:383 in public function `max_memory_allocated`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:383 in public function `max_memory_allocated`:
D400: First line should end with a period (not 'n')
torch/cuda/memory.py:383 in public function `max_memory_allocated`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:405 in public function `memory_reserved`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:405 in public function `memory_reserved`:
D400: First line should end with a period (not 's')
torch/cuda/memory.py:405 in public function `memory_reserved`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:421 in public function `max_memory_reserved`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:421 in public function `max_memory_reserved`:
D400: First line should end with a period (not 's')
torch/cuda/memory.py:421 in public function `max_memory_reserved`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:443 in public function `memory_cached`:
D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
torch/cuda/memory.py:452 in public function `max_memory_cached`:
D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
torch/cuda/memory.py:461 in public function `memory_snapshot`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:474 in public function `memory_summary`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:474 in public function `memory_summary`:
D400: First line should end with a period (not 'r')
torch/cuda/memory.py:474 in public function `memory_summary`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:612 in public function `list_gpu_processes`:
D202: No blank lines allowed after function docstring (found 1)
torch/cuda/memory.py:612 in public function `list_gpu_processes`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:612 in public function `list_gpu_processes`:
D400: First line should end with a period (not 's')
torch/cuda/memory.py:612 in public function `list_gpu_processes`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:648 in public function `mem_get_info`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:648 in public function `mem_get_info`:
D400: First line should end with a period (not 'n')
torch/cuda/memory.py:648 in public function `mem_get_info`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:684 in private function `_record_memory_history`:
D202: No blank lines allowed after function docstring (found 1)
torch/cuda/memory.py:684 in private function `_record_memory_history`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:684 in private function `_record_memory_history`:
D400: First line should end with a period (not 'y')
torch/cuda/memory.py:684 in private function `_record_memory_history`:
D401: First line should be in imperative mood (perhaps 'Enable', not 'Enables')
torch/cuda/memory.py:742 in private function `_snapshot`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:742 in private function `_snapshot`:
D401: First line should be in imperative mood (perhaps 'Save', not 'Saves')
torch/cuda/memory.py:818 in private function `_dump_snapshot`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:818 in private function `_dump_snapshot`:
D401: First line should be in imperative mood (perhaps 'Save', not 'Saves')
torch/cuda/memory.py:849 in public function `get_allocator_backend`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:849 in public function `get_allocator_backend`:
D400: First line should end with a period (not 'y')
torch/cuda/memory.py:849 in public function `get_allocator_backend`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:894 in public method `__init__`:
D107: Missing docstring in __init__
torch/cuda/memory.py:904 in public function `change_current_allocator`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:904 in public function `change_current_allocator`:
D401: First line should be in imperative mood (perhaps 'Change', not 'Changes')
torch/cuda/memory.py:917 in private function `_get_current_allocator`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
58
```
After
```
torch/cuda/memory.py:151 in public function `empty_cache`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:151 in public function `empty_cache`:
D400: First line should end with a period (not 'g')
torch/cuda/memory.py:439 in public function `memory_cached`:
D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
torch/cuda/memory.py:448 in public function `max_memory_cached`:
D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
torch/cuda/memory.py:676 in private function `_record_memory_history`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:676 in private function `_record_memory_history`:
D400: First line should end with a period (not 'y')
torch/cuda/memory.py:841 in public function `get_allocator_backend`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:841 in public function `get_allocator_backend`:
D400: First line should end with a period (not 'y')
8
```
nvtx.py
Before
```
torch/cuda/nvtx.py:1 at module level:
D100: Missing docstring in public module
torch/cuda/nvtx.py:24 in public function `range_push`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:24 in public function `range_push`:
D400: First line should end with a period (not 'd')
torch/cuda/nvtx.py:35 in public function `range_pop`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:35 in public function `range_pop`:
D400: First line should end with a period (not 'e')
torch/cuda/nvtx.py:43 in public function `range_start`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:43 in public function `range_start`:
D400: First line should end with a period (not 'e')
torch/cuda/nvtx.py:81 in public function `range`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:81 in public function `range`:
D400: First line should end with a period (not 'g')
9
```
After
```
torch/cuda/nvtx.py:41 in public function `range_start`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:41 in public function `range_start`:
D400: First line should end with a period (not 'e')
torch/cuda/nvtx.py:79 in public function `range`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:79 in public function `range`:
D400: First line should end with a period (not 'g')
4
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112751
Approved by: https://github.com/kit1980
The argument order for the legacy path got swapped in a recent patch.
Because there is still a blog post documenting the legacy interface,
people are hitting this pathway.
This patch fixes #108208.
I will also update the blog post to the new API so that people are
more likely to use the newer `_record_memory_history` API.
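For reference, a minimal call into the newer keyword-based API (argument values here are just an example):
```python
import torch

# Newer interface: keyword arguments instead of the legacy positional
# arguments whose order got swapped.
torch.cuda.memory._record_memory_history(
    enabled="all", context="all", stacks="all", max_entries=100000
)
```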
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108260
Approved by: https://github.com/awgu
Previously when we recorded a free action in a memory trace, we would provide
the stack for when the block was allocated. This is faster because we do not
have to record stacks for free, which would otherwise double the number of stacks
collected. However, sometimes knowing the location of a free is useful for
figuring out why a tensor was live. So this PR adds this behavior. If
performance ends up being a concern the old behavior is possible by passing
"alloc" to the context argument rather than "all".
Also refactors some of the glue logic to be consistent across C++ and Python and
routes the Python API through the C++ version.
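A short usage sketch of the `context` argument described above:
```python
import torch

# Record stacks for frees as well as allocations (the new behavior).
torch.cuda.memory._record_memory_history(enabled="all", context="all")

# If the extra stack collection is a concern, keep the old allocation-only behavior.
torch.cuda.memory._record_memory_history(enabled="all", context="alloc")
```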
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106758
Approved by: https://github.com/albanD
Adds the ability to quickly generate stack traces for C++,
and combine Python, TorchScript, and C++ frames into a single trace.
This makes it possible for the memory tracer to record allocations inside
C++ code (e.g. convolution temporaries, backward operators).
The unwinder code is ~10x faster than execinfo.h's backtrace because it
caches fast unwinder routines for instruction pointers that have already been seen.
It is also only 1.2--2x slower than copying the entire stack (the approach perf takes),
while using 2 orders of magnitude less space per stack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95357
Approved by: https://github.com/bertmaher
Summary:
The caching allocator can be configured to round memory allocations in order to reduce fragmentation. Sometimes, however, the overhead from rounding can be higher than the fragmentation it helps reduce.
We have added a new stat to CUDA caching allocator stats to help track if rounding is adding too much overhead and help tune the roundup_power2_divisions flag:
- "requested_bytes.{current,peak,allocated,freed}": memory requested by client code, compare this with allocated_bytes to check if allocation rounding adds too much overhead
Test Plan: Added test case in caffe2/test/test_cuda.py
Differential Revision: D40810674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88575
Approved by: https://github.com/zdevito
Fixes #43144
This uses the Backend system added by [#82682](https://github.com/pytorch/pytorch/pull/82682) to change allocators dynamically during code execution. This will allow us to use RMM, to use CUDA managed memory for portions of the code that do not fit in GPU memory, to write static memory allocators that reduce fragmentation while training models, and to improve interoperability with external DL compilers/libraries.
For example, we could have the following allocator in C++:
```c++
#include <sys/types.h>
#include <cuda_runtime_api.h>
#include <iostream>

extern "C" {
void* my_malloc(ssize_t size, int device, cudaStream_t stream) {
  void* ptr;
  std::cout << "alloc " << size << std::endl;
  cudaMalloc(&ptr, size);
  return ptr;
}

void my_free(void* ptr) {
  std::cout << "free " << std::endl;
  cudaFree(ptr);
}
}
```
Compile it as a shared library
```
nvcc allocator.cc -o alloc.so -shared --compiler-options '-fPIC'
```
And use it from PyTorch as follows
```python
import torch
# Init caching
# b = torch.zeros(10, device='cuda')
new_alloc = torch.cuda.memory.CUDAPluggableAllocator('alloc.so', 'my_malloc', 'my_free')
old = torch.cuda.memory.get_current_allocator()
torch.cuda.memory.change_current_allocator(new_alloc)
b = torch.zeros(10, device='cuda')
# This will error since the current allocator was already instantiated
torch.cuda.memory.change_current_allocator(old)
```
Things to discuss
- How to test this, needs compiling external code ...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86786
Approved by: https://github.com/albanD