Fixes #112591
Fixed pydocstyle errors in the following files. The remaining errors relate to docstrings at the module level and on methods within each module (see details below).
`pydocstyle torch/cuda/_memory_viz.py --count`
before: 7
after: 4
**remaining errors:**
```
torch/cuda/_memory_viz.py:77 in public function `format_flamegraph`:
D103: Missing docstring in public function
torch/cuda/_memory_viz.py:121 in public function `segments`:
D103: Missing docstring in public function
torch/cuda/_memory_viz.py:128 in public function `memory`:
D103: Missing docstring in public function
torch/cuda/_memory_viz.py:135 in public function `compare`:
D103: Missing docstring in public function
```
`pydocstyle torch/cuda/streams.py --count`
before: 29
after: 8
**remaining errors:**
```
torch/cuda/streams.py:1 at module level:
D100: Missing docstring in public module
torch/cuda/streams.py:31 in public method `__new__`:
D102: Missing docstring in public method
torch/cuda/streams.py:105 in public method `__eq__`:
D105: Missing docstring in magic method
torch/cuda/streams.py:110 in public method `__hash__`:
D105: Missing docstring in magic method
torch/cuda/streams.py:113 in public method `__repr__`:
D105: Missing docstring in magic method
torch/cuda/streams.py:135 in public method `__new__`:
D102: Missing docstring in public method
torch/cuda/streams.py:163 in public method `__new__`:
D102: Missing docstring in public method
torch/cuda/streams.py:237 in public method `__repr__`:
D105: Missing docstring in magic method
```
`pydocstyle torch/cuda/__init__.py --count`
before: 100
after: 46
**remaining errors:**
```
torch/cuda/__init__.py:251 in public class `DeferredCudaCallError`:
D101: Missing docstring in public class
torch/cuda/__init__.py:327 in public function `cudart`:
D103: Missing docstring in public function
torch/cuda/__init__.py:332 in public class `cudaStatus`:
D101: Missing docstring in public class
torch/cuda/__init__.py:337 in public class `CudaError`:
D101: Missing docstring in public class
torch/cuda/__init__.py:338 in public method `__init__`:
D107: Missing docstring in __init__
torch/cuda/__init__.py:343 in public function `check_error`:
D103: Missing docstring in public function
torch/cuda/__init__.py:369 in public method `__init__`:
D107: Missing docstring in __init__
torch/cuda/__init__.py:373 in public method `__enter__`:
D105: Missing docstring in magic method
torch/cuda/__init__.py:376 in public method `__exit__`:
D105: Missing docstring in magic method
torch/cuda/__init__.py:391 in public method `__init__`:
D107: Missing docstring in __init__
torch/cuda/__init__.py:473 in public class `StreamContext`:
D204: 1 blank line required after class docstring (found 0)
torch/cuda/__init__.py:485 in public method `__init__`:
D107: Missing docstring in __init__
torch/cuda/__init__.py:499 in public method `__enter__`:
D105: Missing docstring in magic method
torch/cuda/__init__.py:514 in public method `__exit__`:
D105: Missing docstring in magic method
torch/cuda/__init__.py:541 in public function `set_stream`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:838 in public function `current_blas_handle`:
D400: First line should end with a period (not 'e')
torch/cuda/__init__.py:894 in public function `memory_usage`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:894 in public function `memory_usage`:
D400: First line should end with a period (not ')')
torch/cuda/__init__.py:913 in public function `utilization`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:913 in public function `utilization`:
D400: First line should end with a period (not 'r')
torch/cuda/__init__.py:949 in public function `power_draw`:
D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:949 in public function `power_draw`:
D400: First line should end with a period (not ')')
torch/cuda/__init__.py:1089 in public class `ByteStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1091 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1100 in public class `DoubleStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1102 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1111 in public class `FloatStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1113 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1122 in public class `HalfStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1124 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1133 in public class `LongStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1135 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1144 in public class `IntStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1146 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1155 in public class `ShortStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1157 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1166 in public class `CharStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1168 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1177 in public class `BoolStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1179 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1188 in public class `BFloat16Storage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1190 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1199 in public class `ComplexDoubleStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1201 in public method `dtype`:
D102: Missing docstring in public method
torch/cuda/__init__.py:1210 in public class `ComplexFloatStorage`:
D101: Missing docstring in public class
torch/cuda/__init__.py:1212 in public method `dtype`:
D102: Missing docstring in public method
```
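For reference, the D205/D400 fixes converge on docstrings shaped like the following (a hypothetical function, not taken from the diff):
```python
def example_utilization(device=None):
    """Return the percent of time the GPU was busy over the past sample period.

    The summary line ends with a period (satisfying D400) and is separated
    from the description by a blank line (satisfying D205).

    Args:
        device: the device to query; defaults to the current device.
    """
```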
@mikaylagawarecki @albanD @svekars @jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113233
Approved by: https://github.com/malfet
For free blocks of memory in the allocator, we previously kept a linked list
of the stack frames of the previous allocations that lived there. This was only
ever used in one flamegraph visualization and never proved useful for
understanding what was going on. When memory history tracing was added, it
became redundant, since we can see the history of the free space from the
recorded actions anyway.
This patch removes this functionality and simplifies the snapshot format:
allocated blocks directly have a 'frames' attribute rather than burying stack frames in the history.
Previously the memory history tracked the real size of allocations before rounding.
Since history was added, a 'requested_size' field has been added directly to the block, recording the same information,
so this patch also removes that redundancy.
None of this functionality has been part of a PyTorch release with BC guarantees, so it should be safe to alter
this part of the format.
This patch also updates our visualization tools to work with the simplified format. The visualization tools keep
support for the old format in `_legacy` functions so that old snapshot files can still be read during the transition.
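For illustration, a minimal sketch of walking the simplified format (the exact key names are assumptions based on this description; old files go through the `_legacy` helpers instead):
```python
# Sketch of reading the simplified snapshot format described above; assumes
# blocks carry 'frames' and 'requested_size' directly, as this patch describes.
import pickle

with open("snapshot.pickle", "rb") as f:
    snapshot = pickle.load(f)

for seg in snapshot["segments"]:
    for block in seg["blocks"]:
        if block["state"] == "active_allocated":
            frames = block.get("frames", [])    # stack at allocation time
            size = block.get("requested_size")  # size before rounding
            print(size, [fr["name"] for fr in frames[:3]])
```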
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106079
Approved by: https://github.com/eellison
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
which were reverted due to a conflict with the internal source repo.
Mostly fixes for PEP 484 violations (i.e., when a default arg is set to None but the type is not annotated as Optional).
Plus a few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export.deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add an assert in `torch/optim/optimizer.py` that an Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Unrelated changes to bypass CI failures caused by the gcc9 dependency update in Ubuntu-18.04:
- Add a hack to `.ci/docker/install_conda.sh` that squashes the older libstdc++ from the conda environment in favor of the one from the OS
- Update bazel CUDA builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where that is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
It turns out that jsdelivr, which is used to access the MemoryViz.js
source from generated files, doesn't work unless a version is specified.
This couldn't be tested until the PR actually landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103741
Approved by: https://github.com/aaronenyeshi
This replaces the individual visualization routines in _memory_viz.py with
a single JavaScript application.
The JavaScript application can load pickled snapshot dumps directly via
drag and drop, by requesting them via fetch, or by having them embedded in a webpage.
The _memory_viz.py commands use the embedding approach.
We can also host MemoryViz.js on a webpage to use the drag/drop approach, e.g.
https://zdevito.github.io/assets/viz/
(eventually this should be hosted with the PyTorch docs).
All views/multiple cuda devices are supported on one page.
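For example, a sketch of the embedding approach (assuming `torch.cuda._memory_viz.trace_plot(snapshot)` keeps its snapshot-to-HTML-string signature):
```python
# Hedged sketch: write a self-contained HTML page with the snapshot embedded.
import torch
from torch.cuda import _memory_viz

torch.cuda.memory._record_memory_history(True)   # start trace recording
x = torch.empty(256, 1024, 1024, device="cuda")  # some allocations to look at
del x

snapshot = torch.cuda.memory._snapshot()
with open("trace.html", "w") as f:
    f.write(_memory_viz.trace_plot(snapshot))    # page embeds the snapshot data
```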
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103565
Approved by: https://github.com/eellison, https://github.com/albanD
Common advice we give for handling memory fragmentation issues is to
allocate a big block upfront to reserve memory which will get split up later.
For programs with changing tensor sizes this can be especially helpful to
avoid OOMs that happen the first time we see a new largest input and would
otherwise have to allocate new segments.
However, the issue with allocating a block upfront is that it is nearly impossible
to correctly estimate the size of that block. If it is too small, space in the block
will run out and the allocator will allocate separate blocks anyway. Too large,
and other non-PyTorch libraries might stop working because they cannot allocate
any memory.
This patch provides the same benefits as using a pre-allocating block but
without having to choose its size upfront. Using the cuMemMap-style APIs,
it adds the ability to expand the last block in a segment when more memory is
needed.
Compared to universally using cudaMallocAsync to avoid fragmentation,
this patch can fix this common fragmentation issue while preserving most
of the existing allocator behavior. This behavior can be enabled and disabled dynamically.
This should allow users to, for instance, allocate long-lived parameters and state in individual buffers,
and put temporary state into the large expandable blocks, further reducing
fragmentation.
See inline comments for information about the implementation and its limitations.
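As a sketch, enabling it might look like the following (assuming the setting is exposed as the `expandable_segments` key of `PYTORCH_CUDA_ALLOC_CONF`, which must be set before CUDA is initialized):
```python
# Hedged sketch: opt in to expandable segments via the allocator config.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # assumed key

import torch
x = torch.empty(1024, 1024, device="cuda")  # served from an expandable segment
```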
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96995
Approved by: https://github.com/eellison
When there are more than 15,000 polygons, trace_plot starts to get really slow,
so this change orders the allocations and collapses the smallest allocations beyond
the 15,000 limit into a single summarized polygon.
A slider allows this limit to be adjusted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98865
Approved by: https://github.com/yf225
Previously we only plotted memory if it was allocated or freed while
trace recording was active. This change also adds any pre-existing blocks
to the visualization. This helps because it is common to enable trace recording
late and then not realize that a lot of allocated memory is missing from the
trace because it was allocated beforehand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97590
Approved by: https://github.com/eellison
This refactors the stack trace facility specific to memory profiling
in Python+CUDA into a generic facility for generating combined stack
traces.
The generic facility (combined_traceback.h) does not require
Python to be present in order to work, but will return Python stacks if it
is.
This facility is then used to add support for stack trace gathering in memory profiling that
happens directly from C++.
It is also used to expose a Python API for gathering and symbolizing
combined stacks.
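A heavily hedged sketch of that API (the `torch._C._profiler` binding names and signatures here are assumptions, not confirmed by this description):
```python
# Hedged sketch: capture a combined Python/TorchScript/C++ traceback cheaply,
# then symbolize it later. Names and signatures are assumptions.
from torch._C._profiler import gather_traceback, symbolize_tracebacks

tb = gather_traceback(python=True, script=True, cpp=True)  # cheap capture
(frames,) = symbolize_tracebacks([tb])                     # costly symbolization
for frame in frames[:5]:
    print(frame["filename"], frame["line"], frame["name"])
```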
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95541
Approved by: https://github.com/ezyang
Adds the ability to quickly generate stack traces for C++,
and combine Python, TorchScript, and C++ frames into a single trace.
This makes it possible for the memory tracer to record allocations inside
C++ code (e.g. convolution temporaries, backward operators).
The unwinder code is ~10x faster than execinfo.h's backtrace because it
caches fast unwinding routines for instruction pointers that have already been seen.
It is also only 1.2-2x slower than copying the entire stack (the approach perf takes),
while using two orders of magnitude less space per stack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95357
Approved by: https://github.com/bertmaher
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, more performant, and better supported by our torch.jit compiler. It also rewrites useless generators such as `set(a for a in b)` into set comprehensions.
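For example, rewrites of the kind this change applies:
```python
# Illustrative examples of the comprehension rewrites described above.
b = ["a", "bb", "bb", "ccc"]

unique = {a for a in b}                  # instead of set(a for a in b)
lengths = [len(a) for a in b]            # instead of list(len(a) for a in b)
index = {a: i for i, a in enumerate(b)}  # instead of dict((a, i) for i, a in enumerate(b))
```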
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
This adds a d3-based interactive visualization for exploring the memory
allocation traces that the caching allocator can capture. This visualization
code can also be attached to Kineto trace information in the future to
provide visualization for the memory events captured there, which come with
additional information about the graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90348
Approved by: https://github.com/robieta
We currently can take snapshots of the state of the allocated CUDA memory, but we do not have a way to correlate these snapshots with the actions the allocator took between snapshots. This PR adds a simple fixed-size buffer that records the major actions the allocator takes (ALLOC, FREE, SEGMENT_ALLOC, SEGMENT_FREE, OOM, SNAPSHOT) and includes them with the snapshot information. Capturing periodic snapshots with a big enough trace buffer makes it possible to see how the allocator state changes over time.
We plan to use this functionality to guide how settings in the allocator can be adjusted and eventually have a more robust overall algorithm.
As a component of this functionality, we also add the ability to get a callback when the allocator is about to throw an OOM, primarily so that snapshots can be taken immediately to see why the program ran out of memory (most programs have some C++ state that would free tensors before the OutOfMemory exception can be caught).
This PR also updates the _memory_viz.py script to pretty-print the trace information and provide a better textual summary of snapshots distinguishing between internal and external fragmentation.
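A hedged sketch of exercising this end to end (argument names and snapshot keys are assumptions that have shifted across versions):
```python
# Hedged sketch: record allocator actions, then snapshot and inspect the trace.
import torch

torch.cuda.memory._record_memory_history(True)  # enable the trace buffer
x = torch.empty(1024, 1024, device="cuda")      # ALLOC (and maybe SEGMENT_ALLOC)
del x                                           # FREE
torch.cuda.empty_cache()                        # SEGMENT_FREE

snapshot = torch.cuda.memory._snapshot()        # includes the recorded actions
for event in snapshot["device_traces"][0][:10]: # 'device_traces' key is assumed
    print(event["action"], event.get("size"))
```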
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86241
Approved by: https://github.com/ngimel
Record stack trace information for each allocated segment in the allocator.
It takes around 1.5us to record 50 stack frames of context.
Since invoking a PyTorch operator takes around 8us, this adds minimal overhead, but we still leave it disabled by default so that we can test it more on real workloads first.
Stack information is kept both for allocated blocks and for the last allocation that used now-inactive blocks. We could potentially keep around the _first_ allocation that caused the block to be allocated from CUDA as well.
Potential follow-ups:
* Stack frame entries are small (16 bytes), but the list of frames is not compressed even though most frames will share some entries. So far this doesn't produce huge dumps (7MB for one real workload that uses all memory on the GPU), but it could be much smaller through compression.
* Code to format the information is slow (a few seconds) because it uses Python and FlameGraph.pl.
* Things allocated during the backward pass have no stack frames because they are run on another C++ thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82146
Approved by: https://github.com/albanD