Commit Graph

35 Commits

Author SHA1 Message Date
Xuehai Pan
93e249969b [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261)
Remove useless parentheses in `raise` statements if the exception type is raised with no argument.
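For illustration, a minimal before/after of the pattern the rule rewrites (the function here is hypothetical):

```
# Before: flagged by ruff's RSE rule — redundant call parentheses on an
# exception raised with no arguments.
def check_ready(ready: bool) -> None:
    if not ready:
        raise NotImplementedError()

# After: the exception class is raised bare.
def check_ready_fixed(ready: bool) -> None:
    if not ready:
        raise NotImplementedError
```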

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261
Approved by: https://github.com/albanD
2024-04-17 19:29:34 +00:00
Aaron Gokaslan
18d7b8e4f7 [BE]: ruff apply rule PLW1510 to find silent subprocess errors (#113644)
Reopens #111682, which I messed up due to a bad rebase and which triggered some issues with CLA. This explicitly adds `check=True` or `check=False` to subprocess calls where appropriate.
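A sketch of what the rule enforces (the commands here are arbitrary examples, not from this PR):

```
import subprocess

# Without an explicit `check`, a non-zero exit code is silently ignored;
# check=True raises CalledProcessError on failure.
subprocess.run(["git", "fetch"], check=True)

# Where failure is acceptable, check=False makes that intent explicit.
result = subprocess.run(["git", "diff", "--quiet"], check=False)
if result.returncode != 0:
    print("working tree has changes")
```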

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113644
Approved by: https://github.com/ezyang, https://github.com/kit1980
2023-11-14 20:59:40 +00:00
zabboud
1d9919c46d Fix pydocstyle for issue 112591 (#113233)
Fixes #112591

Fixed errors relating to pydocstyle in the following files. The remaining errors are related to docstrings at the module level and methods within each module (see details below).

pydocstyle torch/cuda/_memory_viz.py --count
before: 7
after: 4

**remaining errors:**
```
torch/cuda/_memory_viz.py:77 in public function `format_flamegraph`:
        D103: Missing docstring in public function
torch/cuda/_memory_viz.py:121 in public function `segments`:
        D103: Missing docstring in public function
torch/cuda/_memory_viz.py:128 in public function `memory`:
        D103: Missing docstring in public function
torch/cuda/_memory_viz.py:135 in public function `compare`:
        D103: Missing docstring in public function
```

pydocstyle torch/cuda/streams.py --count
before: 29
after: 8

**remaining errors:**
```
torch/cuda/streams.py:1 at module level:
        D100: Missing docstring in public module
torch/cuda/streams.py:31 in public method `__new__`:
        D102: Missing docstring in public method
torch/cuda/streams.py:105 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/cuda/streams.py:110 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/cuda/streams.py:113 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/cuda/streams.py:135 in public method `__new__`:
        D102: Missing docstring in public method
torch/cuda/streams.py:163 in public method `__new__`:
        D102: Missing docstring in public method
torch/cuda/streams.py:237 in public method `__repr__`:
        D105: Missing docstring in magic method
```

pydocstyle torch/cuda/__init__.py --count
before: 100
after: 46

**remaining errors:**
```
torch/cuda/__init__.py:251 in public class `DeferredCudaCallError`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:327 in public function `cudart`:
        D103: Missing docstring in public function
torch/cuda/__init__.py:332 in public class `cudaStatus`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:337 in public class `CudaError`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:338 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/__init__.py:343 in public function `check_error`:
        D103: Missing docstring in public function
torch/cuda/__init__.py:369 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/__init__.py:373 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/cuda/__init__.py:376 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/cuda/__init__.py:391 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/__init__.py:473 in public class `StreamContext`:
        D204: 1 blank line required after class docstring (found 0)
torch/cuda/__init__.py:485 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/__init__.py:499 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/cuda/__init__.py:514 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/cuda/__init__.py:541 in public function `set_stream`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:838 in public function `current_blas_handle`:
        D400: First line should end with a period (not 'e')
torch/cuda/__init__.py:894 in public function `memory_usage`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:894 in public function `memory_usage`:
        D400: First line should end with a period (not ')')
torch/cuda/__init__.py:913 in public function `utilization`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:913 in public function `utilization`:
        D400: First line should end with a period (not 'r')
torch/cuda/__init__.py:949 in public function `power_draw`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:949 in public function `power_draw`:
        D400: First line should end with a period (not ')')
torch/cuda/__init__.py:1089 in public class `ByteStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1091 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1100 in public class `DoubleStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1102 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1111 in public class `FloatStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1113 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1122 in public class `HalfStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1124 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1133 in public class `LongStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1135 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1144 in public class `IntStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1146 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1155 in public class `ShortStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1157 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1166 in public class `CharStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1168 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1177 in public class `BoolStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1179 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1188 in public class `BFloat16Storage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1190 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1199 in public class `ComplexDoubleStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1201 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1210 in public class `ComplexFloatStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1212 in public method `dtype`:
        D102: Missing docstring in public method
```
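For context, a hypothetical sketch of the kind of rewrite that clears D205/D400 errors like those listed above (the function and docstring text are illustrative, not the actual `torch.cuda` sources):

```
# D205/D400: the summary runs into the description and lacks a trailing period
def utilization():
    """Return the percent of time the GPU was busy
    as given by nvidia-smi"""

# Fixed: a one-line summary ending with a period, then a blank line
def utilization_fixed():
    """Return the percent of time the GPU was busy.

    The value covers the past sample period, as given by nvidia-smi.
    """
```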

@mikaylagawarecki @albanD @svekars @jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113233
Approved by: https://github.com/malfet
2023-11-13 16:24:53 +00:00
Zachary DeVito
6f07c57416 MemoryViz.js: format, move style (#106482)
This updates the JS format of MemoryViz.js to match the internal format.
It also moves the style sheet into the JS so it is easier to package for
both OSS and internal use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106482
Approved by: https://github.com/aaronenyeshi
ghstack dependencies: #106328
2023-08-03 00:42:13 +00:00
Zachary DeVito
45b564766d [memory snapshots] removed chained history (#106079)
For free blocks of memory in the allocator, we previously kept a linked list
of the stack frames of previous allocations that lived there. This was only
ever used in one flamegraph visualization and never proved useful at
understanding what was going on. When memory history tracing was added, it
became redundant, since we can see the history of the free space from recording
the previous actions anyway.

This patch removes this functionality and simplifies the snapshot format:
allocated blocks directly have a 'frames' attribute rather than burying stack frames in the history.
Previously the memory history tracked the real size of allocations before rounding.
Since history was added, 'requested_size' has been added directly to the block which records the same information,
so this patch also removes that redundancy.

None of this functionality has been part of a PyTorch release with BC guarantees, so it should be safe to alter
this part of the format.

This patch also updates our visualization tools to work with the simplified format. Visualization tools keep
support for the old format in `_legacy` functions so that during the transition old snapshot files can still be read.
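A sketch (not from the PR) of reading the simplified format, assuming the current `torch.cuda.memory` snapshot API; key names may differ across versions:

```
import torch

torch.cuda.memory._record_memory_history()
x = torch.randn(1024, 1024, device="cuda")

snapshot = torch.cuda.memory._snapshot()
for segment in snapshot["segments"]:
    for block in segment["blocks"]:
        # Allocated blocks carry 'frames' and 'requested_size' directly,
        # with no chained history to walk.
        if block["state"] == "active_allocated":
            print(block["requested_size"], len(block.get("frames", [])))
```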

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106079
Approved by: https://github.com/eellison
2023-07-28 06:45:48 +00:00
Justin Chu
79c5e33349 [BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet, https://github.com/albanD
2023-07-21 07:38:46 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

Those were reverted due to a conflict with the internal source repo.

Mostly fixes for PEP-484 violations (i.e., when a default arg is set to None but the type is not annotated as optional; a sketch follows the TODO list below)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that the Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
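A hypothetical sketch of the PEP-484 pattern most of these fixes address:

```
from typing import Optional

# Violation: the default is None, but the annotation does not permit it.
def lookup(key: str, table: dict = None) -> str:
    ...

# Fix: annotate as Optional (equivalently `dict | None` on Python 3.10+).
def lookup_fixed(key: str, table: Optional[dict] = None) -> str:
    return (table or {}).get(key, "")
```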

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from conda environment in favor of the one from the OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where this is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

Those were reverted due to a conflict with the internal source repo.

Mostly fixes for PEP-484 violations (i.e., when a default arg is set to None but the type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that the Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
3c5a494d7a Revert "Update mypy to 1.4.1 (#91983)"
This reverts commit 634659e262.

Reverted https://github.com/pytorch/pytorch/pull/91983 on behalf of https://github.com/malfet due to Its dependent change was reverted, so reverting this one as well, to keep CI clean ([comment](https://github.com/pytorch/pytorch/pull/91983#issuecomment-1636059709))
2023-07-14 15:59:16 +00:00
Nikita Shulga
634659e262 Update mypy to 1.4.1 (#91983)
Mostly fixes for PEP-484 violations (i.e., when a default arg is set to None but the type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91983
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/thiagocrepaldi, https://github.com/aaronenyeshi
2023-07-13 16:30:36 +00:00
Zachary DeVito
ae78e80123 [memory_viz] fix javascript url (#103741)
It turns out that jsdelivr, which is used to access the MemoryViz.js
source from generated files, doesn't work unless a version is specified.

This could not be tested until the PR actually landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103741
Approved by: https://github.com/aaronenyeshi
2023-06-16 13:15:45 +00:00
Zachary DeVito
19b3e07fe0 [memory_viz] Unified viewer (#103565)
This replaces the individual visualization routines in _memory_viz.py with
a single javascript application.

The javascript application can load pickled snapshot dumps directly via
drag/drop, by requesting them via fetch, or by embedding them in a webpage.

The _memory_viz.py commands use the embedding approach.
We can also host MemoryViz.js on a webpage to use the drag/drop approach, e.g.
https://zdevito.github.io/assets/viz/
(eventually this should be hosted with the pytorch docs).

All views/multiple cuda devices are supported on one page.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103565
Approved by: https://github.com/eellison, https://github.com/albanD
2023-06-16 03:49:48 +00:00
Zachary DeVito
346feb6b56 [memory_viz] profile_plot generates snapshot objects (#103497)
This will make it easier to use a single html viewer for
both ways of generating the data. The next PR will change MemoryPlot.js
to simply read the snapshot information directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103497
Approved by: https://github.com/eellison
2023-06-16 03:49:48 +00:00
Zachary DeVito
efc3bcceb1 Move memory viz templates into separate javascript files (#103474)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103474
Approved by: https://github.com/eellison
2023-06-16 03:49:46 +00:00
Zachary DeVito
0ca3c6f7d7 [_memory_viz.py] Fix bug when using profile_plot (#103384)
When we updated plotting to add level of detail, the legend
code for profile_plot broke. This patch fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103384
Approved by: https://github.com/drisspg
2023-06-14 16:54:29 +00:00
Zachary DeVito
7ff1f3f3f6 Revert "Revert "Expandable blocks in allocator (#96995)"" (#99275)
This reverts commit 851e89c8e8.

Differential Revision: [D45034526](https://our.internmc.facebook.com/intern/diff/D45034526)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99275
Approved by: https://github.com/eellison
2023-04-17 23:46:08 +00:00
PyTorch MergeBot
851e89c8e8 Revert "Expandable blocks in allocator (#96995)"
This reverts commit 6a50b83b73.

Reverted https://github.com/pytorch/pytorch/pull/96995 on behalf of https://github.com/izaitsevfb due to Breaks internal tests
2023-04-16 19:23:37 +00:00
Zachary DeVito
6a50b83b73 Expandable blocks in allocator (#96995)
Common advice we give for handling memory fragmentation issues is to
allocate a big block upfront to reserve memory which will get split up later.
For programs with changing tensor sizes this can be especially helpful to
avoid OOMs that happen the first time we see a new largest input and would
otherwise have to allocate new segments.

However, the issue with allocating a block upfront is that it is nearly impossible
to correctly estimate the size of that block. If too small, space in the block
will run out and the allocator will allocate separate blocks anyway. Too large,
and other non-PyTorch libraries might stop working because they cannot allocate
any memory.

This patch provides the same benefits as using a pre-allocating block but
without having to choose its size upfront. Using the cuMemMap-style APIs,
it adds the ability to expand the last block in a segment when more memory is
needed.

Compared to universally using cudaMallocAsync to avoid fragmentation,
this patch can fix this common fragmentation issue while preserving most
of the existing allocator behavior. This behavior can be enabled and disabled dynamically.
This should allow users to, for instance, allocate long-lived parameters and state in individual buffers,
and put temporary state into the large expandable blocks, further reducing
fragmentation.

See inline comments for information about the implementation and its limitations.
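A minimal sketch of enabling the behavior described above, assuming the `PYTORCH_CUDA_ALLOC_CONF` knob; the variable must be set before CUDA is first initialized:

```
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after setting the config so the allocator picks it up

# Subsequent allocations can land in segments that grow in place instead of
# forcing new, separately-allocated segments.
x = torch.empty(8, 1024, 1024, device="cuda")
```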

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96995
Approved by: https://github.com/eellison
2023-04-14 09:49:11 +00:00
Zachary DeVito
e37986d48f [memory viz] support larger visualizations (#98865)
When there are more than 15000 polygons, trace_plot starts to get really slow.
So we order the allocations, take the smallest ones beyond the 15000
limit, and fold them into a single summarized polygon.
A slider allows this limit to be adjusted.
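A hypothetical sketch of the summarization idea (the real logic lives in the visualization's javascript, not in a helper like this):

```
MAX_POLYGONS = 15000  # the slider mentioned above adjusts this limit

def summarize(allocations):
    # Keep the largest allocations as individual polygons; fold the
    # smallest ones beyond the limit into a single summary entry.
    ordered = sorted(allocations, key=lambda a: a["size"], reverse=True)
    shown, rest = ordered[:MAX_POLYGONS], ordered[MAX_POLYGONS:]
    if rest:
        shown.append({"name": "<summarized>", "size": sum(a["size"] for a in rest)})
    return shown
```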
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98865
Approved by: https://github.com/yf225
2023-04-11 23:56:41 +00:00
Zachary DeVito
1c83888be8 [memory profiling] show pre-existing memory in trace_plot (#97590)
Previously we only plotted memory if it was allocated or freed while
trace recording was active. This change also adds any pre-existing blocks
to the visualization. This helps because it is common to enable trace recording
later and then not realize that there is a lot of allocated memory in
the trace, even though a lot was allocated beforehand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97590
Approved by: https://github.com/eellison
2023-03-28 16:31:10 +00:00
loganthomas
c848a777e8 DOC: Various typo fixes (#97095)
Various typos found while browsing documentation/source code.

Thank you for a wonderful deep-learning library!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97095
Approved by: https://github.com/mikaylagawarecki, https://github.com/kit1980
2023-03-20 20:46:04 +00:00
Zachary DeVito
e74f70d212 Revert "Revert "[memory profiling] add a facility to gather combined C++/Python/TorchScript stack traces. (#95541)"" (#96878)
This reverts commit e1ea584b1c.
Adds a `__has_include` check to fix the fbcode build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96878
Approved by: https://github.com/ezyang
2023-03-16 04:12:54 +00:00
PyTorch MergeBot
e1ea584b1c Revert "[memory profiling] add a facility to gather combined C++/Python/TorchScript stack traces. (#95541)"
This reverts commit 4e1060c609.

Reverted https://github.com/pytorch/pytorch/pull/95541 on behalf of https://github.com/DanilBaibak due to breaking internal builds
2023-03-15 13:28:41 +00:00
Zachary DeVito
4e1060c609 [memory profiling] add a facility to gather combined C++/Python/TorchScript stack traces. (#95541)
This refactors the stack trace facility specific to memory profiling
in python+cuda into a generic facility to generate combined stack
traces.

The generic facility (combined_traceback.h) does not require
python to be around to work, but will return python stacks if it is
present.

This facility is then used to add support for stack trace gathering in memory profiling that
happens directly from C++.

It is also used to expose a python API for gathering and symbolizing
combined stacks.
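A sketch of the python API described above; the module path and return shape are assumptions based on later PyTorch releases, not taken from this PR:

```
from torch._C._profiler import gather_traceback, symbolize_tracebacks

# Gather a combined Python/TorchScript/C++ traceback, then symbolize it.
tb = gather_traceback(python=True, script=True, cpp=True)
(frames,) = symbolize_tracebacks([tb])
for frame in frames[:5]:
    print(frame["filename"], frame["line"], frame["name"])
```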

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95541
Approved by: https://github.com/ezyang
2023-03-14 18:26:05 +00:00
Zachary DeVito
4b372e3958 [memory profiling] C++ tracing support (#95357)
Adds the ability to quickly generate stack traces for C++,
and combine Python, TorchScript, and C++ frames into a single trace.

This makes it possible for the memory tracer to record allocations inside
C++ code (e.g. convolution temporaries, backward operators).

The unwinder code is ~10x faster than execinfo.h's backward because it
caches fast unwinder routines for instruction pointers that have already been seen.
It is also only 1.2-2x slower than copying the entire stack (the approach perf takes),
while using 2 orders of magnitude less space per stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95357
Approved by: https://github.com/bertmaher
2023-03-12 07:24:14 +00:00
Zachary DeVito
d6d8d3484e _memory_viz.py: Visualize how blocks fit into segments. (#91336)
Add a segment_plot command that visualizes how blocks are allocated into segments.
This is similar to the 'stats' command but produces an interactive html viewer rather
than a text dump, allowing exploration of stack traces.

It also adds the ability to see the layout at any point in the trace by starting from the
snapshot and then applying the events backwards to reconstruct what memory would have looked like.

Example:
![Screen Shot 2022-12-22 at 3 32 49 PM](https://user-images.githubusercontent.com/370202/209242650-b952372e-37ac-400a-a01c-13be2b5426fa.png)
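A hypothetical invocation (the `-o` flag is assumed by analogy with the script's other plot commands):

```
python torch/cuda/_memory_viz.py segment_plot snapshot.pickle -o segments.html
```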

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91336
Approved by: https://github.com/bhosmer
2023-03-07 21:07:18 +00:00
Zachary DeVito
71f369092d Revert "Revert "memory viz: Add colors for categories and a legend (#90587)"" (#96133)
This reverts commit b38b39c441.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96133
Approved by: https://github.com/bhosmer
2023-03-07 21:07:18 +00:00
Eli Uriegas
b38b39c441 Revert "memory viz: Add colors for categories and a legend (#90587)"
This reverts commit ee43842505.
2023-03-06 11:38:58 -08:00
Zachary DeVito
ee43842505 memory viz: Add colors for categories and a legend (#90587)
Adds a category legend to memory trace plots that colors allocations by their role (activation, parameter, gradient, etc.) as captured by kineto.

Differential Revision: [D43757381](https://our.internmc.facebook.com/intern/diff/D43757381)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90587
Approved by: https://github.com/aaronenyeshi
2023-03-03 20:42:22 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, more performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
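A small sketch of the rewrites described above:

```
nums = [3, 1, 2]

# Flagged: unnecessary generator expressions passed to set()/list().
unique = set(n for n in nums)
doubled = list(n * 2 for n in nums)

# Rewritten as comprehensions; a useless `set(a for a in b)` becomes `set(b)`.
unique = {n for n in nums}
doubled = [n * 2 for n in nums]
```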

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
Zachary DeVito
bf2668a899 Add support for kineto in memory viz (#90567)
This is just rudimentary initial support that does the same stuff as the trace profile. Follow-ups will add category encodings to the tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90567
Approved by: https://github.com/robieta
2022-12-13 21:31:16 +00:00
Zachary DeVito
3b3ed25109 Add a way to visualize memory snapshot traces (#90348)
This adds a d3-based interactive visualization for exploring the memory
allocation traces that the caching allocator can capture. This visualization
code can also be attached to kineto trace information in the future to also
provide visualization for the memory events captured there, which come with
additional information about the graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90348
Approved by: https://github.com/robieta
2022-12-10 02:45:11 +00:00
Zachary DeVito
91b1bae1df Caching allocator tracing (#86241)
We currently can take snapshots of the state of the allocated cuda memory, but we do not have a way to correlate these snapshots with the actions the allocator took between snapshots. This PR adds a simple fixed-sized buffer that records the major actions that the allocator takes (ALLOC, FREE, SEGMENT_ALLOC, SEGMENT_FREE, OOM, SNAPSHOT) and includes these with the snapshot information. Capturing periodic snapshots with a big enough trace buffer makes it possible to see how the allocator state changes over time.

We plan to use this functionality to guide how settings in the allocator can be adjusted and eventually have a more robust overall algorithm.

As a component of this functionality, we also add the ability to get a callback when the allocator will throw an OOM, primarily so that snapshots can be taken immediately to see why the program ran out of memory (most programs have some C++ state that would free tensors before the OutOfMemory exception can be caught).

This PR also updates the _memory_viz.py script to pretty-print the trace information and provide a better textual summary of snapshots distinguishing between internal and external fragmentation.
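A sketch of inspecting the trace buffer, assuming the current `torch.cuda.memory` API; argument and key names may differ from the version in this commit:

```
import torch

torch.cuda.memory._record_memory_history(max_entries=100000)
x = torch.randn(4096, 4096, device="cuda")
del x

# Each snapshot now carries the recorded allocator actions per device.
snapshot = torch.cuda.memory._snapshot()
for event in snapshot["device_traces"][0][:10]:
    print(event["action"], event["size"])
```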
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86241
Approved by: https://github.com/ngimel
2022-10-07 23:19:54 +00:00
Zachary DeVito
726d040692 annotated allocator snapshots (#82146)
Record stack trace information for each allocated segment in the allocator.
It takes around 1.5us to record 50 stack frames of context.
Since invoking a PyTorch operator is around 8us, this adds minimal overhead, but we still leave it disabled by default so that we can test it more on real workloads first.

Stack information is kept both for allocated blocks and for the last allocation that used now-inactive blocks. We could potentially keep around the _first_ allocation that caused the block to get allocated from cuda as well.

Potential Followups:
* stack frame entries are small (16 bytes), but the list of Frames is not compressed even though most frames will share some entries. So far this doesn't produce huge dumps (7MB for one real workload that uses all memory on the GPU), but it could be made much smaller through compression.
* Code to format the information is slow (a few seconds) because it uses python and FlameGraph.pl
* Things allocated during the backward pass have no stack frames because they are run on another C++ thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82146
Approved by: https://github.com/albanD
2022-08-09 17:21:35 +00:00