Commit Graph

30 Commits

Anupam Bhatnagar
3336aa191c Adding allocated and reserved memory values to memory timeline view. (#107056)
Summary: This diff adds the max allocated and max reserved memory values to the memory timeline plot.
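For reference, a public-API sketch of producing such a plot outside the internal tooling (assuming a CUDA device and a recent PyTorch; `export_memory_timeline` is the public entry point these changes feed into, while the file name is illustrative):

```
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 128).cuda()
inputs = torch.randn(64, 128, device="cuda")

# Memory timeline export needs all three recording flags enabled.
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,
    record_shapes=True,
    with_stack=True,
) as prof:
    model(inputs).sum().backward()

# An .html path yields the embedded plot, now annotated with the
# max allocated and max reserved memory values.
prof.export_memory_timeline("memory_timeline.html", device="cuda:0")
```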

Test Plan:
Executed

`buck run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_resnet_integration_test -- --enable_profiling --profile_memory --trace_handler=auto_trace --with_stack --record_shapes` on my devgpu.

The generated output is at
https://www.internalfb.com/manifold/explorer/ai_efficiency/tree/traces/dynocli/devgpu020.odn1.facebook.com/rank-0/rank-0.Aug_10_16_50_50.236946.pt.memorytl.html

{F1067885545}
Screenshot of the HTML above:
{F1067886350}

Reviewed By: aaronenyeshi

Differential Revision: D48251791

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107056
Approved by: https://github.com/aaronenyeshi, https://github.com/davidberard98
2023-08-21 17:20:13 +00:00
Aaron Gokaslan
b1e8e01e50 [BE]: Apply PYI autofixes to various types (#107521)
Applies some autofixes from ruff's PYI rules to improve typing in PyTorch. Most of the PYI rules aren't enabled in the linter yet, as they do not have autofixes.
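For illustration, one representative autofix of this kind (the exact rules applied in this PR may differ):

```
from typing import Union

# Before: a duplicate union member, the kind of redundancy the PYI
# rules flag (e.g. PYI016)
def parse_before(x: Union[int, str, int]) -> None: ...

# After the autofix: the duplicate member is removed
def parse_after(x: Union[int, str]) -> None: ...
```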

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107521
Approved by: https://github.com/ezyang
2023-08-20 02:42:21 +00:00
Edward Z. Yang
3bf922a6ce Apply UFMT to low traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow-up in the pyupgrade series, converting more strings to f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Commands used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`
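For illustration, the kind of rewrite `flynt` performs (hypothetical snippet):

```
name, count = "trace", 3

# Before: printf-style and .format() calls
msg_a = "%s has %d events" % (name, count)
msg_b = "{} has {} events".format(name, count)

# After flynt: equivalent f-strings, applied only while the resulting
# line stays under the 120-character limit passed via `-ll 120`
msg_a = f"{name} has {count} events"
msg_b = f"{name} has {count} events"
```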

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Howard Cheng
3dacc8e847 [PyTorch] [Memory profiler] Early return if qualified name is invalid (#105495)
Summary: Return early if we can easily determine that the operator's qualified name is invalid, before attempting to retrieve the schema. In particular, "::" should always be present. A quick estimate shows this is >50x faster (100 us -> 2 us).
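The change is roughly the following (a sketch of the helper in `torch/profiler/_memory_profiler.py`; details may differ):

```
import functools
from typing import Optional, Tuple

import torch

@functools.lru_cache(maxsize=1024)
def _lookup_schemas(name: str) -> Optional[Tuple[object, ...]]:
    try:
        # A name without "::" can never resolve to an operator, so bail
        # out before the comparatively expensive schema retrieval.
        if "::" not in name:
            return None
        return tuple(torch._C._jit_get_schemas_for_operator(name))
    except RuntimeError:
        return None
```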

Test Plan: CI

Differential Revision: D47562587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105495
Approved by: https://github.com/aaronenyeshi
2023-07-20 00:58:32 +00:00
Justin Chu
3721fa5612 [BE] Enable ruff's UP rules and autoformat optim/ (#105426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105426
Approved by: https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi, https://github.com/janeyx99
2023-07-18 21:07:43 +00:00
Aaron Enye Shi
e0d2ad1a21 [Profiler][Memory] Export raw timestamped events in export_memory_timeline_raw (#105094)
Summary:
Rather than processing the events into a time-and-sizes plot, dump the actual events as (timestamp, action, num of bytes, category) when the output file ends in `raw.json.gz`.

This allows downstream analysis tools to process these events. It also avoids having to control the granularity of the previous json.gz output in the memory profiler.
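A usage sketch (the `raw.json.gz` suffix selects this export path; the event layout is as described above):

```
import gzip
import json

import torch
from torch.profiler import profile

with profile(profile_memory=True, record_shapes=True, with_stack=True) as prof:
    a = torch.randn(1024, 1024)
    b = (a @ a).sum()

prof.export_memory_timeline("events.raw.json.gz", device="cpu")

with gzip.open("events.raw.json.gz", "rt") as f:
    events = json.load(f)  # entries of (timestamp, action, num of bytes, category)
print(events[:3])
```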

Test Plan: CI Tests

Differential Revision: D47416544

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105094
Approved by: https://github.com/davidberard98
2023-07-17 17:39:37 +00:00
Aaron Enye Shi
2a4fa25109 [Profiler] Include more uncategorized events in memory profile (#101200)
Summary: This PR adds handling for allocations / frees which we cannot prove are for Tensors. (And thus aren't assigned an ID.) These events are still important for judging overall utilization.

Test Plan: CI and Unit tests.

Differential Revision: D45458885

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101200
Approved by: https://github.com/anupambhatnagar, https://github.com/davidberard98
2023-06-08 16:22:49 +00:00
Aaron Enye Shi
e35323d6a7 [Profiler] Fix HTML plot output for profiler export_memory_timeline (#101316)
Summary: Wrap the PNG image of the memory plot inside an HTML body, so that the file can be easily opened or embedded in other frontends.
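The wrapping amounts to something like this sketch (illustrative, not the exact profiler code):

```
import base64

def png_to_html(png_bytes: bytes, title: str) -> str:
    # Embed the PNG as a base64 data URI so the .html file is a single
    # self-contained document that any browser or frontend can render.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return (
        f"<html><head><title>{title}</title></head>"
        f'<body><img src="data:image/png;base64,{encoded}"></body></html>'
    )
```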

Test Plan:
CI Tests

# Ran locally on ResNet-50:
{F988498243}
{F988498789}
https://www.internalfb.com/manifold/explorer/trace_stats/tree/749163530321413/tmpj3ifzs7r.pt.memorytl.html

Differential Revision: D45827509

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101316
Approved by: https://github.com/xuzhao9
2023-05-15 16:31:06 +00:00
Aaron Enye Shi
87b71e570e [Profiler] Support HTML plot output for profiler export_memory_timeline API (#99751)
Summary:
Support the file extension .html, which will include a PNG image of the plot embedded into an HTML file.

This allows users to avoid processing the timeline manually in their own frontend UI.

Test Plan:
CI Tests

Ran on a ResNet-50 model and generated this HTML file with the plot:
See attached html file: {F954232276}
Screenshot: {F954232469}

Differential Revision: D45152735

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99751
Approved by: https://github.com/davidberard98
2023-04-22 04:21:58 +00:00
Edward Z. Yang
9a8f71f23e Convert logging f-strings to use % format (#98697)
Codemod done with https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530, with assistance from ChatGPT.
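The rewrite looks like this (illustrative example; %-formatting defers the string interpolation until a handler actually emits the record):

```
import logging

log = logging.getLogger(__name__)
n_events = 42

# Before: the f-string is formatted eagerly, even when INFO is disabled
log.info(f"processed {n_events} events")

# After the codemod: arguments are interpolated lazily by logging
log.info("processed %d events", n_events)
```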

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Sergii Dymchenko
477f3f555f Simplify by using yield from (#97831)
The issues were found by flake8-simplify's SIM104 rule in a local run.

I'll look into adding the check to CI separately.
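The simplification looks like this (illustrative example):

```
def relay(items):
    # Before (flagged by SIM104): a loop that only re-yields its items
    for item in items:
        yield item

def relay_simplified(items):
    # After: delegate to the iterable directly
    yield from items
```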

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97831
Approved by: https://github.com/Skylion007
2023-03-29 19:15:24 +00:00
Aaron Gokaslan
5471621497 [BE] Remove unnecessary dict comprehensions (#97116)
Removes unnecessary dict comprehensions, optimizing the creation of dicts from iterables.
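For example (illustrative snippet):

```
pairs = [("a", 1), ("b", 2)]

# Before: a comprehension that merely repackages key/value pairs
d_before = {k: v for k, v in pairs}

# After: the dict constructor does the same work directly
d_after = dict(pairs)
assert d_before == d_after
```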

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97116
Approved by: https://github.com/kit1980
2023-03-20 00:56:57 +00:00
Aaron Enye Shi
1e6961586b [Profiler] Memory timeline to show actual timestamps (#96535)
Summary: Rather than starting the timeline at t=0, keep the actual timestamps of the memory events.

Test Plan: CI Tests

Reviewed By: leitian, chaekit

Differential Revision: D43807624

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96535
Approved by: https://github.com/davidberard98
2023-03-11 00:25:30 +00:00
Aaron Enye Shi
e948ba07d4 [Profiler] Add export_memory_timeline to save memory timeline plot to file (#96137)
Summary: Added the functionality to export the memory timeline plot as a list of times and sizes, which the post-processing visualization can parse and plot.

Test Plan: CI Tests

Reviewed By: leitian, fengxizhou

Differential Revision: D43680760

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96137
Approved by: https://github.com/chaekit
2023-03-10 18:20:25 +00:00
Aaron Gokaslan
0444a6c90a [BE] Remove deprecated logging warn method (#94708)
Swaps all `logging.warn` calls to `logging.warning`, since the former is deprecated and now raises a DeprecationWarning.
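The swap in question (illustrative example):

```
import logging

log = logging.getLogger(__name__)

# Before: a deprecated alias that now emits a DeprecationWarning
log.warn("checkpoint not found")

# After: the supported spelling
log.warning("checkpoint not found")
```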

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94708
Approved by: https://github.com/ezyang
2023-02-13 18:24:52 +00:00
Edward Z. Yang
eef019c14a Lint rule to forbid direct use of logging.info/etc APIs (#90907)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90907
Approved by: https://github.com/jansel
2022-12-16 05:13:51 +00:00
Taylor Robie
63e57280fc [Profiler] Memory profiler part 13: Add sizes to timeline. (#89356)
If we see an allocation, the size is unambiguous. Otherwise we have to use sizes and strides to bound the size of the underlying storage.
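The bound can be computed along these lines (a sketch, not the profiler's exact code):

```
def storage_bytes_bound(sizes, strides, itemsize):
    # Without seeing the allocation itself, the largest linear offset a
    # view can touch bounds the size of its underlying storage.
    if any(s == 0 for s in sizes):
        return 0
    max_offset = sum((n - 1) * stride for n, stride in zip(sizes, strides))
    return (max_offset + 1) * itemsize

# A contiguous float32 tensor of shape (8, 4): 8 * 4 * 4 bytes.
assert storage_bytes_bound((8, 4), (4, 1), 4) == 128
```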

Differential Revision: [D40868660](https://our.internmc.facebook.com/intern/diff/D40868660/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89356
Approved by: https://github.com/chaekit
2022-12-02 03:55:22 +00:00
Taylor Robie
6727e537a7 [Profiler] Memory profiler part 12: Emit timeline of memory events. (#89355)
Add a simple interface to get a flat representation of the memory profile.

Differential Revision: [D40868663](https://our.internmc.facebook.com/intern/diff/D40868663/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89355
Approved by: https://github.com/chaekit
2022-12-02 03:55:22 +00:00
Taylor Robie
b709078dc6 [Profiler] Memory profiler part 11: Mark tensors created in the backward pass which don't correspond to parameters. (#88926)
There are various Tensors created in the backward pass which do not correspond to parameters. We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (It can be grouped with GRADIENT in visualization if one wishes to take a coarse-grained view of the world.)

Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926
Approved by: https://github.com/chaekit
2022-11-27 12:20:30 +00:00
Taylor Robie
143d2881a8 [Profiler] Memory profiler part 10: Mark optimizer state (#88925)
This is also a fairly simple pass: we just collect values from the python tracer.

Differential Revision: [D40868664](https://our.internmc.facebook.com/intern/diff/D40868664/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88925
Approved by: https://github.com/chaekit
2022-11-27 12:20:30 +00:00
Taylor Robie
ae725d501e [Profiler] Memory profiler part 9: Mark activations (#88924)
This is a fairly straightforward pass: start at inputs and flood fill until we reach the backward pass.
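In pseudocode terms, the pass is a breadth-first flood fill (a sketch over a hypothetical adjacency map, not the profiler's actual data structures):

```
from collections import deque

def mark_activations(graph, input_keys, is_backward):
    # graph[key] -> keys of values produced from `key`;
    # is_backward(key) -> True once we have crossed into the backward pass.
    activations = set()
    queue = deque(input_keys)
    while queue:
        key = queue.popleft()
        if key in activations or is_backward(key):
            continue  # already visited, or past the fill boundary
        activations.add(key)
        queue.extend(graph.get(key, ()))
    return activations
```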

Differential Revision: [D40868662](https://our.internmc.facebook.com/intern/diff/D40868662/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88924
Approved by: https://github.com/chaekit
2022-11-27 12:20:28 +00:00
Taylor Robie
0435894bb3 [Profiler] Memory profiler part 8: Mark parameters. (#87568)
Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters, and in most cases that is sufficient. As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates.

Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we mark parameters from the python tracer first, which limits which Tensors get marked as inputs. While not strictly necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.)

Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568
Approved by: https://github.com/chaekit
2022-11-27 02:10:29 +00:00
Taylor Robie
17fa6bf1f5 [Profiler] Memory profiler part 7: Mark inputs (#87567)
It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training, since Autograd lets us differentiate between them; however, I still want the tool to do something reasonable in inference.

Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements.

Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567
Approved by: https://github.com/chaekit
2022-11-27 02:10:27 +00:00
Taylor Robie
64c5c77cd4 [Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. (#87566)
Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories.

We begin with gradients as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with their own advantages and limitations. The first is direct inspection of the op graph, which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the python tracer. This method requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the python tracer. Combined, these two approaches provide very high coverage.

Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node.

Because this is the first PR in the end-to-end section most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.)

Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566
Approved by: https://github.com/chaekit
2022-11-27 02:10:26 +00:00
Taylor Robie
5f09a6d573 [Profiler] Memory profiler part 5: Data flow graph (#87006)
The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However, one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs.

It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node.
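The versioning idea can be illustrated like so (hypothetical types, not the profiler's actual classes):

```
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionedTensorKey:
    # A data flow node refers to (storage id, version); an op that
    # mutates its input in place yields the same id at version + 1,
    # so the pre- and post-mutation values stay distinguishable.
    storage_id: int
    version: int

before = VersionedTensorKey(storage_id=7, version=0)
after_add_ = VersionedTensorKey(storage_id=7, version=1)  # e.g. x.add_(1)
```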

Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006
Approved by: https://github.com/chaekit
2022-11-27 00:28:57 +00:00
Taylor Robie
c3116dd78b [Profiler] Memory profiler part 4: Select top level torch ops (#86880)
In a later PR we will walk the children of these nodes and formulate a node from the entire bundle to build a data flow graph. This PR simply defines what a "top level" op is.

Differential Revision: [D40220387](https://our.internmc.facebook.com/intern/diff/D40220387/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86880
Approved by: https://github.com/chaekit
2022-11-27 00:28:57 +00:00
Taylor Robie
8023c9dc64 [Profiler] Memory profiler part 3: Schema parsing and mutable arguments (#86854)
The appropriate annotation for a block of memory is a function of time: an input can be mutated in-place to become an activation, a clever kernel might steal the memory of a detached input (such as a mask) to use as output memory, etc.

We could pessimistically assume that all ops mutate all of their inputs, however inspection of schema allows us to significantly narrow that assumption with minimal effort. Checking schemas also allows us to distinguish between dispatcher ops (which have load bearing semantics) and user annotations with reasonably high precision.
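Concretely, mutability can be read off the schema's alias annotations, along these lines (a sketch; `(a!)` in a schema marks an argument the op may write to):

```
import torch

def mutable_arg_names(qualified_name):
    # Collect (overload, argument) pairs whose alias info is a write,
    # i.e. arguments the operator may mutate in place.
    names = []
    for schema in torch._C._jit_get_schemas_for_operator(qualified_name):
        for arg in schema.arguments:
            if arg.alias_info is not None and arg.alias_info.is_write:
                names.append((str(schema.name), arg.name))
    return names

# "aten::add_" mutates `self`; plain "aten::add" mutates nothing.
print(mutable_arg_names("aten::add_"))
```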

Differential Revision: [D40220390](https://our.internmc.facebook.com/intern/diff/D40220390/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86854
Approved by: https://github.com/chaekit
2022-11-15 19:17:57 +00:00
Taylor Robie
2439bc1e9b [Profiler] Memory profiler part 2: Config validation (#86853)
Memory profiling requires `record_shapes`, `profile_memory`, and `with_stack`. This PR just adds a skeleton endpoint with a good error message if certain flags are missing.
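The check amounts to something like this sketch (names are illustrative, not the actual profiler internals):

```
def validate_memory_profile_config(record_shapes, profile_memory, with_stack):
    # All three flags are required; report every missing one at once.
    missing = [
        flag
        for flag, enabled in [
            ("record_shapes", record_shapes),
            ("profile_memory", profile_memory),
            ("with_stack", with_stack),
        ]
        if not enabled
    ]
    if missing:
        raise ValueError(
            "memory profiling requires " + ", ".join(f"{f}=True" for f in missing)
        )
```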

Differential Revision: [D39920801](https://our.internmc.facebook.com/intern/diff/D39920801/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86853
Approved by: https://github.com/chaekit
2022-11-15 19:17:57 +00:00
Taylor Robie
cef13ebea0 [Profiler] Memory profiler part 1: Gradient identification (#86802)
There are multiple ways to identify that a Tensor is a gradient. (A subset of which also give additional context.) So to start off I've made a utility to handle that determination.
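One of those identification routes, sketched via the module path (illustrative, not the profiler's code):

```
import torch

def gradients_from_module(module):
    # Parameters registered on an nn.Module carry their gradients
    # directly, which also tells us which parameter each grad belongs to.
    for name, param in module.named_parameters():
        if param.grad is not None:
            yield name, param.grad

linear = torch.nn.Linear(4, 4)
linear(torch.randn(2, 4)).sum().backward()
print([name for name, _ in gradients_from_module(linear)])  # ['weight', 'bias']
```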

Differential Revision: [D39920730](https://our.internmc.facebook.com/intern/diff/D39920730/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86802
Approved by: https://github.com/chaekit
2022-11-08 23:53:13 +00:00