Summary: Return early if we can easily determine that the operator's qualified name is invalid before attempting to retrieve the schema. In particular, "::" should always be present. A quick estimate shows that this is >50x faster (100 us -> 2 us).
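A minimal sketch of the check (the helper name and caching are illustrative; the `"::"` early return is the substance of the change):

```python
import functools
from typing import Optional, Tuple

import torch


@functools.lru_cache(None)
def lookup_schemas(name: str) -> Optional[Tuple[torch._C.FunctionSchema, ...]]:
    # Valid qualified names are namespaced, e.g. "aten::add". If the
    # "::" separator is missing, skip the expensive schema lookup
    # (~100 us) and return immediately (~2 us).
    if "::" not in name:
        return None
    try:
        return tuple(torch._C._jit_get_schemas_for_operator(name))
    except RuntimeError:
        # Lookup still throws on otherwise malformed names.
        return None
```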
Test Plan: CI
Differential Revision: D47562587
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105495
Approved by: https://github.com/aaronenyeshi
Summary:
Rather than processing the events into a plot of times and sizes, dump the actual events as (timestamp, action, number of bytes, category) when the output file ends in `raw.json.gz`.
This allows downstream analysis tools to process the events themselves. It also avoids having to control the granularity of the previous `json.gz` output inside the memory profiler.
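A sketch of the raw export (field order follows the summary above; the record structure and helper name are illustrative):

```python
import gzip
import json


def export_raw_events(events, path: str) -> None:
    # One record per memory event, rather than a pre-processed
    # (time, sizes-per-category) plot.
    assert path.endswith("raw.json.gz")
    records = [
        (e.timestamp, e.action, e.num_bytes, e.category) for e in events
    ]
    with gzip.open(path, "wt") as f:
        json.dump(records, f)
```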
Test Plan: CI Tests
Differential Revision: D47416544
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105094
Approved by: https://github.com/davidberard98
Summary:
Support the file extension `.html`, which embeds a PNG image of the plot into an HTML file.
This allows users to avoid processing the timeline manually in their own frontend UI.
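A sketch of the embedding (the helper is hypothetical; it assumes the plot has already been rendered to PNG bytes):

```python
import base64


def write_html_plot(png_bytes: bytes, path: str) -> None:
    # Embed the rendered PNG as a base64 data URI so the HTML file is
    # fully self-contained: no separate image asset, no frontend work.
    assert path.endswith(".html")
    encoded = base64.b64encode(png_bytes).decode("ascii")
    with open(path, "w") as f:
        f.write(
            f'<html><body><img src="data:image/png;base64,{encoded}"></body></html>'
        )
```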
Test Plan:
CI Tests
Ran on a resnet50 model and generated this HTML file with the plot:
See attached HTML file: {F954232276}
Screenshot: {F954232469}
Differential Revision: D45152735
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99751
Approved by: https://github.com/davidberard98
Summary: Rather than starting the timeline at t=0, keep the actual timestamps of the memory events.
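Schematically (the `timeline` list of `(timestamp, sizes)` pairs is illustrative):

```python
# Before: timestamps were shifted so the plot started at t=0.
times = [t - timeline[0][0] for t, _sizes in timeline]

# After: keep the events' real timestamps, which allows the memory
# timeline to be aligned with other traces of the same run.
times = [t for t, _sizes in timeline]
```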
Test Plan: CI Tests
Reviewed By: leitian, chaekit
Differential Revision: D43807624
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96535
Approved by: https://github.com/davidberard98
Summary: Added the functionality to export the memory timeline plot as a list of times and sizes, which the post-processing visualization can parse and plot.
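A sketch of the export, assuming a `timeline` of `(timestamp, sizes-per-category)` pairs (names and the exact serialization are illustrative):

```python
import json


def export_memory_timeline(timeline, path: str) -> None:
    # Parallel lists of timestamps and per-category sizes that the
    # post-processing visualization can parse and plot directly.
    times = [t for t, _sizes in timeline]
    sizes = [s for _t, s in timeline]
    with open(path, "w") as f:
        json.dump([times, sizes], f)
```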
Test Plan: CI Tests
Reviewed By: leitian, fengxizhou
Differential Revision: D43680760
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96137
Approved by: https://github.com/chaekit
There are various Tensors created in the backward pass which do not correspond to parameters. We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (Which can be grouped with GRADIENT in visualization if one wishes to take a coarse-grained view of the world.)
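Schematically, the new category sits alongside GRADIENT, and a visualization is free to fold the two together (this enum is an illustrative subset, not the full category list):

```python
import enum


class Category(enum.Enum):
    INPUT = enum.auto()
    TEMPORARY = enum.auto()
    ACTIVATION = enum.auto()
    GRADIENT = enum.auto()
    AUTOGRAD_DETAIL = enum.auto()  # backward-pass Tensors that are not gradients


# Coarse-grained view: treat autograd detail as part of GRADIENT.
COARSE_GROUPING = {Category.AUTOGRAD_DETAIL: Category.GRADIENT}
```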
Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926
Approved by: https://github.com/chaekit
Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters, and in most cases that is sufficient. As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates.
Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we first mark parameters from the Python tracer, which limits which Tensors get marked as inputs. While not necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.)
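A rough sketch of the two-tier extraction (all helper names here are hypothetical):

```python
def extract_parameters(trace) -> set:
    # Primary: the Python tracer observes nn.Module and optim.Optimizer
    # calls, both of which enumerate their parameters.
    params = set(trace.python_tracer_parameters())

    # Fallback: deduce likely parameters from the data flow graph. A
    # Tensor that receives a gradient and is then updated in place is
    # very likely a parameter.
    for tensor in trace.data_flow_graph.tensors():
        if tensor.receives_gradient and tensor.updated_in_place:
            params.add(tensor.id)
    return params
```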
Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568
Approved by: https://github.com/chaekit
It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training, since Autograd lets us differentiate between them; however, I still want the tool to do something reasonable in inference.
Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements.
Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567
Approved by: https://github.com/chaekit
Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories.
We begin with gradients, as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with its own advantages and limitations. The first is direct inspection of the op graph, which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the Python tracer. This method requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the Python tracer. Combined, these two approaches provide very high coverage.
Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node.
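A sketch of both gradient mechanisms plus the temporary check (`mark`, `Category`, and the graph accessors are hypothetical):

```python
def assign_gradients_and_temporaries(graph, tracer_events) -> None:
    # Mechanism 1: op graph inspection. Tensors written by gradient
    # accumulation nodes in the Autograd engine are gradients.
    for node in graph.nodes:
        if node.name == "torch::autograd::AccumulateGrad":
            mark(node.outputs, Category.GRADIENT)

    # Mechanism 2: the Python tracer reports `param.grad` directly for
    # nn.Module parameters. (May miss e.g. `set_to_none=True`.)
    for grad_id in tracer_events.parameter_gradients():
        mark([grad_id], Category.GRADIENT)

    # Temporaries: a Tensor created and destroyed within a single data
    # flow node never escapes it; purely local inspection suffices.
    for node in graph.nodes:
        mark(node.intermediates, Category.TEMPORARY)
```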
Because this is the first PR in the end-to-end section, most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.)
Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566
Approved by: https://github.com/chaekit
The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However, one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs.
It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node.
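A sketch of the coalescing pass (types and accessors are hypothetical; the versioning and folding are the substance):

```python
from typing import NamedTuple


class TensorKey(NamedTuple):
    # A Tensor is identified by (id, version); mutating it bumps the
    # version, since mutation may change its semantic meaning.
    tensor_id: int
    version: int


def coalesce(op_tree) -> list:
    # Only top level ops become data flow nodes. Inputs and outputs of
    # recursively called children fold into their top level ancestor;
    # Tensors produced by an earlier child are not external inputs.
    nodes = []
    for top_op in op_tree.top_level_ops():
        inputs, outputs = set(), set()
        for child in top_op.subtree():  # top_op plus all nested calls
            # input_keys() / output_keys() return sets of TensorKey.
            inputs |= child.input_keys() - outputs
            outputs |= child.output_keys()
        nodes.append((top_op, inputs, outputs))
    return nodes
```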
Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006
Approved by: https://github.com/chaekit
The appropriate annotation for a block of memory is a function of time: an input can be mutated in-place to become an activation, a clever kernel might steal the memory of a detached input (such as a mask) to use as output memory, etc.
We could pessimistically assume that all ops mutate all of their inputs; however, inspecting schemas allows us to significantly narrow that assumption with minimal effort. Checking schemas also allows us to distinguish between dispatcher ops (which have load-bearing semantics) and user annotations with reasonably high precision.
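A sketch of the schema check using real `torch._C` accessors (the helper itself is illustrative):

```python
import torch


def mutated_arg_positions(op_name: str) -> set:
    # An argument whose alias_info has is_write set may be mutated in
    # place (e.g. the "self" argument of "aten::add_"); all other
    # arguments can be assumed untouched.
    mutated = set()
    for schema in torch._C._jit_get_schemas_for_operator(op_name):
        for i, arg in enumerate(schema.arguments):
            if arg.alias_info is not None and arg.alias_info.is_write:
                mutated.add(i)
    return mutated
```

For example, `mutated_arg_positions("aten::add_")` contains position 0 (`self`) while `mutated_arg_positions("aten::add")` is empty; a name with no registered schema (such as a `record_function` annotation) raises, which is one way to tell dispatcher ops and user annotations apart.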
Differential Revision: [D40220390](https://our.internmc.facebook.com/intern/diff/D40220390/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86854
Approved by: https://github.com/chaekit