# Profiler Overview

This README describes the details of how the profiler is implemented.

The profiler instruments PyTorch to collect information about the model's execution. Its main features are:

- Instrumenting op calls on the CPU side
- Interfacing with [Kineto](https://github.com/pytorch/kineto) to collect information from the GPU (or other accelerators)
- Collecting Python stack traces
- Exporting this information, e.g. in a Chrome trace, or to be processed by downstream tools like [HTA](https://github.com/facebookresearch/HolisticTraceAnalysis)

## Table of Contents

- [Codebase Structure](#codebase-structure)
- [RecordFunction](#recordfunction)
- [Autograd Integration](#autograd-integration)
- [Collection and Post-Processing](#collection-and-post-processing)
- [Kineto Integration](#kineto-integration)
- [Python Tracing](#python-tracing)

## Codebase Structure

TODO

## RecordFunction

`/aten/src/ATen/record_function.h`

`RecordFunction` is used by the profiler to instrument CPU-side events.

`RecordFunction` is a general mechanism for instrumenting function calls in PyTorch, and it has uses beyond the profiler, e.g. see [Features for Large-Scale Deployments](https://pytorch.org/docs/stable/notes/large_scale_deployments.html). PyTorch already includes it at some important locations; notably, the dispatcher surrounds every op call with a `RecordFunction` guard.

Users (or PyTorch itself) can register callbacks that are executed whenever a `RecordFunction` guard is encountered. The profiler uses this mechanism to record the start and end times of each op call, as well as user-provided `RecordFunction` annotations. The `RecordFunction` machinery is designed to have relatively low overhead, especially when no callbacks are registered; nevertheless, some overhead remains.
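
As an illustration, below is a minimal sketch of registering a global observer using the callback types from `record_function.h`; the logging behavior is hypothetical, not something the profiler itself does:

```cpp
#include <ATen/record_function.h>

#include <iostream>
#include <memory>

// Hypothetical observer that logs entry/exit of every dispatched op.
// Returning nullptr from the start callback means no per-invocation
// state is kept (an ObserverContext can carry state to the end
// callback when needed).
std::unique_ptr<at::ObserverContext> onEnter(const at::RecordFunction& fn) {
  std::cout << "enter: " << fn.name() << std::endl;
  return nullptr;
}

void onExit(const at::RecordFunction& fn, at::ObserverContext* /*ctx*/) {
  std::cout << "exit:  " << fn.name() << std::endl;
}

void installObserver() {
  // RecordScope::FUNCTION restricts the callbacks to op calls; other
  // scopes (e.g. USER_SCOPE) cover user annotations.
  const auto handle = at::addGlobalCallback(
      at::RecordFunctionCallback(&onEnter, &onExit)
          .scopes({at::RecordScope::FUNCTION}));
  // The returned handle can later be passed to at::removeCallback().
  (void)handle;
}
```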

`RecordFunction` also has a Python binding, `torch.profiler.record_function`; this is commonly used to annotate module-level events.
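
On the C++ side, a comparable user annotation can be emitted with the `RECORD_USER_SCOPE` macro from `record_function.h`; a minimal sketch (the function and label are made up for illustration):

```cpp
#include <ATen/record_function.h>

void my_step() {
  // Roughly the C++ analogue of
  //   with torch.profiler.record_function("my_step"): ...
  // The guard emits a USER_SCOPE event that spans the rest of this block.
  RECORD_USER_SCOPE("my_step");
  // ... work attributed to "my_step" in the trace ...
}
```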

## Autograd Integration

The autograd engine is responsible for automatically computing gradients.

The profiler records two pieces of information from the autograd engine:

- Sequence number: a unique-per-thread index assigned to each op call(*) in the forward pass. When a backward op is triggered, it is assigned the sequence number of the forward op that caused it to be executed. Using this information, the profiler can match forward and backward ops; in Chrome traces, this feature can be enabled with the "fwd_bwd" flow events.
- Forward thread ID: autograd can be used in multi-threaded environments, and the forward thread ID records the thread on which the forward op was executed. This is needed because the sequence number is only unique within a thread; the forward thread ID disambiguates ops that share a sequence number. (See the sketch after this list for how both fields appear on a `RecordFunction`.)

(*) Note that only op invocations whose inputs require gradients are assigned a sequence number
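
To make these two fields concrete, a start callback like the one sketched earlier could read them off the `RecordFunction` it observes via the `seqNr()`, `threadId()`, and `forwardThreadId()` accessors in `record_function.h`; the logging here is again hypothetical:

```cpp
#include <ATen/record_function.h>

#include <iostream>
#include <memory>

// Sketch: log the fields the profiler uses to match forward and
// backward ops. seqNr() is -1 when no sequence number was assigned
// (e.g. no input requires grad); forwardThreadId() is meaningful on
// backward ops, where it names the thread that ran the forward op.
std::unique_ptr<at::ObserverContext> logAutogradIds(
    const at::RecordFunction& fn) {
  if (fn.seqNr() != -1) {
    std::cout << fn.name() << " seq=" << fn.seqNr()
              << " thread=" << fn.threadId()
              << " fwd_thread=" << fn.forwardThreadId() << std::endl;
  }
  return nullptr; // register via at::addGlobalCallback, as above
}
```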

## Collection and Post-Processing

TODO

## Kineto Integration

TODO

## Python Tracing

TODO