@bypass-github-export-checks
This change ensures that Vulkan event start/end times are correctly synced with their parent CPU times.
This sometimes requires increasing CPU event durations (to fully contain their child events) and delaying CPU event start times (to prevent overlaps), so it should not be used unless Vulkan events are being profiled and it is acceptable to use the modified timestamp/duration information instead of the original information.
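As a rough illustration, the adjustment amounts to the following (a minimal sketch with hypothetical types and field names, not the actual profiler code): extend a parent CPU event's duration until it covers its Vulkan child, then push back any CPU event that would now overlap its predecessor.
```
// Minimal sketch; Event and its microsecond fields are hypothetical stand-ins
// for the profiler's internal event representation.
#include <cstdint>
#include <vector>

struct Event {
  int64_t start_us;
  int64_t duration_us;
  int64_t end_us() const { return start_us + duration_us; }
};

// Increase the parent CPU event's duration so it fully contains the child.
void containChild(Event& parent, const Event& child) {
  if (child.end_us() > parent.end_us()) {
    parent.duration_us = child.end_us() - parent.start_us;
  }
}

// Delay CPU event start times so that extended events no longer overlap.
void resolveOverlaps(std::vector<Event>& cpu_events_sorted_by_start) {
  for (size_t i = 1; i < cpu_events_sorted_by_start.size(); ++i) {
    const int64_t prev_end = cpu_events_sorted_by_start[i - 1].end_us();
    if (cpu_events_sorted_by_start[i].start_us < prev_end) {
      cpu_events_sorted_by_start[i].start_us = prev_end;
    }
  }
}
```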
Differential Revision: [D39893109](https://our.internmc.facebook.com/intern/diff/D39893109/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39893109/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90672
Approved by: https://github.com/kimishpatel
Summary:
Adds a new API call:
```
mobile::getCurrentEdgeProfiler()->recordBackendMemoryEvent(
    ptr, alloc_size, total_allocated, total_reserved, device);
```
It also adds another macro for recording backend memory events. These memory events are captured in the trace file when the profiler is created:
```
{
  KinetoEdgeCPUProfiler profiler(
      module,
      trace_file_name,
      false, // record input_shapes
      true,  // profile memory
      true,  // record callstack
      false, // record flops
      true); // record module hierarchy
  module.forward(inputs);
}
```
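For context, a delegate backend that manages its own allocations could invoke the new call from inside the profiled scope roughly as follows (a sketch only; the allocator, the running counters, and the device value are illustrative assumptions, not part of this diff):
```
// Hypothetical backend allocation hook; only the recordBackendMemoryEvent
// call mirrors the API above, everything else is illustrative.
#include <cstdint>
#include <cstdlib>

void* backend_alloc(size_t alloc_size) {
  void* ptr = std::malloc(alloc_size);              // stand-in for the backend's real allocator
  int64_t total_allocated = my_total_allocated();   // hypothetical running counter
  int64_t total_reserved = my_total_reserved();     // hypothetical running counter
  auto device = at::kCPU;                           // device the allocation lives on
  mobile::getCurrentEdgeProfiler()->recordBackendMemoryEvent(
      ptr, alloc_size, total_allocated, total_reserved, device);
  return ptr;
}
```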
Test Plan: Testing to be done in the next diff.
Differential Revision: D37116111
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80350
Approved by: https://github.com/kimishpatel
One source of complexity in profiler_kineto is that we do most things twice: once to set a field in `kineto_events_.back()`, and once for the metadata json. These have historically been chained, with the KinetoEvent used to populate the metadata fields. However, this is hard to read and error-prone, as we have one giant block of assignments followed by another. It also means that the logic about whether a field is present or not is duplicated.
This PR replaces that logic with a visitor that writes both together, e.g.:
```
auto& dtypes = result_.get().inputs_.dtypes_;
if (!dtypes.empty()) {
  kineto_event_.get().dtypes(dtypes);
  out.emplace_back("Input type", dtypesToStr(dtypes));
}
```
Differential Revision: [D36070202](https://our.internmc.facebook.com/intern/diff/D36070202/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77691
Approved by: https://github.com/aaronenyeshi
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421
I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files rely on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.
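In practice the cleanup boils down to the pattern below (a sketch of the include-hygiene change; the exact header paths in the tree may differ slightly):
```
// Before: function.h pulled in the profiler header solely for RecordFunction,
// leaking profiler symbols into every translation unit that includes function.h.
// #include <torch/csrc/autograd/profiler.h>

// After: include only what is actually needed.
#include <ATen/record_function.h>
```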
Test Plan: Unit tests and CI.
Reviewed By: aaronenyeshi, albanD
Differential Revision: D32865907
fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421
Original commit changeset: ab6bb8fe4e83
This also includes the BUILD.bazel changes, which were the reason for the revert.
Test Plan: See original diff
Reviewed By: gdankel
Differential Revision: D31542513
fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397
This diff exposes a way to add events to the Kineto profiler from an external source, e.g. a backend that executes a subgraph and wants to record that execution in the profiler.
This diff also adds "backend" metadata to identify the backend an event would have executed on.
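Conceptually, a delegate backend that ran a lowered subgraph could report that execution back to the active profiler along these lines (a hypothetical sketch; the function name and argument list here are illustrative, not the exact API added by this diff):
```
// Hypothetical reporting hook (names are illustrative only).
void report_subgraph_run(int64_t start_us, int64_t end_us, int64_t debug_handle) {
  my_backend::reportEventToProfiler(
      start_us,
      end_us,
      debug_handle,            // ties the event back to the lowered subgraph
      "my_subgraph_kernel",    // event name as it will appear in the trace
      "my_accelerator");       // "backend" metadata: where the event executed
}
```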
Test Plan:
test_lite_interpreter
Imported from OSS
Reviewed By: raziel
Differential Revision: D30710710
fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307
Original commit changeset: 0b2aa7c57d08
Restores original changes.
This diff changes the way operator profiling is done in the lite predictor benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wall-clock time.
This unifies the various profiling efforts we have for benchmarking purposes. In production we will still use the observer-based mechanism, but the advantage of using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation
- operator-level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we could possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done)
This also removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info; they should be covered by test_mobile_profiler.cpp.
Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (note that the operator summary now includes module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985
Reviewed By: raziel
Differential Revision: D30680354
fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367
This diff changes the way operator profiling is done in the lite predictor benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wall-clock time.
This unifies the various profiling efforts we have for benchmarking purposes. In production we will still use the observer-based mechanism, but the advantage of using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation
- operator-level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we could possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done)
Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (note that the operator summary now includes module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985
Reviewed By: raziel
Differential Revision: D30327514
fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419
This diff adds support for a CPU-only Kineto profiler on mobile, thus enabling chrome trace generation on mobile. This brings the C++ API for mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be used in the surrounding scope of model execution. This writes a chrome trace to the location specified in the profiler constructor.
Test Plan:
MobileProfiler.ModuleHierarchy
Imported from OSS
Reviewed By: raziel
Differential Revision: D29993660
fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299