45 Commits

Author SHA1 Message Date
Aaron Enye Shi
63c089b09d [c10] Move profiler clock to libc10 for timestamps (#111972)
Summary:
Move the profiler's Approximate Clock from libtorch to libc10. The main reason is to allow c10 features to get time.

The clock uses TSC when available for performance. CUDA Caching Allocator's implementation of memory snapshot will add timestamps to memory events with this same clock in a subsequent diff.

Test Plan: CI

Differential Revision: D50601935

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111972
Approved by: https://github.com/davidberard98
2023-10-27 16:18:40 +00:00
cyy
d58a91b2a6 [4/N] Move remaining c10::variant calls to std::variant (#110382)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110382
Approved by: https://github.com/Skylion007
2023-10-02 23:52:04 +00:00
fwenguang
c4f2b6dbd2 [profiler] use PyCFunction_Check to check both PyCMethod_Type and PyC… (#110002)
At https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/profiler_python.cpp#L1096, when `what` is PyTrace_C_CALL, Py_TYPE(arg) can only be PyCFunction_Type before Python 3.9. In Python 3.9 or later, Py_TYPE(arg) can also be PyCMethod_Type.
PyCMethod_Type is a subtype of PyCFunction_Type; see
f2eaa92b0c/Objects/methodobject.c (L372).
So PyCFunction_Check should be used to check arg->ob_type.
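
As a rough illustration of why an exact-type comparison misses subtypes while a subtype-aware check does not, here is a minimal Python sketch. The two classes are hypothetical stand-ins for `PyCFunction_Type` and `PyCMethod_Type`, not the CPython types themselves.

```python
class CFunction:
    """Hypothetical stand-in for PyCFunction_Type."""


class CMethod(CFunction):
    """Hypothetical stand-in for PyCMethod_Type, a subtype of CFunction."""


def exact_type_check(obj):
    # Analogous to comparing Py_TYPE(arg) against one type: subtypes are rejected.
    return type(obj) is CFunction


def subtype_aware_check(obj):
    # Analogous in spirit to PyCFunction_Check: the type and its subtypes pass.
    return isinstance(obj, CFunction)


method = CMethod()
print(exact_type_check(method))     # False
print(subtype_aware_check(method))  # True
```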

Fixes #109877

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110002
Approved by: https://github.com/ezyang
2023-09-25 20:17:25 +00:00
cyy
75b954b715 [4/N] Enable clang-tidy in torch/csrc/autograd (#109455)
The PR enables clang-tidy checks in torch/csrc/autograd.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109455
Approved by: https://github.com/Skylion007
2023-09-17 17:11:50 +00:00
cyy
a14d30d8d1 [1/N] apply clang-tidy in torch/csrc/autograd (#109032)
This PR begins a new series of patches for enabling clang-tidy checks in torch/csrc/autograd.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109032
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-09-15 23:28:43 +00:00
cyy
36b8ca4e48 [2/N] apply clang-tidy in torch/csrc/autograd (#109277)
This PR follows the work of PR #109032.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109277
Approved by: https://github.com/albanD
2023-09-15 00:39:12 +00:00
cyy
e4f3e5434f [Reland] Elimates c10::guts::to_string (#108748)
Reland of PR #108480, after relanding another blocking PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108748
Approved by: https://github.com/huydhn
2023-09-07 13:35:17 +00:00
PyTorch MergeBot
8da04e023e Revert "Eliminate c10::guts::to_string (#108480)"
This reverts commit 4146be192e.

Reverted https://github.com/pytorch/pytorch/pull/108480 on behalf of https://github.com/huydhn due to Sorry for reverting this, but this is needed to keep trunk green after https://github.com/pytorch/pytorch/pull/108479 was reverted.  Both will need to be relanded ([comment](https://github.com/pytorch/pytorch/pull/108480#issuecomment-1707067595))
2023-09-05 18:04:53 +00:00
cyy
4146be192e Eliminate c10::guts::to_string (#108480)
This PR replaces c10::guts::to_string with std::to_string. The bulk of the changes switch the optimizer state key to void*, since the string is used only for serialization and hashing a pointer key is more efficient than hashing a string.
Some other guts functions in the affected source files are also replaced.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480
Approved by: https://github.com/Skylion007
2023-09-04 08:12:53 +00:00
Scott Wolchok
99f68d56ee [PyTorch] Delete c10::guts::if_constexpr (#101991)
Now that we have C++17, we should not need this any more.

Differential Revision: [D46078335](https://our.internmc.facebook.com/intern/diff/D46078335/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101991
Approved by: https://github.com/r-barnes, https://github.com/Skylion007
2023-05-23 23:19:35 +00:00
Taylor Robie
d09cd15216 [Profiler] Defer recording startup python events (take 2) (#91684)
This is my commandeer of https://github.com/pytorch/pytorch/pull/82154 with a couple extra fixes.

The high level idea is that when we start profiling we see python frames which are currently executing, but we don't know what system TID created them. So instead we defer the TID assignment, and then during post processing we peer into the future and use the system TID *of the next* call on that Python TID.
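
The deferred assignment described above can be sketched as a single backward pass over the recorded events. This is only an illustration of the idea, not the profiler's code; `PyEvent` and `assign_deferred_tids` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PyEvent:
    python_tid: int
    system_tid: Optional[int]  # None for frames already running at startup
    name: str


def assign_deferred_tids(events: List[PyEvent]) -> None:
    """Fill each missing system TID from the next event on the same Python TID."""
    next_seen = {}  # python_tid -> system TID of the nearest later event
    for event in reversed(events):
        if event.system_tid is None:
            # Stays None if no later event exists on this Python TID.
            event.system_tid = next_seen.get(event.python_tid)
        else:
            next_seen[event.python_tid] = event.system_tid


events = [
    PyEvent(1, None, "main"),
    PyEvent(2, None, "worker"),
    PyEvent(1, 100, "forward"),
    PyEvent(2, 200, "load_batch"),
]
assign_deferred_tids(events)
print([e.system_tid for e in events])  # [100, 200, 100, 200]
```

Walking the list in reverse means each startup frame sees the TID of the closest later call on its Python thread, which is the "peer into the future" step.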

As an aside, it turns out that CPython does some bookkeeping (ee821dcd39/Include/cpython/pystate.h (L159-L165), thanks @dzhulgakov for the pointer), but you'd have to do some extra work at runtime to know how to map their TID to ours so for now I'm going to stick to what I can glean from post processing alone.

As we start observing more threads it becomes more important to be principled about how we start up and shut down. (Since threads may die while the profiler is running.) #82154 had various troubles with segfaults that wound up being related to accessing Python thread pointers which were no longer alive. I've tweaked the startup and shutdown interaction with the CPython interpreter and it should be safer now.

Differential Revision: [D42336292](https://our.internmc.facebook.com/intern/diff/D42336292/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91684
Approved by: https://github.com/chaekit
2023-02-11 18:44:00 +00:00
Taylor Robie
4c6a7faec5 [Profiler] Use RAII wrapper to manage refcounts during python tracer startup. (#91646)
Refcounting is hard. (Citation needed.) https://github.com/pytorch/pytorch/pull/81242 introduced a corner case where we would over incref when breaking out due to max (128) depth. https://github.com/pytorch/pytorch/pull/85847 ostensibly fixed a segfault, but in actuality was over incref-ing because PyEval_GetFrame returns a borrowed reference while `PyFrame_GetBack` returns a strong reference.

Instead of squinting really hard at the loops, it's much better to use the RAII wrapper and do the right thing by default.

I noticed the over incref issue because of a memory leak where Tensors captured by the closure of a function would be kept alive by zombie frames.

Differential Revision: [D42184394](https://our.internmc.facebook.com/intern/diff/D42184394/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91646
Approved by: https://github.com/albanD
2023-02-10 00:28:18 +00:00
Aaron Gokaslan
8c8cd9539d Add missing moves to torch autograd (#92772)
Applies additional std::move calls in torch/csrc/autograd at opportunities found via static analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92772
Approved by: https://github.com/ezyang
2023-01-24 02:01:52 +00:00
Aaron Gokaslan
77c2a8a11f Clang-Tidy: Improve ctors by removing unnecessary copies and initializations (#91538)
Apply clang-tidy fixups to prefer member initializers and modernize-pass-by-value. This is mostly a no-op, but it should make a few ctors slightly more readable and more efficient. It also adds some missing moves, which prevents a lot of unnecessary copying.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91538
Approved by: https://github.com/ezyang
2022-12-31 07:19:30 +00:00
Aaron Gokaslan
553b592824 Clang-Tidy: use modern for each loops and transparent functors (#91449)
This applies some more clang-tidy fixups. Particularly, this applies the modernize loops and modernize-use-transparent-functors checks. Transparent functors are less error prone since you don't have to worry about accidentally specifying the wrong type and are newly available as of C++17.

Modern foreach loops tend to be more readable and can be more efficient to iterate over since the loop condition is removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91449
Approved by: https://github.com/ezyang
2022-12-29 23:37:51 +00:00
Aaron Gokaslan
3916d7a575 Apply modernize-use-emplace to aten, c10, torch (#91077)
Apply clang-tidy check modernize-use-emplace. This is slightly more efficient by using an in-place constructor and is the recommended style in parts of the codebase covered by clang-tidy. This just manually applies the check to the rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed like #89000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077
Approved by: https://github.com/ezyang
2022-12-19 07:49:56 +00:00
Taylor Robie
6e6f929b2c [Profiler] Restructure inputs and capture TensorLists. (#87825)
This PR unifies and rationalizes some of the input representation in Result. The current approach of storing separate types in separate vectors is tedious for two types (Tensors and scalars), but would be even more annoying with the addition of TensorLists. A similar disconnection exists with sizes and strides which the user is also expected to zip with tensor_metadata.

I simplified things by moving inputs to a variant and moving sizes and strides into TensorMetadata. This also forced collection of sizes and strides in python tracer which helps to bring it in line with op profiling. Collection of TensorLists is fairly straightforward; `InputOutputEncoder` already has a spot for them (I actually collected them in the original TorchTidy prototype) so it was just a matter of plumbing things through.

Differential Revision: [D40734451](https://our.internmc.facebook.com/intern/diff/D40734451/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87825
Approved by: https://github.com/slgong-fb, https://github.com/chaekit
2022-11-08 21:48:43 +00:00
Taylor Robie
b16b5fb802 [Profiler] Hold weak reference to prevent TensorImpl address reuse during profiling. (#87244)
A recurring problem with assigning Tensor IDs is that we want to preserve identity when storage changes but we don't observe TensorImpl destruction so identity assignment is not robust to the ABA problem with respect to TensorImpl*. ~TensorImpl is far too hot to instrument; even adding a call to a no-op function in a different compilation unit increases overhead by tens of percent. (OSS builds do not have any sort of LTO.)

Fortunately there is a solution. A PyTorch Tensor is a `c10::intrusive_ptr<c10::TensorImpl>`, which in turn holds a storage. (Which is a `c10::intrusive_ptr<c10::StorageImpl>`) `c10::intrusive_ptr` has a `c10::weak_intrusive_ptr` class for taking non-owning references to the underlying object. The implementation involves both a strong refcount and weak refcount in `c10::intrusive_ptr`. If the strong refcount of an intrusive_ptr goes to zero and there are no weak references then everything is deleted. However if there is a weak reference then the intrusive_ptr calls `release_resources()` but not delete.

This has the effect of freeing the underlying resources (ensuring that program semantics are unchanged) but leaves behind an empty shell of an `intrusive_ptr` that the `weak_intrusive_ptr`s use to check status. And herein lies the solution: as long as we hold a weak reference to a TensorImpl we will block deletion and prevent the `TensorImpl*` from being reused.

This PR uses a `c10::weak_intrusive_ptr<c10::TensorImpl>` to store the address of profiled TensorImpls and then converts it to a raw pointer (or rather, a `TensorImplAddress`) during post processing when we no longer care about blocking address reuse.
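
A toy Python model of the mechanism may help. It assumes a deliberately simplified allocator that always reuses the lowest free address, and a simplified strong/weak counting scheme; `ToyHeap` and `ControlBlock` are hypothetical names and do not mirror the real `c10::intrusive_ptr` bookkeeping.

```python
class ToyHeap:
    """Toy allocator: hands out the lowest free address, so addresses get reused."""

    def __init__(self):
        self.free = [0, 1, 2, 3]

    def allocate(self):
        return self.free.pop(0)

    def deallocate(self, address):
        self.free.append(address)
        self.free.sort()


class ControlBlock:
    """Toy refcounted object with separate strong and weak counts."""

    def __init__(self, heap, payload):
        self.heap = heap
        self.address = heap.allocate()
        self.payload = payload
        self.strong = 1
        self.weak = 0

    def add_weak(self):
        self.weak += 1

    def drop_strong(self):
        self.strong -= 1
        if self.strong == 0:
            self.payload = None  # analogous to release_resources()
            self._maybe_delete()

    def drop_weak(self):
        self.weak -= 1
        self._maybe_delete()

    def _maybe_delete(self):
        # The address returns to the heap only when no reference of any kind remains.
        if self.strong == 0 and self.weak == 0:
            self.heap.deallocate(self.address)


heap = ToyHeap()
a = ControlBlock(heap, "tensor A")  # gets address 0
a.add_weak()                        # the profiler takes a weak reference
a.drop_strong()                     # resources freed, empty shell remains
b = ControlBlock(heap, "tensor B")  # cannot reuse address 0 yet
a.drop_weak()                       # post processing is done
c = ControlBlock(heap, "tensor C")  # address 0 is available again
print(a.address, b.address, c.address)  # 0 1 0
```

While the weak reference is held, `b` is forced onto a fresh address, so an address-keyed identity table cannot confuse it with `a`.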

Differential Revision: [D40492848](https://our.internmc.facebook.com/intern/diff/D40492848/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87244
Approved by: https://github.com/slgong-fb, https://github.com/albanD
2022-10-27 06:38:11 +00:00
Taylor Robie
b0e10292fa [Profiler] Tensor IDs for Module and Optimizer variables (#86754)
More sophisticated profiling will increasingly rely on python tracer to contextualize observed results. This PR adds Tensors which are observed by the python tracer to the identity assignment loop.

Differential Revision: [D39852885](https://our.internmc.facebook.com/intern/diff/D39852885/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86754
Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi
2022-10-23 19:23:42 +00:00
Taylor Robie
be2d647ea6 [Profiler] Use parameter as key for optimizer state recording. (#86753)
While an optimizer can store state however it likes, in practice most optimizer state corresponds to a particular parameter. (This is the case for all `torch.optim` optimizers.) Thus, it turns out to be ergonomic to collect using that structure. Note that this doesn't lock us into anything; we can always collect state with non-Tensor keys if the use case arises.

One simplification that arises is that Module and Optimizer collection has very similar structure. So similar, in fact, that it is possible to use a common template for config. I also found that a lot of the `check_and_store` logic could be simplified and inlined by this joining of collected optimizer state.

Differential Revision: [D40210703](https://our.internmc.facebook.com/intern/diff/D40210703/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86753
Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi
2022-10-23 19:23:39 +00:00
Seonglyong Gong
dbea07b6aa [Profiler] record gradient from nnModule (#86355)
Summary:
- catch .grad tensor info
- update data type and `check_and_store`, etc
- update unit test case

Test Plan: buck run mode/opt //caffe2/test:profiler

Reviewed By: chaekit

Differential Revision: D39711295

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86355
Approved by: https://github.com/chaekit
2022-10-07 09:58:50 +00:00
Seonglyong Gong
a117fde86f [Profiler] Apply TensorMetadata for Optimizer and nnModule (#86047)
Summary: - Use `TensorMetadata` struct in saving tensor info from Optimizer and nnModule.

Test Plan: buck run mode/opt //caffe2/test:profiler

Reviewed By: chaekit

Differential Revision: D39682205

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86047
Approved by: https://github.com/chaekit, https://github.com/robieta
2022-10-06 06:18:56 +00:00
Seonglyong Gong
3cfc61b846 [Profiler][trivial] Optimizer states (part 4 of Record Optimizer) (#85840)
Summary: - add states into OptInfo and update unit testcase

Test Plan: buck run mode/opt //caffe2/test:profiler

Reviewed By: chaekit

Differential Revision: D39406540

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85840
Approved by: https://github.com/robieta
2022-09-29 07:28:33 +00:00
Seonglyong Gong
7628603aee [Profiler] bug fix: python object reference counting (#85847)
Summary:
Incorrect reference counting of Python objects caused an intermittent, corner-case-only segfault.
- before: increment once, decrement in a loop.
- after: increment and decrement in different but consistent loops.

Test Plan: buck run mode/opt //caffe2/test:profiler

Differential Revision: D39902973

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85847
Approved by: https://github.com/robieta, https://github.com/aaronenyeshi
2022-09-29 03:58:34 +00:00
Seonglyong Gong
d776693701 [Profiler] Optimizer param_groups (part 3 of Record Optimizer) (#85784)
Summary:
- use TensorMetadata struct
- check_and_store util as overloading
- param_groups
- clean up unit test cases

Test Plan: buck run mode/opt //caffe2/test:profiler

Reviewed By: chaekit

Differential Revision: D39406072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85784
Approved by: https://github.com/aaronenyeshi, https://github.com/robieta
2022-09-28 19:18:12 +00:00
Seonglyong Gong
f80ef73d1c [Profiler] tracking Optimizer (part 2 of Record Optimizer) (#84920)
Summary:
Part 2 of Record Optimizer param_groups and states (https://github.com/pytorch/pytorch/pull/84063)
- hooking from optimizer step
- PyOptCall Type
- declare data type for collection
- python binding
- simple unit test case

Test Plan: buck run mode/opt //caffe2/test:profiler

Differential Revision: D39402667

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84920
Approved by: https://github.com/robieta
2022-09-28 02:48:07 +00:00
Seonglyong Gong
dc865bff4e [Profiler] set_class util (part 1 of Record Optimizer) (#84779)
Summary:
Part 1 of Record Optimizer param_groups and states (https://github.com/pytorch/pytorch/pull/84063)
- nnModule and Optimizer have duplicated parts
- create a util function to avoid duplication

Test Plan: buck run mode/opt //caffe2/test:profiler

Differential Revision: D39397210

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84779
Approved by: https://github.com/robieta
2022-09-13 01:48:41 +00:00
Taylor Robie
daffff9986 [Profiler] Make RecordQueue manage the lifetime of PythonTracer. (#83964)
`PythonTracer` holds a pointer to an owning `RecordQueue`, however that relationship is not enforced and it is possible to dangle that pointer if the ProfilerState owning the `RecordQueue` is destroyed without proper cleanup.

We currently use a singleton to enforce the requirement that only one python tracer is active at a time, however a better formulation is to simply enforce that with an atomic bool and manage object lifetime through composition. In this new architecture, `RecordQueue` explicitly holds a unique_ptr to the python tracer instance. That way if `~RecordQueue` is called it will call `~PythonTracer` which can then clean up any state. Overall it is just a simpler ownership model, and less prone to unexpected failures.
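
The ownership model can be sketched in Python as follows. This is an illustration under assumptions, not the C++ implementation: a lock-guarded class flag stands in for the atomic bool, an explicit `close()` stands in for the destructor, and the method names are hypothetical.

```python
import threading


class PythonTracer:
    """Hypothetical tracer; at most one instance may be active at a time."""

    _active = False
    _lock = threading.Lock()  # stands in for an atomic bool

    def __init__(self):
        with PythonTracer._lock:
            if PythonTracer._active:
                raise RuntimeError("a Python tracer is already active")
            PythonTracer._active = True
        self._stopped = False

    def stop(self):
        # Idempotent cleanup, so it is safe to call from any teardown path.
        if not self._stopped:
            self._stopped = True
            with PythonTracer._lock:
                PythonTracer._active = False


class RecordQueue:
    """Owns the tracer, so tearing down the queue always tears down the tracer."""

    def __init__(self):
        self.tracer = PythonTracer()

    def close(self):
        self.tracer.stop()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


with RecordQueue():
    try:
        RecordQueue()
    except RuntimeError as err:
        print(err)  # a Python tracer is already active

# The first queue has been torn down, so a new one can start.
RecordQueue().close()
```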

Differential Revision: [D38955616](https://our.internmc.facebook.com/intern/diff/D38955616/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83964
Approved by: https://github.com/slgong-fb
2022-09-09 19:04:08 +00:00
Taylor Robie
328538700a [Profiler][Trivial] Move PythonTracerBase to torch/csrc/profiler/orchestration (#83895)
The ownership model between `RecordQueue` and `PythonTracer` is brittle; if a profiler is popped without proper shutdown it can dangle a reference in `PythonTracer` which will segfault when dereferenced. The next PR will address this; to start we simply move the code into `torch/csrc/profiler/orchestration` to limit the sloc delta when making actual changes.

Differential Revision: [D38933962](https://our.internmc.facebook.com/intern/diff/D38933962/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38933962/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83895
Approved by: https://github.com/slgong-fb
2022-09-09 19:04:08 +00:00
Seonglyong Gong
fa241fd50e [Profiler] record nn.Module's parameters (#83209)
Summary:
Record nn.Module's parameters for detailed memory profiling:
- extend 'module_' in value cache  & NNModuleInfo to save parameters
- python binding and unit test case

Test Plan: buck run mode/opt //caffe2/test:profiler -- -r test_nnmodule

Differential Revision: D38379717

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83209
Approved by: https://github.com/robieta
2022-08-24 08:17:20 +00:00
Taylor Robie
09e837634b [Profiler][Minor] Set end time on python events when profiling stops. (#83621)
We don't have an end event for calls that are ongoing when profiling stops. (e.g. main) This cropped up when I was adding checks for negative durations.

I also refactored `populate` to use a pop method. This not only allows me to implement this fix, but should also provide a convenient entry point for https://github.com/pytorch/pytorch/pull/82154
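
A minimal sketch of the idea, assuming a simplified record format of `("call", name, t)` and `("return", t)` tuples; `pair_events` and its inner `pop` helper are hypothetical and only illustrate how calls still open at shutdown can be closed with the stop time.

```python
def pair_events(raw_events, stop_time):
    """Pair call/return records into (name, start, end) tuples."""
    stack = []     # calls that have started but not yet returned
    finished = []

    def pop(end_time):
        name, start_time = stack.pop()
        finished.append((name, start_time, end_time))

    for record in raw_events:
        if record[0] == "call":
            _, name, start_time = record
            stack.append((name, start_time))
        elif stack:
            pop(record[1])

    # Calls still on the stack never returned while profiling was active,
    # so they are closed with the time at which profiling stopped.
    while stack:
        pop(stop_time)
    return finished


raw = [("call", "main", 0), ("call", "step", 2), ("return", 5), ("call", "log", 6)]
print(pair_events(raw, stop_time=9))
# [('step', 2, 5), ('log', 6, 9), ('main', 0, 9)]
```

Routing both the normal return path and the shutdown path through one `pop` keeps the end-time logic in a single place.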

Differential Revision: [D38426342](https://our.internmc.facebook.com/intern/diff/D38426342/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83621
Approved by: https://github.com/slgong-fb
2022-08-21 00:22:11 +00:00
Taylor Robie
7edd947178 [Profiler][Python tracer] Add ephemeral inputs to the value cache. (#81958)
There are a couple of bugs in the python tracer related to how we cache values. The first is that `ValueCache::store<CallType::PyModuleCall>` wrongly assumes that it will only be called from the profiling callback and calls `PyEval_GetFrame`, effectively violating the encapsulation of the cache by accessing global state. Secondly, we use `arg` to cache bound C functions. This turns out not to be correct, and collisions are resulting in incorrect traces.

In both cases, we can solve the problem by introducing a concept of ephemeral data which is used to materialize a cached value, but is not part of the cache key. (And the author is responsible for making sure that is done correctly.)
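
The split between cache key and ephemeral data might look like the following Python sketch. `ValueCache` here is a hypothetical toy rather than the C++ class; the point is that the caller passes in whatever is needed to build the value, instead of the cache reaching into global state.

```python
class ValueCache:
    """Toy cache: the key identifies an entry, ephemeral data builds it."""

    def __init__(self):
        self._entries = {}

    def store(self, key, ephemeral):
        # `ephemeral` is consulted only on a miss and never becomes part of the key.
        if key not in self._entries:
            self._entries[key] = self._materialize(key, ephemeral)

    def load(self, key):
        return self._entries[key]

    @staticmethod
    def _materialize(key, ephemeral):
        return {"key": key, "name": ephemeral["name"]}


cache = ValueCache()
cache.store(key=0x1000, ephemeral={"name": "Linear.forward"})
cache.store(key=0x1000, ephemeral={"name": "ignored on a cache hit"})
print(cache.load(0x1000)["name"])  # Linear.forward
```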

Differential Revision: [D38062921](https://our.internmc.facebook.com/intern/diff/D38062921/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81958
Approved by: https://github.com/ngimel
2022-07-29 05:12:09 +00:00
albanD
4b7de26556 Fix C API to be compatible with latest 3.11 beta (#81242)
Based off https://github.com/pytorch/pytorch/pull/80511 with extra changes:
- Update pybind to the latest release as it contains some needed fixes
- Extend the compat header to reduce changes in code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81242
Approved by: https://github.com/malfet, https://github.com/mattip
2022-07-27 08:37:10 +00:00
Seonglyong Gong
72de816f5c GIL acquire needed in ValueCache::trimPrefixes (#81061)
Summary: Debugged a segfault issue in on-demand Python tracing. Committing as a separate diff from D37410204.

Test Plan:
- run a python test case with the following command for on-demand flow:
echo -e "PYTHON_STACK_TRACE=true" > /tmp/scott_kineto.conf && dyno gputrace --gputrace_duration 300ms --gpuconf /tmp/scott_kineto.conf

Reviewed By: chaekit

Differential Revision: D37662988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81061
Approved by: https://github.com/albanD
2022-07-19 01:00:36 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Taylor Robie
9f2e2aa28b Revert "Revert "[Profiler] Move python tracing to unified event type (Part 2)""
This reverts commit 4305f8e9bd.

replace TEST_CUDA with torch.has_cuda

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79173

Approved by: https://github.com/ezyang
2022-06-09 19:45:02 +00:00
PyTorch MergeBot
4305f8e9bd Revert "[Profiler] Move python tracing to unified event type (Part 2)"
This reverts commit c2a3c8186c.

Reverted https://github.com/pytorch/pytorch/pull/78164 on behalf of https://github.com/malfet due to Broke cuda-on-cpu tests, see c2a3c8186c
2022-06-08 02:21:16 +00:00
Taylor Robie
c2a3c8186c [Profiler] Move python tracing to unified event type (Part 2)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78164

This PR finishes moving over the python tracer to use the unified event type. Things that changed:

1) The hacky after-the-fact splicing of python events in profiler_kineto.cpp is gone and python events now simply fold into the rest. (Yay!!!) This is a major BE win.
2) Added `ExtraFields<EventType::PyCall>` and `ExtraFields<EventType::PyCCall>`
3) The enter events (time + TraceKey) are now handled by RecordQueue for performance.
4) Python tracing now uses TSC for lower overhead.

Simplifications in profiler_python WRT part 1:
1) Rather than ValueCache emitting an intermediate value_t that gets further converted, load methods can now directly emit ExtraFields<...>
2) The complicated replay in profiler_python.cpp is replaced with a much simpler (and safer) pass to just pair start and end times.
3) During post processing we can now use `CallTypeHelper::map` to automatically pull in all events instead of having to loop over the entries for each type manually. This will make it simpler to add new types of Python event later.

Differential Revision: [D36515869](https://our.internmc.facebook.com/intern/diff/D36515869/)

Approved by: https://github.com/aaronenyeshi
2022-06-07 23:42:00 +00:00
Taylor Robie
a173613f6d [Profiler] Move python tracing to unified event type (Part 1)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78163

The python function tracer is complicated and separate from the other profile types, so I've chosen to break the change into two diffs. The first (this one) reworks the cache structure to make it amenable to integration (as well as some other nice tweaks) and the next one actually moves it over.

The old cache scheme worked very hard to pack all the information about an event into a small struct via bit packing, with a couple secondary caches for things like names. Because of the space constraints on that struct (and the fact that it had to represent all call and return types) there were a lot of subtle invariants swirling around that made it hard to offload anything to a different component. The new cache system is more modular and also, as it turns out, a bit faster. (Benchmarks in part 2)

There is a more detailed description of the cache hierarchy in the PR, but the gist is that I use various specializations to handle the different event types (python call, nn module, c function) and lean on the type system to keep everything safe and organized. (One nice thing about using unique IDs is that they also implicitly encode the event type. They implicitly encode everything!) Given that we are going to want to expand the semantics (e.g. torch ops, DataLoader, etc) this will give a nice way to capture richer semantics without significantly increasing the complexity of the profiler.

Differential Revision: [D36379147](https://our.internmc.facebook.com/intern/diff/D36379147/)

Approved by: https://github.com/aaronenyeshi
2022-06-07 23:42:00 +00:00
Taylor Robie
e0a071a47e [Profiler] Abstract interface for Python tracer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77699

The current machinery to connect libtorch to libtorch_python for profiling is... meh. Adequate for separate components that mostly just need to send a trigger, but not really clean. This PR makes an abstract interface class that the python tracer subclasses so the profiler can actually get at the tracer singleton, albeit through a restricted interface. This will help fold Python tracing into the new unified event structure.

Differential Revision: [D36325739](https://our.internmc.facebook.com/intern/diff/D36325739/)

Approved by: https://github.com/aaronenyeshi
2022-05-25 16:11:01 +00:00
Taylor Robie
7b8cf1f736 [pytorch][PR] [Profiler][Trivial] Format profiler_python.cpp
There are some unfortunate style issues, like four space indents and various other minor issues. There is a pretty big overhaul coming to the python tracer, so I want to be able to commit them with more style compliant code.

Differential Revision: [D36070201](https://our.internmc.facebook.com/intern/diff/D36070201/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77692

Approved by: https://github.com/aaronenyeshi
2022-05-18 03:52:19 +00:00
Amir Khojaste
748790588c Upgrading the loop to use irange (#70326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70326

See D24145988 for context: it allows loops such as for(int i=0;i<10;i++) to be expressed as for(const auto i : c10::irange(10)). This is nice because it auto-types the loops and adds const-safety to the iteration variable.

Test Plan: buck run //caffe2/torch/fb/sparsenn:test

Reviewed By: r-barnes

Differential Revision: D33243400

fbshipit-source-id: b1f1b4163f4bf662031baea9e5268459b40c69a3
2022-01-06 07:06:53 -08:00
Taylor Robie
33e9a0b5f6 [Reland] Python tracer. (#68325)
Summary:
There were two issues with the original PR:
1) My assumption that bound C functions could be trusted to stay alive was not valid. I'm still not entirely sure what was dying, but I've just added a cache so that the first time I see a function I collect the repr just like I was already doing with Python functions.

2) `std::regex` is known to be badly broken and prone to segfaults. Because I'm just doing a very simple prefix prune it's fine to do it manually; see `trimPrefix`. Long term we should move all of PyTorch to `re2` as the internal lint suggests, but CMake is hard and I couldn't get it to work.
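
A manual prefix prune needs nothing beyond string comparison. The Python sketch below shows the general shape; `trim_prefix` and the example paths are hypothetical and are not a port of `trimPrefix`.

```python
def trim_prefix(path, prefixes):
    """Strip the first matching prefix from `path`; otherwise return it unchanged."""
    for prefix in prefixes:
        if path.startswith(prefix):
            return path[len(prefix):]
    return path


# Hypothetical install locations, used only for illustration.
prefixes = ["/usr/lib/python3/site-packages/", "/home/user/project/"]
print(trim_prefix("/usr/lib/python3/site-packages/torch/nn/modules/linear.py", prefixes))
# torch/nn/modules/linear.py
```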

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68325

Reviewed By: chaekit

Differential Revision: D32432596

Pulled By: robieta

fbshipit-source-id: 06fb4bcdc6933a3e76f6021ca69dc77a467e4b2e
2021-11-15 23:32:49 -08:00
Jane Xu
8bf150f21b Revert D32178667: [pytorch][PR] Python tracer for profiler
Test Plan: revert-hammer

Differential Revision:
D32178667 (33353fb828)

Original commit changeset: 118547104a7d

fbshipit-source-id: 47510607589fc39c730ba913f47c01a7d107b7b0
2021-11-12 14:53:52 -08:00
Taylor Robie
33353fb828 Python tracer for profiler (#67407)
Summary:
This PR instruments the CPython interpreter and integrates the resulting trace into the PyTorch profiler.

The python tracing logic works by enabling `PyEval_SetProfile`, and then logging the minimal information to track every time python calls or returns from a function. A great deal of care has gone into keeping this process very lightweight; the `RawEvent` struct is only two words and doesn't do anything fancy. When a python function is called, we have to do extra work. If the call is to `nn.Module.__call__`, we simply incref to extend the life of the module. Otherwise we check if we have seen the function before, and if not go through the (somewhat expensive) task of saving the strings which we then cache.
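
For readers unfamiliar with the hook, `sys.setprofile` is the Python-level counterpart of `PyEval_SetProfile`. The sketch below uses it to log call and return events with a timestamp; `record_events` is a hypothetical helper and the tuple layout is an assumption, far simpler than the two-word `RawEvent` described above.

```python
import sys
import time


def record_events(fn):
    """Run `fn` under a profile hook and return the raw (event, name, t) log."""
    log = []

    def hook(frame, event, arg):
        # Keep the hook cheap: append a small tuple and defer everything else.
        if event in ("call", "return"):
            log.append((event, frame.f_code.co_name, time.perf_counter_ns()))
        elif event in ("c_call", "c_return"):
            name = getattr(arg, "__name__", repr(arg))
            log.append((event, name, time.perf_counter_ns()))

    sys.setprofile(hook)
    try:
        fn()
    finally:
        sys.setprofile(None)
    return log


def inner():
    return len("abc")


def outer():
    inner()


for event, name, _ in record_events(outer):
    print(event, name)
```

The log also picks up events from installing and removing the hook itself, which is one reason the real tracer does its interpretation in post processing rather than inside the callback.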

To actually get a useful timeline, we have to replay the events to determine the state of the python stack at any given point. A second round of stack replay is needed to figure out what the last python function was for each torch op so we can reconstruct the correct python stack. All of this is done during post processing, so while we want to be reasonably performant it is no longer imperative to shave every last bit.

I still need to do a bit of refinement (particularly where the tracer interfaces with the profiler), but this should give a good sense of the general structure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67407

Test Plan:
```
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.linear(x)
        return self.relu(x)

def call_module():
    m = MyModule()
    for _ in range(4):
        m(torch.ones((2, 2)))

def top_level_fn():
    with torch.profiler.profile(with_stack=True) as p:
        call_module()

    p.export_chrome_trace("test_trace.json")

top_level_fn()
```
<img width="1043" alt="Screen Shot 2021-10-27 at 6 43 18 PM" src="https://user-images.githubusercontent.com/13089297/139171803-f95e70f3-24aa-45e6-9d4b-6d437a3f108d.png">

PS: I've tried to comment liberally, particularly around some of the more magical parts. However I do plan on doing another linting and commenting pass. Hopefully it's not too bad right now.

Reviewed By: gdankel, chaekit

Differential Revision: D32178667

Pulled By: robieta

fbshipit-source-id: 118547104a7d887e830f17b94d3a29ee4f8c482f
2021-11-12 11:58:12 -08:00