Commit Graph

22 Commits

cyy
3d88c618d5 Concat namespaces in torch/csrc/profiler and other fixes (#127266)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127266
Approved by: https://github.com/soulitzer
2024-05-28 15:21:32 +00:00
PyTorch MergeBot
8eb579e362 Revert "[Profiler] Move legacy profiler out of torch/csrc/autograd (#85512)"
This reverts commit 157a3d2a7c.

Reverted https://github.com/pytorch/pytorch/pull/85512 on behalf of https://github.com/DanilBaibak because the internal build failed due to deleted files. Please re-submit via codev.
2022-10-14 14:56:59 +00:00
Taylor Robie
157a3d2a7c [Profiler] Move legacy profiler out of torch/csrc/autograd (#85512)
The legacy profiler is an eyesore in the autograd folder. At this point the implementation is almost completely decoupled from the rest of the profiler, and it is in maintenance mode pending deprecation.

As a result, I'm moving it to `torch/csrc/profiler/standalone`. Unfortunately, BC requires that the symbols remain in `torch::autograd::profiler`, so I've put some basic forwarding logic in `torch/csrc/autograd/profiler.h`.
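
Roughly, that kind of BC forwarding shim looks like the sketch below. The names are illustrative stand-ins, not PyTorch's actual declarations: the implementation lives in the new namespace, and the old namespace re-exports it via a using-declaration.

```
// Sketch only: illustrative names, not PyTorch's actual symbols.
#include <iostream>

namespace torch::profiler::impl {
// New home of the moved implementation.
inline void enableLegacyProfiler() {
  std::cout << "legacy profiler enabled\n";
}
} // namespace torch::profiler::impl

namespace torch::autograd::profiler {
// BC shim: re-export the moved symbol under its old namespace.
using torch::profiler::impl::enableLegacyProfiler;
} // namespace torch::autograd::profiler

int main() {
  // Old call sites keep compiling unchanged.
  torch::autograd::profiler::enableLegacyProfiler();
}
```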

One strange bit is that `profiler_legacy.h` forward declares `torch::autograd::Node`, but doesn't seem to do anything with it. I think we can delete it, but I want to test to make sure.

(Note: this should not land until https://github.com/pytorch/torchrec/pull/595 is landed.)

Differential Revision: [D39108648](https://our.internmc.facebook.com/intern/diff/D39108648/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85512
Approved by: https://github.com/aaronenyeshi
2022-10-14 05:38:48 +00:00
Taylor Robie
b8f14b7877 [Profiler][Minor] Group and consolidate stub APIs (#85510)
The profiler has a concept of a stub that wraps a profiling API. It was introduced for CUDA profiling before Kineto, and ITT has adopted it to call into VTune APIs. However, for the most part we don't interact with them when developing the PyTorch profiler.

Thus it makes sense to unify the fallback registration mechanism and create a subfolder to free up real estate in the top level `torch/csrc/profiler` directory.
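
A minimal sketch of a unified fallback-registration mechanism of the kind described here; all names are illustrative, not the actual `torch/csrc/profiler` API.

```
// Sketch only: illustrative names, not PyTorch's actual stub interface.
#include <stdexcept>

struct ProfilerStubs {
  virtual void mark(const char* name) const = 0;
  virtual ~ProfilerStubs() = default;
};

// Fallback used until a real backend (CUDA, ITT, ...) registers itself.
struct DefaultStubs : ProfilerStubs {
  void mark(const char*) const override {
    throw std::runtime_error("profiling backend not available");
  }
};

namespace {
const ProfilerStubs* active_stubs = new DefaultStubs();
} // namespace

// Single registration point shared by every backend.
void registerStubs(const ProfilerStubs* stubs) { active_stubs = stubs; }
const ProfilerStubs& stubs() { return *active_stubs; }

int main() {
  try {
    stubs().mark("profiler_start");  // no backend registered: fails loudly
  } catch (const std::exception&) {
  }
}
```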

Differential Revision: [D39108647](https://our.internmc.facebook.com/intern/diff/D39108647/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85510
Approved by: https://github.com/aaronenyeshi
2022-10-14 05:38:46 +00:00
Taylor Robie
bea0184033 Reland: "[Profiler][Trivial] Create orchestration folder and move observer management there. (#83893)" (#84667)
Reland of https://github.com/pytorch/pytorch/pull/83893

Differential Revision: [D39282536](https://our.internmc.facebook.com/intern/diff/D39282536/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39282536/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84667
Approved by: https://github.com/slgong-fb
2022-09-08 17:09:19 +00:00
PyTorch MergeBot
8b578849b4 Revert "[Profiler][Trivial] Create orchestration folder and move observer management there. (#83893)"
This reverts commit 48a596ad3f.

Reverted https://github.com/pytorch/pytorch/pull/83893 on behalf of https://github.com/facebook-github-bot because the diff was reverted internally.
2022-09-01 18:34:58 +00:00
Taylor Robie
48a596ad3f [Profiler][Trivial] Create orchestration folder and move observer management there. (#83893)
Just a basic move. Later I'll add other subsystems. (Python, Kineto)

Differential Revision: [D38925895](https://our.internmc.facebook.com/intern/diff/D38925895/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38925895/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83893
Approved by: https://github.com/slgong-fb
2022-08-30 21:40:59 +00:00
Taylor Robie
c26b53f6a4 [Profiler] Encapsulate callback handle management. (#83892)
Right now the profiler is capable of leaking callback handles if a client does not call `at::removeCallback` (as well as double-freeing if two clients both remove the same handle). This modestly improves the situation by pulling removal into a single method and calling that removal code in the dtor unless explicitly opted out. Once we deprecate the legacy profiler we can further simplify by making the ProfilerThreadLocalStateBase own the handle outright.
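
A minimal RAII sketch of the ownership scheme described above, with illustrative stand-ins for the real profiler types:

```
// Sketch only: illustrative stand-ins, not the actual PyTorch classes.
#include <cstdint>
#include <iostream>

using CallbackHandle = uint64_t;

// Stand-in for the framework's removal call (at::removeCallback in the text).
void removeCallback(CallbackHandle h) {
  std::cout << "removed callback " << h << "\n";
}

class ProfilerStateBase {
 public:
  explicit ProfilerStateBase(CallbackHandle h) : handle_(h) {}

  // Single removal path: no leaked handle, no double free.
  ~ProfilerStateBase() {
    if (handle_ != 0) {
      removeCallback(handle_);
    }
  }

  // Explicit opt-out for clients that manage the handle themselves.
  void leaveCallbackRegistered() { handle_ = 0; }

 private:
  CallbackHandle handle_;
};

int main() {
  ProfilerStateBase state(/*handle=*/42);
}  // destructor removes the callback exactly once
```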

Differential Revision: [D38920537](https://our.internmc.facebook.com/intern/diff/D38920537/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83892
Approved by: https://github.com/slgong-fb
2022-08-30 21:40:58 +00:00
Taylor Robie
7480e83338 [Profiler] Add disabled and global methods to ProfilerConfig. (#83891)
`ProfilerState::Disabled` and `ProfilerState::KINETO_ONDEMAND` have special semantics. The former is somewhat intuitive, but the degree of behavior branching on the latter (and why the branching is necessary) is less clear. By factoring the enum checks into methods, we can both clarify intent and future-proof the code in case we ever add other global profiling contexts.
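
A minimal sketch of factoring those enum checks into named methods; the enum values and struct shape are simplified stand-ins, not the actual `ProfilerConfig`.

```
// Sketch only: simplified stand-ins for the real ProfilerConfig.
enum class ProfilerState { Disabled, CPU, KINETO, KINETO_ONDEMAND };

struct ProfilerConfig {
  ProfilerState state;

  // Named intent instead of raw enum comparisons at every call site.
  bool disabled() const { return state == ProfilerState::Disabled; }
  bool global() const { return state == ProfilerState::KINETO_ONDEMAND; }
};

int main() {
  ProfilerConfig config{ProfilerState::KINETO_ONDEMAND};
  if (config.disabled()) return 0;      // early exit, nothing to profile
  if (config.global()) { /* process-wide, on-demand profiling path */ }
}
```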

Differential Revision: [D38917980](https://our.internmc.facebook.com/intern/diff/D38917980/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83891
Approved by: https://github.com/slgong-fb
2022-08-29 08:56:54 +00:00
PyTorch MergeBot
261be8e5c2 Revert "[Profiler] Add disabled and global methods to ProfilerConfig. (#83891)"
This reverts commit 69e9f905b7.

Reverted https://github.com/pytorch/pytorch/pull/83891 on behalf of https://github.com/facebook-github-bot because the diff was reverted internally.
2022-08-28 18:30:05 +00:00
Taylor Robie
69e9f905b7 [Profiler] Add disabled and global methods to ProfilerConfig. (#83891)
`ProfilerState::Disabled` and `ProfilerState::KINETO_ONDEMAND` have special semantics. The former is somewhat intuitive, but the degree of behavior branching on the latter (and why the branching is necessary) is less clear. By factoring the enum checks into methods, we can both clarify intent and future-proof the code in case we ever add other global profiling contexts.

Differential Revision: [D38917980](https://our.internmc.facebook.com/intern/diff/D38917980/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83891
Approved by: https://github.com/slgong-fb
2022-08-26 22:51:10 +00:00
Taylor Robie
f4dc7b3a8a [Profiler][Trivial] Cleanup ExperimentalConfig (#83890)
I'm trying to limit how much is in headers to make it easier to read the API surface. In a similar vein, we can replace `hasOptions` with `operator bool` so it just does the right thing in the check.
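
A minimal sketch of the `operator bool` replacement described here; the fields are illustrative (borrowed from the `_ExperimentalConfig` usage shown further down this log), not the exact C++ struct.

```
// Sketch only: illustrative fields, not the exact ExperimentalConfig.
#include <string>
#include <vector>

struct ExperimentalConfig {
  std::vector<std::string> profiler_metrics;
  bool profiler_measure_per_kernel = false;

  // Replaces a hasOptions() accessor: the config itself is truthy
  // exactly when there is something to configure.
  explicit operator bool() const { return !profiler_metrics.empty(); }
};

int main() {
  ExperimentalConfig config{{"kineto__tensor_core_insts"}};
  if (config) { /* pass the options through to Kineto */ }
}
```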

Differential Revision: [D38917366](https://our.internmc.facebook.com/intern/diff/D38917366/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83890
Approved by: https://github.com/slgong-fb
2022-08-26 22:51:09 +00:00
Jing Xu
3c7044728b Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
A more detailed description of the benefits can be found at #41001. This is Intel's counterpart of NVIDIA's NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT provides functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with the standalone VTune Profiler (https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) and, in the future, with Kineto-integrated VTune functionality.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer-customized code scopes on CPU, analogous to NVTX for NVIDIA GPUs.

This PR rebases the code changes from https://github.com/pytorch/pytorch/pull/61335 onto the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-07-13 13:50:15 +00:00
PyTorch MergeBot
1454515253 Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)"
This reverts commit f988aa2b3f.

Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet because it broke trunk; see f988aa2b3f.
2022-06-30 12:49:41 +00:00
Jing Xu
f988aa2b3f Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
A more detailed description of the benefits can be found at #41001. This is Intel's counterpart of NVIDIA's NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT provides functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with the standalone VTune Profiler (https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) and, in the future, with Kineto-integrated VTune functionality.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer-customized code scopes on CPU, analogous to NVTX for NVIDIA GPUs.

This PR rebases the code changes from https://github.com/pytorch/pytorch/pull/61335 onto the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-06-30 05:14:03 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Jay Chae
cd7895e64f [kineto] global callback support in ProfilerKineto (#76078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76078

Templatize `pushProfilingCallbacks` to support `RecordFunction` global callbacks. The reasons for templatizing are to
1. squeeze out performance on the hot path, and
2. work around the capture-less lambda requirement (see the sketch below).
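
A minimal sketch of the templatization trick, assuming a callback-registration API that takes a plain function pointer (so only capture-less lambdas qualify); names are illustrative, not the actual `RecordFunction` API.

```
// Sketch only: illustrative stand-in for the RecordFunction callback API.
#include <iostream>

using ObserverFn = void (*)(const char* op_name);

// Stand-in registration point that accepts only plain function pointers.
void addCallback(ObserverFn fn) { fn("example_op"); }

template <bool use_global_state>
void pushProfilingCallbacks() {
  // Capture-less lambda: it still decays to ObserverFn, and the template
  // argument bakes the global-vs-local branch in at compile time, keeping
  // the hot path cheap.
  addCallback([](const char* op_name) {
    if (use_global_state) {
      std::cout << "global observer saw " << op_name << "\n";
    } else {
      std::cout << "thread-local observer saw " << op_name << "\n";
    }
  });
}

int main() {
  pushProfilingCallbacks</*use_global_state=*/true>();
  pushProfilingCallbacks</*use_global_state=*/false>();
}
```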

Test Plan:
## Global Callback
These were tested end-to-end in conjunction with subsequent diffs, in both `trace_tester` and `sigrid`.
sample trace: https://fburl.com/perfdoctor/tzgtw2ln

## Local Callback
https://fburl.com/perfdoctor/l58nfiyp

Reviewed By: robieta

Differential Revision: D35457300

fbshipit-source-id: 9d587ec68bfd405e565cc8956b0afa2cdaf95b94
(cherry picked from commit 9d8a9063d7525972d5364307c95ed50f6bafe3ec)
2022-04-22 06:42:39 -07:00
Brian Coutinho
8385e06b0b [pytorch][cupti profiler 6/n] Changes to configure Kineto cupti profiler from pytorch profiler interface (#75616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75616

Kineto introduced a new profiler that reads performance counters from NVIDIA GPUs (the CUPTI Range Profiler API).
Here we add support for configuring this Kineto range profiler mode.

Example
```
    with torch.profiler.profile(
        activities=[ProfilerActivity.CUDA],
        record_shapes=True,
        on_trace_ready=trace_handler,
        experimental_config=torch.profiler._ExperimentalConfig(
            profiler_metrics=[
                "kineto__tensor_core_insts",
                "dram__bytes_read.sum",
                "dram__bytes_write.sum"],
            profiler_measure_per_kernel=False),
    ) as prof:
        res = train_batch(modeldef)
        prof.step()
```

## Details
* Introduce a new structure, `KinetoProfilerConfig`, so users can configure Kineto-specific options while keeping the profiler API consistent.
* Populate configuration options for Kineto.

Test Plan: CI and tested on resnet50

Reviewed By: robieta

Differential Revision: D34489487

fbshipit-source-id: 8ef82d2593f4f4d5824ca634f7d25507bc572caa
(cherry picked from commit 4a2af70629db55a605d4b8d0a54d41df2b247183)
2022-04-20 22:19:54 +00:00
Taylor Robie
46817895bd [Profiler] Split observer implementations based on ProfilerState (#71135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71135

The NVTX profiler is quite different from the other Kineto cases, so it's worth peeling it off early so that later logic can assume either KINETO or KINETO_GPU_FALLBACK. This is more important since we're going to change the Kineto internals. (The Python tracer, for instance, was unnecessarily coupled to NVTX just because the control logic was intermingled.)
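
A minimal sketch of that early peel-off, with simplified enum values and function names (not the actual PyTorch control flow):

```
// Sketch only: simplified stand-ins for the real profiler setup.
#include <cassert>

enum class ProfilerState { NVTX, KINETO, KINETO_GPU_FALLBACK };

void pushNVTXCallbacks() { /* NVTX-only observer; no Python tracer */ }

void enableProfiler(ProfilerState state) {
  if (state == ProfilerState::NVTX) {
    pushNVTXCallbacks();  // handled entirely here, then we're done
    return;
  }
  // Everything below can assume a Kineto variant.
  assert(state == ProfilerState::KINETO ||
         state == ProfilerState::KINETO_GPU_FALLBACK);
  /* Kineto observer setup, Python tracer, etc. */
}

int main() {
  enableProfiler(ProfilerState::NVTX);
  enableProfiler(ProfilerState::KINETO);
}
```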

There's also no reason to put the legacy observer state in the header rather than the cpp file now that the Kineto profiler doesn't need it, so we should shield it from prying eyes.

The recent headaches with TLS downcasting and RPC integration (D32678163 (7ea86dfdb1), D33283314 (681e78bace), D33437773 (7d6535cab3)) have made it crystal clear that we need a lot more safety in the profiler, particularly as we shift things around.

Test Plan: Unit tests. This is no longer a performance PR.

Reviewed By: aaronenyeshi

Differential Revision: D32710829

fbshipit-source-id: f9138598b3cfeba71872905a7afab3c03c0d56e7
(cherry picked from commit 059a39d8e3)
2022-01-26 18:33:24 +00:00
Peter Bell
fa09099ba3 Codegen: TraceType only includes operators being registered (#68691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68691

TraceType is a sharded file, so by only including specific operator
headers, we ensure that changing one (non-method) operator requires
only one shard to be re-compiled.

This also changes all the included autograd and jit headers from
including `ATen/ATen.h` to just including `ATen/core/Tensor.h`.
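
A sketch of the include discipline in a generated shard (the header paths follow ATen's per-operator layout and assume a PyTorch source tree; the specific operators are illustrative):

```
// Sketch only: illustrative shard includes, not actual generated output.
#include <ATen/core/Tensor.h>  // lightweight: just the Tensor type

// Per-operator headers: only the operators this shard registers, so
// editing another operator leaves this shard's compilation untouched.
#include <ATen/ops/add.h>
#include <ATen/ops/mul.h>

// ...tracing registrations for add, mul, ... would follow...
```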

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D33336948

Pulled By: albanD

fbshipit-source-id: 4e40371592b9a5a7e7fcd1d8cecae11ffb873113
2022-01-02 13:09:19 -08:00
Taylor Robie
681e78bace [Profiler] Address issues from profiler bifurcation. (#70327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70327

After D32678163 (7ea86dfdb1), test_rpc_profiler began failing. This was surprising, because it should have been a no-op refactor. However, one change is that a Kineto profiler is no longer also an autograd profiler; the RPC framework was assuming a legacy profiler, but when a Kineto profiler was active things still kind of worked due to that implementation detail. (But crashed after the class split.)

This diff tidies up a couple of things:
1) Move `getProfilerConfig` into `api.cpp`, since it is no longer correct to static_cast a `KinetoThreadLocalState` to a `ProfilerLegacyThreadLocalState`. (And really the class we want is `ProfilerThreadLocalStateBase` anyway.)

2) Add a mechanism for callers to check whether the active profiler is a legacy or Kineto profiler. (So callers like RPC can adjust or provide a nice error message; see the sketch after this list.)

3) Fix the RPC test to create a legacy profiler.
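
A minimal sketch of the check described in (2), with an illustrative stub for the state query; the real PyTorch functions and types differ.

```
// Sketch only: illustrative stand-ins for the real profiler query.
#include <stdexcept>

enum class ActiveProfilerType { NONE, LEGACY, KINETO };

// Stand-in for querying the thread-local profiler state; always NONE here.
ActiveProfilerType profilerType() { return ActiveProfilerType::NONE; }

void rpcProfilingEntryPoint() {
  if (profilerType() != ActiveProfilerType::LEGACY) {
    // Fail with a clear message instead of crashing on a bad static_cast.
    throw std::runtime_error("RPC profiling requires the legacy profiler");
  }
  /* legacy-profiler-specific RPC bookkeeping */
}

int main() {
  try {
    rpcProfilingEntryPoint();
  } catch (const std::exception&) {
    // With no legacy profiler active, we get the friendly error.
  }
}
```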

Test Plan: `caffe2/torch/fb/training_toolkit/backend/tests:test_rpc_profiler` now passes, and before the fix to `test_rpc_profiler.py`, I verified that the test failed with the error message added to `utils.cpp` rather than just crashing.

Reviewed By: suphoff

Differential Revision: D33283314

fbshipit-source-id: e4fc5b5cfc9ca3b91b8f5e09adea36f38611f90d
2021-12-22 18:50:42 -08:00
Taylor Robie
7ea86dfdb1 [Profiler] Factor common logic into torch/csrc/profiler/api.h (#69459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69459

This change breaks the dependency between the Kineto and legacy profilers; instead of `profiler_kineto.h` including `profiler_legacy.h`, they both include `profiler/api.h`. As part of this refactor, I injected some intermediate classes to keep legacy behavior from leaking into the Kineto profiler:

1) ProfilerThreadLocalState has become ProfilerThreadLocalStateBase, which just handles the config and callback handle. The legacy and Kineto profilers inherit this and implement their own very disjoint sets of logic (a minimal sketch follows this list).

2) CUDAStubs is a pure virtual class to make the interface more readable, and the "always fail" behavior has been moved to a `DefaultCUDAStubs` class in `api.cpp`.
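
A minimal sketch of the class split in (1): the base owns only what both profilers share, and each profiler subclasses it with otherwise disjoint logic. Names and fields are simplified stand-ins, not the actual classes.

```
// Sketch only: simplified stand-ins for the real profiler state classes.
#include <cstdint>

struct ProfilerConfig {
  bool report_input_shapes = false;
};

class ProfilerThreadLocalStateBase {
 public:
  explicit ProfilerThreadLocalStateBase(ProfilerConfig config)
      : config_(config) {}
  virtual ~ProfilerThreadLocalStateBase() = default;

  const ProfilerConfig& config() const { return config_; }

 private:
  ProfilerConfig config_;
  uint64_t callback_handle_ = 0;  // shared by both profilers
};

class ProfilerLegacyThreadLocalState : public ProfilerThreadLocalStateBase {
 public:
  using ProfilerThreadLocalStateBase::ProfilerThreadLocalStateBase;
  /* legacy-only: event lists, nested range bookkeeping, ... */
};

class KinetoThreadLocalState : public ProfilerThreadLocalStateBase {
 public:
  using ProfilerThreadLocalStateBase::ProfilerThreadLocalStateBase;
  /* Kineto-only: trace collection, Kineto session management, ... */
};

int main() {
  KinetoThreadLocalState state({/*report_input_shapes=*/true});
}
```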

Test Plan: Ran the overhead ubenchmark.

Reviewed By: aaronenyeshi

Differential Revision: D32678163

fbshipit-source-id: 9b733283e4eae2614db68147de81b72f6094ce6c
2021-12-19 18:40:28 -08:00