The legacy profiler is an eyesore in the autograd folder. At this point the implementation is almost completely decoupled from the rest of the profiler, and it is in maintenance mode pending deprecation.
As a result, I'm moving it to `torch/csrc/profiler/standalone`. Unfortunately, BC requires that the symbols remain in `torch::autograd::profiler`, so I've put some basic forwarding logic in `torch/csrc/autograd/profiler.h`.
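For illustration only, the forwarding amounts to something like the sketch below (the include path and symbol names are placeholders, not the exact ones in the tree):
```
// torch/csrc/autograd/profiler.h -- illustrative sketch of the BC forwarding.
// The implementation now lives under torch/csrc/profiler/standalone; the old
// namespace keeps working by re-exporting the symbols it used to define.
#include <torch/csrc/profiler/standalone/profiler_legacy.h>  // placeholder path

namespace torch {
namespace autograd {
namespace profiler {
// Placeholder symbol names; the real header forwards whatever the legacy API exposes.
using torch::profiler::impl::legacy::enableProfilerLegacy;
using torch::profiler::impl::legacy::disableProfilerLegacy;
} // namespace profiler
} // namespace autograd
} // namespace torch
```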
One strange bit is that `profiler_legacy.h` forward declares `torch::autograd::Node`, but doesn't seem to do anything with it. I think we can delete it, but I want to test to make sure.
(Note: this should not land until https://github.com/pytorch/torchrec/pull/595 is landed.)
Differential Revision: [D39108648](https://our.internmc.facebook.com/intern/diff/D39108648/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85512
Approved by: https://github.com/aaronenyeshi
There is a concept in the profiler of a stub that wraps a profiling API. It was introduced for CUDA profiling before Kineto, and ITT has adopted it to call into VTune APIs. However, for the most part we don't really interact with these stubs when developing the PyTorch profiler.
Thus it makes sense to unify the fallback registration mechanism and create a subfolder to free up real estate in the top-level `torch/csrc/profiler` directory.
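Roughly, the shape of the pattern is the following (names and signatures here are illustrative, not the actual declarations in the new subfolder):
```
// Illustrative sketch only; the real stub interfaces use different names/signatures.
#include <stdexcept>

struct ProfilerStubsExample {
  virtual ~ProfilerStubsExample() = default;
  virtual void mark(const char* name) const = 0;
  virtual void rangePush(const char* name) const = 0;
  virtual void rangePop() const = 0;
};

// Fallback used when no vendor library (CUPTI, ITT/VTune, ...) is linked in:
// every call simply reports that the backend is unavailable.
struct DefaultStubsExample : ProfilerStubsExample {
  void mark(const char*) const override { fail(); }
  void rangePush(const char*) const override { fail(); }
  void rangePop() const override { fail(); }

 private:
  [[noreturn]] static void fail() {
    throw std::runtime_error("profiler backend not available");
  }
};

// Unified registration point (hypothetical signature): a backend installs its
// implementation once, typically from a static initializer in its own TU.
void registerStubsExample(const ProfilerStubsExample* stubs);
```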
Differential Revision: [D39108647](https://our.internmc.facebook.com/intern/diff/D39108647/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85510
Approved by: https://github.com/aaronenyeshi
Right now the profiler is capable of leaking callback handles if a client does not call `at::removeCallback` (as well as of a double free if two clients both remove the handle). This modestly improves the situation by pulling removal into a single method and calling that removal code in the dtor unless explicitly opted out. Once we deprecate the legacy profiler we can further simplify by making `ProfilerThreadLocalStateBase` own the handle outright.
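A rough sketch of the shape of the change, with placeholder class names (the `at::removeCallback` / `CallbackHandle` usage mirrors the RecordFunction API):
```
// Illustrative sketch only; the real owner is the profiler's thread-local state.
#include <ATen/record_function.h>

class ProfilerThreadLocalStateExample {
 public:
  ~ProfilerThreadLocalStateExample() {
    // Unless explicitly opted out, remove the callback in the dtor so the
    // handle cannot leak if a client forgets to call at::removeCallback.
    removeCallback();
  }

  // Single removal path: idempotent, so a second call cannot double free.
  void removeCallback() {
    if (handle_ != at::INVALID_CALLBACK_HANDLE) {
      at::removeCallback(handle_);
      handle_ = at::INVALID_CALLBACK_HANDLE;
    }
  }

  // Explicit opt-out for clients that manage the handle themselves.
  void leaveCallbackInstalled() { handle_ = at::INVALID_CALLBACK_HANDLE; }

 private:
  at::CallbackHandle handle_ = at::INVALID_CALLBACK_HANDLE;
};
```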
Differential Revision: [D38920537](https://our.internmc.facebook.com/intern/diff/D38920537/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83892
Approved by: https://github.com/slgong-fb
`ProfilerState::Disabled` and `ProfilerState::KINETO_ONDEMAND` have special semantics. The former is somewhat intuitive, but the degree of behavior branching on the latter (and why the branching is necessary) is less clear. By factoring the enum checks into methods, we can both clarify intent and future-proof in case we ever add other global profiling contexts.
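A minimal sketch of the idea (placeholder names; the real helpers live on the profiler's config/state classes):
```
// Illustrative sketch only.
enum class ProfilerStateExample { Disabled, KINETO, KINETO_ONDEMAND /*, ... */ };

struct ProfilerConfigExample {
  ProfilerStateExample state;

  // Intent-revealing helpers instead of scattered enum comparisons. If another
  // global profiling context is ever added, only these methods need updating.
  bool disabled() const { return state == ProfilerStateExample::Disabled; }
  bool global() const { return state == ProfilerStateExample::KINETO_ONDEMAND; }
};
```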
Differential Revision: [D38917980](https://our.internmc.facebook.com/intern/diff/D38917980/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83891
Approved by: https://github.com/slgong-fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76078
Templatize `pushProfilingCallbacks` to support `RecordFunction` global callbacks. The reason for templatizing is to
1. squeeze out performance on the hot path
2. work around the restriction to capture-less lambdas (a rough sketch of the pattern follows this list)
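Roughly, the pattern looks like the sketch below (simplified, with placeholder names): the global/local decision becomes a template parameter, so the enter/exit callbacks stay capture-less function pointers and the hot path pays no runtime branch for it.
```
// Illustrative sketch only; the real callbacks record far more state.
#include <ATen/record_function.h>
#include <memory>

template <bool kGlobal>
static std::unique_ptr<at::ObserverContext> onFunctionEnter(const at::RecordFunction& /*fn*/) {
  // kGlobal is a compile-time constant, so "is this the global (on-demand)
  // profiler?" needs no captured state and costs nothing on the hot path.
  return nullptr;
}

template <bool kGlobal>
static void onFunctionExit(const at::RecordFunction& /*fn*/, at::ObserverContext* /*ctx*/) {
  // ... collect end-of-op information here ...
}

template <bool kGlobal>
void pushProfilingCallbacksExample() {
  auto cb = at::RecordFunctionCallback(
                &onFunctionEnter<kGlobal>, &onFunctionExit<kGlobal>)
                .needsInputs(false);
  // The returned handle would normally be stored for later removal.
  if (kGlobal) {
    at::addGlobalCallback(cb);
  } else {
    at::addThreadLocalCallback(cb);
  }
}
```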
Test Plan:
## Global Callback
These were tested in conjunction with the subsequent e2e diffs in both `trace_tester` and `sigrid`.
sample trace: https://fburl.com/perfdoctor/tzgtw2ln
## Local Callback
https://fburl.com/perfdoctor/l58nfiyp
Reviewed By: robieta
Differential Revision: D35457300
fbshipit-source-id: 9d587ec68bfd405e565cc8956b0afa2cdaf95b94
(cherry picked from commit 9d8a9063d7525972d5364307c95ed50f6bafe3ec)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75616
Kineto introduced a new profiler to read performance counters from NVIDIA GPUs (the CUPTI Range Profiler API).
Here we add support for configuring this Kineto range profiler mode.
Example:
```
import torch
from torch.profiler import ProfilerActivity

with torch.profiler.profile(
    activities=[ProfilerActivity.CUDA],
    record_shapes=True,
    on_trace_ready=trace_handler,
    experimental_config=torch.profiler._ExperimentalConfig(
        profiler_metrics=[
            "kineto__tensor_core_insts",
            "dram__bytes_read.sum",
            "dram__bytes_write.sum"],
        profiler_measure_per_kernel=False),
) as prof:
    res = train_batch(modeldef)
    prof.step()
```
## Details
* Introduce a new structure `KinetoProfilerConfig` so users can configure Kineto-specific options while keeping the profiler API consistent.
* Populate configuration options for Kineto.
Test Plan: CI and tested on resnet50
Reviewed By: robieta
Differential Revision: D34489487
fbshipit-source-id: 8ef82d2593f4f4d5824ca634f7d25507bc572caa
(cherry picked from commit 4a2af70629db55a605d4b8d0a54d41df2b247183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71135
The NVTX profiler is quite different from the other Kineto cases, so it's worth peeling it off early so that later logic can assume either KINETO or KINETO_GPU_FALLBACK. This is more important since we're going to change the Kineto internals. (You can see the Python tracer was unnecessarily coupled to NVTX just because the control logic was intermingled.)
There's also no reason to keep the legacy observer state in the header rather than the cpp file now that the Kineto profiler doesn't need it, so we should shield it from prying eyes.
The recent headaches with TLS downcasting and RPC integration (D32678163 (7ea86dfdb1), D33283314 (681e78bace), D33437773 (7d6535cab3)) have made it crystal clear that we need a lot more safety in the profiler, particularly as we shift things around.
Test Plan: Unit tests. This is no longer a performance PR.
Reviewed By: aaronenyeshi
Differential Revision: D32710829
fbshipit-source-id: f9138598b3cfeba71872905a7afab3c03c0d56e7
(cherry picked from commit 059a39d8e3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68691
TraceType is a sharded file, so by only including specific operator headers, we ensure that changing one (non-method) operator only needs one shard to be re-compiled.
This also changes all the included autograd and jit headers from including `ATen/ATen.h` to just including `ATen/core/Tensor.h`.
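Illustratively, the include pattern in a shard changes roughly like this (the per-operator header names below are just examples):
```
// Before: pulls in every operator declaration, so touching any operator
// recompiles every shard.
#include <ATen/ATen.h>

// After: only the Tensor type plus the specific operators this shard actually
// references, so a change to one non-method operator touches one shard.
#include <ATen/core/Tensor.h>
#include <ATen/ops/add.h>  // example per-operator header
#include <ATen/ops/mul.h>  // example per-operator header
```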
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D33336948
Pulled By: albanD
fbshipit-source-id: 4e40371592b9a5a7e7fcd1d8cecae11ffb873113
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70327
After D32678163 (7ea86dfdb1), test_rpc_profiler began failing. This was surprising, because it should have been a no-op refactor. However, one change is that a Kineto profiler is no longer also an autograd profiler; the RPC framework assumed a legacy profiler, but when a Kineto profiler was active things still mostly worked because of that implementation detail (and crashed after the class split).
This diff tidies up a couple of things:
1) Move `getProfilerConfig` into `api.cpp`, since it is no longer correct to static_cast a `KinetoThreadLocalState` to a `ProfilerLegacyThreadLocalState`. (And really the class we want is `ProfilerThreadLocalStateBase` anyway.)
2) Add a mechanism for callers to check if the active profiler is a legacy or Kineto profiler, so callers like RPC can adjust or provide a nice error message. (A rough sketch of such a check follows this list.)
3) Fix the RPC test to create a legacy profiler.
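The check in (2) can be as small as the sketch below (placeholder names; the real query lives in the shared profiler code):
```
// Illustrative sketch only.
enum class ActiveProfilerTypeExample { NONE, LEGACY, KINETO };

struct ProfilerStateBaseExample {
  virtual ~ProfilerStateBaseExample() = default;
  // Each concrete state (legacy, Kineto) reports what it is.
  virtual ActiveProfilerTypeExample profilerType() const = 0;
};

// Hypothetical accessor for the active thread-local profiler state (may be null).
const ProfilerStateBaseExample* activeProfilerStateExample();

// Callers such as RPC use this to adjust behavior or emit a clear error
// message instead of static_cast-ing to the wrong concrete type.
ActiveProfilerTypeExample activeProfilerTypeExample() {
  const auto* state = activeProfilerStateExample();
  return state != nullptr ? state->profilerType() : ActiveProfilerTypeExample::NONE;
}
```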
Test Plan: `caffe2/torch/fb/training_toolkit/backend/tests:test_rpc_profiler` now passes, and before the fix to `test_rpc_profiler.py`, I verified that the test failed with the error message added to `utils.cpp` rather than just crashing.
Reviewed By: suphoff
Differential Revision: D33283314
fbshipit-source-id: e4fc5b5cfc9ca3b91b8f5e09adea36f38611f90d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69459
This change breaks the dependency between the Kineto and legacy profilers; instead of `profiler_kineto.h` including `profiler_legacy.h`, they both include `profiler/api.h`. As part of this refactor, I injected some intermediate classes to keep legacy behavior from leaking into the Kineto profiler:
1) ProfilerThreadLocalState has become ProfilerThreadLocalStateBase, which just handles the config and callback handle. The legacy and Kineto profilers inherit from it and implement their own, largely disjoint, logic.
2) CUDAStubs is a pure virtual class to make the interface more readable, and the "always fail" behavior has been moved to a `DefaultCUDAStubs` class in `api.cpp`.
Test Plan: Ran the overhead ubenchmark.
Reviewed By: aaronenyeshi
Differential Revision: D32678163
fbshipit-source-id: 9b733283e4eae2614db68147de81b72f6094ce6c