Commit Graph

12 Commits

Author SHA1 Message Date
cyy
53e356a1c0 [2/N] Enable cppcoreguidelines-special-member-functions (#138670)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138670
Approved by: https://github.com/sraikund16
2024-10-24 04:35:18 +00:00
cyy
3d88c618d5 Concat namespaces in torch/csrc/profiler and other fixes (#127266)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127266
Approved by: https://github.com/soulitzer
2024-05-28 15:21:32 +00:00
cyy
ff82dcd8fa [2/N] Enable clang-tidy checks in torch/csrc/profiler (#113439)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113439
Approved by: https://github.com/Skylion007
2023-11-14 00:39:54 +00:00
cyy
add78ac425 Fix a type error in AppendOnlyList (#112362)
AppendOnlyList::emplace_back allocates an array and overwrites the first slot, which is unsafe on a non-trivial type. This PR fixes it and add other checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112362
Approved by: https://github.com/aaronenyeshi
2023-11-04 07:06:42 +00:00
Taylor Robie
dfdfaec3fc [Profiler] Don't assign in AppendOnlyList::emplace_back (#85716)
It turns out that we're invoking the copy assign operator in AppendOnlyList. While copy elision is expected to mostly hide any costs it does present issues for types with deleted copy assign operators. (It also seems to produce slightly worse assembly: https://godbolt.org/z/o4Gvz1fKs)

Calling new at the correct position seems to be a better way to go about this. (At least from looking at other high performance containers like SmallVector.)

Differential Revision: [D39852804](https://our.internmc.facebook.com/intern/diff/D39852804/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85716
Approved by: https://github.com/chaekit
2022-09-29 02:59:43 +00:00
Taylor Robie
61b9d8fccd [Profiler][Trivial] Add null handling to AppendOnlyList::copy memcpy path. (#83963)
It is apparently undefined behavior to do pointer arithmetic on nullptr. In the case of AppendOnlyList, `next_` will only be null if `end_` is also null and thus the `memcpy` path will only be triggered if `n == 0`. Nonetheless, it is UB to `memcpy(0, 0, 0)`

The extra null check is in a `C10_LIKELY` block so the extra cost should be negligible, and indeed after dusting off the component microbenchmarks there's no observable difference.

Differential Revision: [D38969443](https://our.internmc.facebook.com/intern/diff/D38969443/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83963
Approved by: https://github.com/slgong-fb
2022-08-26 20:03:24 +00:00
John Clow
343b5f8651 [TorchTidy] Adding support for accessing strides and scalars (#80072)
Differential Revision: [D37571570](https://our.internmc.facebook.com/intern/diff/D37571570)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80072
Approved by: https://github.com/robieta
2022-08-17 03:36:00 +00:00
David Chen
f754d2501d Use custom AppendOnlyList for op_events to reduce the number of atomic operations (#78643)
Summary:
- Use atomic counter in Block storage constructor and offset within the block to calculate correlation_id.
- Implicitly deduce correlation id, no need to store it when profiling.

Test Plan:
Added test_profiler_correlation_id() in test_profiler.py to check the uniqueness of correlation id.

To run the test:
python test_profiler.py

Differential Revision: D36793803

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78643
Approved by: https://github.com/robieta
2022-06-14 03:17:09 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Taylor Robie
580e6583d5 [Profiler] Fix segfault in AppendOnlyList
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77997

`buffer_last_` is supposed to start at buffer_.before_begin(). It is correctly set in the ctor, but incorrectly set in `clear()`. This causes a segfault in `maybe_grow()` (Specifically, `buffer_.emplace_after(buffer_last_)`) for an AppendOnlyList which has been cleared.

Differential Revision: [D36555737](https://our.internmc.facebook.com/intern/diff/D36555737/)

Approved by: https://github.com/aaronenyeshi
2022-05-21 02:38:26 +00:00
Taylor Robie
2ecf743757 [Profiler] Pay for what you use (v2) (#74484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74484

In my first attempt at this in December I stamped out specializations using variadic templates. However I'm able to get comparable performance using simple conditionals since the branch is very predictable and AppendOnlyList::emplace_back is low enough overhead that multiple calls don't cause an issue.

This is also a chance to do some BE: rather than force ops and backend events to use the same fields (which in practice means setting a bunch of default values when reporting backend events), I just split them and use a variant.

Test Plan: The single threaded benchmark (with no extra options set) improved considerably from ~0.88 us to ~0.62 us. The stress test benchmark improved modestly from ~6.1 us to ~5.8 us. So the bottleneck for multi-threading is somewhere else, but doing less wasted work is still able to move the needle a little bit.

Reviewed By: swolchok

Differential Revision: D34779994

fbshipit-source-id: 392bc7c6f12797fa5e18777063aa21210d9d2067
(cherry picked from commit f0a49ff7be8aa65bab2f6952cc2e6306c1edc24b)
2022-03-24 18:43:08 +00:00
Taylor Robie
5a58820f01 [Profiler] Specialized AppendOnlyQueue (#73409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409

We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.)

Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us).

Reviewed By: swolchok

Differential Revision: D34231072

fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1
(cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)
2022-03-11 19:47:40 +00:00