It turns out that we're invoking the copy assignment operator in AppendOnlyList. While copy elision is expected to mostly hide any cost, this does present issues for types with deleted copy assignment operators. (It also seems to produce slightly worse assembly: https://godbolt.org/z/o4Gvz1fKs)
Calling placement new at the correct position seems to be a better way to go about this, at least judging by other high performance containers such as SmallVector.
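As a rough illustration of the difference (this is a simplified sketch, not the actual AppendOnlyList; the class and member names are hypothetical and there is no bounds checking), constructing elements into raw storage with placement new avoids ever calling the assignment operator:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Hypothetical fixed-capacity append buffer. The key point: elements are
// built directly in uninitialized storage via placement new, so types with
// deleted copy assignment operators still work.
template <typename T, std::size_t N>
class FixedAppendBuffer {
 public:
  FixedAppendBuffer() : next_(data()) {}
  ~FixedAppendBuffer() {
    for (T* p = data(); p != next_; ++p) {
      p->~T();  // storage is raw bytes, so we destroy elements manually
    }
  }

  template <typename... Args>
  T& emplace_back(Args&&... args) {
    // Placement new constructs in uninitialized memory. Writing
    // `*next_ = T(...)` instead would invoke the copy/move assignment
    // operator on an object that was never constructed.
    T* out = new (next_) T(std::forward<Args>(args)...);
    ++next_;
    return *out;
  }

  std::size_t size() const { return static_cast<std::size_t>(next_ - data()); }
  const T& operator[](std::size_t i) const { return data()[i]; }

 private:
  T* data() { return reinterpret_cast<T*>(buf_); }
  const T* data() const { return reinterpret_cast<const T*>(buf_); }

  alignas(T) unsigned char buf_[N * sizeof(T)];  // uninitialized storage
  T* next_;
};
```

Because nothing is default-constructed up front, this also sidesteps the cost of value-initializing the whole buffer.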
Differential Revision: [D39852804](https://our.internmc.facebook.com/intern/diff/D39852804/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85716
Approved by: https://github.com/chaekit
It is apparently undefined behavior to do pointer arithmetic on nullptr. In the case of AppendOnlyList, `next_` will only be null if `end_` is also null, and thus the `memcpy` path will only be triggered when `n == 0`. Nonetheless, it is UB to call `memcpy(0, 0, 0)`.
The extra null check is in a `C10_LIKELY` block, so the extra cost should be negligible; indeed, after dusting off the component microbenchmarks there's no observable difference.
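The shape of the fix is roughly the following (a sketch with an illustrative function name, not the PyTorch code; in PyTorch the condition would be wrapped in the `C10_LIKELY` branch-hint macro):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Guard the memcpy so it is never called with null pointers. Even
// memcpy(nullptr, nullptr, 0) is undefined behavior per the C standard,
// although in practice the null case here only arises when n == 0.
inline void safe_copy(void* dst, const void* src, std::size_t n) {
  if (dst != nullptr && src != nullptr) {  // C10_LIKELY in the real code
    std::memcpy(dst, src, n);
  }
}
```

The branch is almost always taken, so it predicts perfectly and costs essentially nothing.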
Differential Revision: [D38969443](https://our.internmc.facebook.com/intern/diff/D38969443/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83963
Approved by: https://github.com/slgong-fb
Summary:
- Use an atomic counter in the Block storage constructor, plus the offset within the block, to calculate the correlation_id.
- Deduce the correlation id implicitly, so there is no need to store it while profiling.
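The scheme can be sketched as follows (an illustrative toy, not the profiler code; `Block`, `kBlockSize`, and the counter name are assumptions): each block claims a contiguous id range from a global atomic counter, and an event's correlation id is the block's base plus the event's offset, so nothing is stored per event.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBlockSize = 1024;  // ids reserved per block
std::atomic<std::uint64_t> next_block_id{0};

struct Block {
  std::uint64_t base_id;
  // Each block atomically claims [base_id, base_id + kBlockSize).
  Block() : base_id(next_block_id.fetch_add(1) * kBlockSize) {}

  // Correlation id is deduced from position in the block; no per-event
  // storage and no synchronization on the hot path.
  std::uint64_t correlation_id(std::uint64_t offset_in_block) const {
    return base_id + offset_in_block;
  }
};
```

Since no two blocks share a base and offsets stay below kBlockSize, ids are unique across threads by construction.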
Test Plan:
Added test_profiler_correlation_id() in test_profiler.py to check the uniqueness of correlation ids.
To run the test:
python test_profiler.py
Differential Revision: D36793803
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78643
Approved by: https://github.com/robieta
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74484
In my first attempt at this in December I stamped out specializations using variadic templates. However, I'm able to get comparable performance using simple conditionals, since the branch is very predictable and AppendOnlyList::emplace_back is low enough overhead that multiple calls don't cause an issue.
This is also a chance to do some BE (better engineering): rather than forcing ops and backend events to use the same fields (which in practice means setting a bunch of default values when reporting backend events), I just split them and use a variant.
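The split looks roughly like this (a hypothetical illustration; the struct and field names here are invented, not the actual profiler types): two dedicated payload structs held in a `std::variant`, so neither kind of event carries dummy fields for the other.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Only the fields each event kind actually needs.
struct OpEvent {
  std::string name;
  std::uint64_t sequence_number;
};
struct BackendEvent {
  std::string backend;
  std::string name;
};

using EventPayload = std::variant<OpEvent, BackendEvent>;

// Shared fields can still be accessed uniformly via std::visit.
inline const std::string& event_name(const EventPayload& e) {
  return std::visit(
      [](const auto& ev) -> const std::string& { return ev.name; }, e);
}
```

Consumers that care which kind they have use `std::holds_alternative` / `std::get`; consumers that don't can visit.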
Test Plan: The single threaded benchmark (with no extra options set) improved considerably from ~0.88 us to ~0.62 us. The stress test benchmark improved modestly from ~6.1 us to ~5.8 us. So the bottleneck for multi-threading is somewhere else, but doing less wasted work is still able to move the needle a little bit.
Reviewed By: swolchok
Differential Revision: D34779994
fbshipit-source-id: 392bc7c6f12797fa5e18777063aa21210d9d2067
(cherry picked from commit f0a49ff7be8aa65bab2f6952cc2e6306c1edc24b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409
We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.)
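A toy sketch of why append-only access lets us beat the general-purpose containers (this is a simplification, not the PyTorch implementation, though `next_`/`end_` mirror the member names mentioned in the later commits): since elements are never erased or moved, storage can be a chain of fixed-size blocks with a bump pointer, so there is no reallocation-and-copy as in `vector` and less per-push bookkeeping than `deque`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <forward_list>

template <typename T, std::size_t BlockSize>
class AppendOnlySketch {
 public:
  void push_back(const T& value) {
    if (next_ == end_) {  // current block exhausted (or no block yet)
      blocks_.emplace_front();  // old blocks never move, so no copies
      next_ = blocks_.front().data();
      end_ = next_ + BlockSize;
    }
    // Simplified: std::array elements are already constructed, so plain
    // assignment is fine here (the real container uses raw storage).
    *next_++ = value;
    ++size_;
  }

  std::size_t size() const { return size_; }

 private:
  std::forward_list<std::array<T, BlockSize>> blocks_;
  T* next_ = nullptr;  // bump pointer into the current block
  T* end_ = nullptr;   // one past the current block
  std::size_t size_ = 0;
};
```

The hot path is a single predictable compare plus a pointer bump, which is what matters for profiler overhead.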
Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us).
Reviewed By: swolchok
Differential Revision: D34231072
fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1
(cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)