Enables clang-tidy rule [`misc-use-internal-linkage`](https://clang.llvm.org/extra/clang-tidy/checks/misc/use-internal-linkage.html). This new check was introduced in Clang-Tidy 18 and is available due to recent update of Clang-Tidy 19.
The check marks functions and variables used only in the translation unit as static. Therefore undesired symbols are not leaked into other units, more link time optimisations are possible and the resulting binaries may be smaller.
The detected violations were mostly fixed by using static. In other cases, the symbols were indeed consumed by others files, then their declaring headers were included. Still some declarations were wrong and have been fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148948
Approved by: https://github.com/Skylion007
Summary:
fix forward for S477887
leaking c++ singleton specifically
when c++ shutdown, it tries to destruct the singleton and acquire GIL, at this moment python runtime exists already, causing undefined behavior.
Leaking here specifically so that we won't try to destroy singleton at the shutdown phase
Test Plan: n/a
Differential Revision: D67400633
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143509
Approved by: https://github.com/c-p-i-o
Summary:
Basic pybind integration for WaitCounter providing a guard API.
Also fixes broken copy/move constructor in WaitGuard (it wasn't really used with the macro-based C++ API).
Test Plan: unit test
Reviewed By: asiab4
Differential Revision: D60463979
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132167
Approved by: https://github.com/asiab4
Summary: Since WaitCounter frontend itself has minimal depdendencies it's fine to be moved into c10. Specific backends can be registered/linked separately.
Test Plan: unit test
Reviewed By: jamesperng, asiab4, c-p-i-o
Differential Revision: D59842868
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131021
Approved by: https://github.com/asiab4
Summary: Since WaitCounter frontend itself has minimal depdendencies it's fine to be moved into c10. Specific backends can be registered/linked separately.
Test Plan: unit test
Reviewed By: jamesperng, asiab4, c-p-i-o
Differential Revision: D59842868
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131021
Approved by: https://github.com/asiab4
Summary:
This diff introduces a much more flexible model for WaitCounter backend:
1. Backend can be installed dynamically (even if not linked with pytorch) instead of relying on macros and swapping implementation at compile time
2. Multiple backends are supported at the same time.
Test Plan: unit test
Reviewed By: jamesperng
Differential Revision: D59795863
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130934
Approved by: https://github.com/asiab4
Summary:
This diff does a minor cleanup of WaitCounters:
1. Fixes some singleton use to ensure one instance of WaitCounterImpl per counter per process
2. Updates API to enable measuring duration of individual wait operations
Test Plan: unit test
Differential Revision: D59709324
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130664
Approved by: https://github.com/c-p-i-o, https://github.com/asiab4
* created fb internal implementation in `caffe2/torch/csrc/monitor/fb/instrumentation.cpp`
* uses `facebook::data_preproc::WaitCounterUs` under the hood by having `WaitCounterImpl` trivially subclass it.
* this makes `WaitCounterHandle` a glorified pointer to `facebook::data_preproc::WaitCounterUs` which is statically defined in the `STATIC_WAIT_COUNTER` macro making these pointers Meyer's singletons.
* `facebook::data_preproc::WaitCounterUs` uses 3 singletons:
1. `std::unique_ptr<DynamicCounter::State>` map — leaky singleton
2. `std::weak_ptr<WaitCounterUs::State>` map — leaky singleton
3. publisherSingleton — normal singleton since it manages resources (threads)
* `facebook::data_preproc::WaitCounterUs` actually owns shared pointers to the state and its destructor will remove it from the `std::weak_ptr<WaitCounterUs::State>` map when the reference count for the state hits 0.
* linked `caffe2/torch/csrc/monitor/fb/instrumentation.cpp` and added `//data_preproc/common:counters` (dpp dependency) to `caffe2/fb/fbcode/target_definitions.bzl`
* wrapped OSS null implementation in `#ifndef FBCODE_CAFFE2` so that internally we use the fb internal implementation.
as a follow-up I might move the counter implementation out of the data_preproc/counters library to a more common ai infra library?
Differential Revision: [D58458751](https://our.internmc.facebook.com/intern/diff/D58458751/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128605
Approved by: https://github.com/c-p-i-o
ghstack dependencies: #128466
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72009
This simplifies the Stats interface by merging IntervalStat and FixedCountStat into a single Stat w/ a specific window size duration and an optional max samples per window. This allows for the original intention of having comparably sized windows (for statistical purposes) while also having a consistent output bandwidth.
Test Plan:
```
buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor
```
Reviewed By: kiukchung
Differential Revision: D33822956
fbshipit-source-id: a74782492421be613a1a8b14341b6fb2e8eeb8b4
(cherry picked from commit 293b94e0b4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567
This exposes torch.monitor events and stats via pybind11 to the underlying C++ implementation.
* The registration interface is a tad different since it takes a lambda function in Python where as in C++ it's a full class.
* This has a small amount of changes to the counter interfaces since there's no way to create an initializer list at runtime so they now also take a vector.
* Only double based stats are provided in Python since it's intended more for high level stats where float imprecision shouldn't be an issue. This can be changed down the line if need arises.
```
events = []
def handler(event):
events.append(event)
handle = register_event_handler(handler)
log_event(Event(type="torch.monitor.TestEvent", timestamp=datetime.now(), metadata={"foo": 1.0}))
```
D32969391 is now included in this diff.
This cleans up the naming for events. type is now name, message is gone, and metadata is renamed data.
Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor
Reviewed By: kiukchung
Differential Revision: D32924141
fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923
Original commit changeset: fbaf2cc06ad4
Original Phabricator Diff: D32606547 (e61fc1c03b)
This is the same thing as the original diff but just using a normal std::mutex instead of std::shared_timed_mutex which is not available on OSX 10.11. The performance difference should be negligible and easy to change down the line if it does become a bottleneck.
Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783
Test Plan:
buck test //caffe2/test/cpp/monitor:monitor
will add ciflow tags to ensure mac builds are fine
Reviewed By: aivanou
Differential Revision: D33102715
fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb
Summary:
This adds a C++ event handler corresponding to the Python one mentioned in the RFC.
This changes the counters a bit to all be push driven instead of being polled. The two window types are "fixed count" and "interval". One is based off the number of logged events and the other is based off of time windows. There's currently no active ticker for interval so it needs a regular stream of events to ensure events are produced. A follow up diff can add support for things like HHWheel / simple ticker.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783
Test Plan: buck test //caffe2/test/cpp/monitor:monitor
Reviewed By: kiukchung
Differential Revision: D32606547
fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074
This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30
This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.
This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.
Changes:
* added window_size to the stats. If specified it will always compute the stat using the `window_size` number of values. If there aren't enough values within that window it reports the previous stats.
* This doesn't include the push metrics yet (will be coming).
After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower frequency push counters (window_size=1).
Performance considerations:
* Updating the stats acquires a lock on that Stat object. This should be performant unless there's many-many threads writing to the same stat. Single thread will typically use futex so should be quite fast.
* Adding/removing/fetching all stats sets a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently.
* Fetching stats accesses one stat at a time instead of a global lock. This means the exported values are linearizable but not serializable across multiple stats but I don't expect this to be an issue.
Next steps:
1. Add StatCollector interface for push style metrics
1. Add pybind interfaces to expose to Python
1. Add default metric providers
1. Integrate into Kineto trace view
Test Plan:
buck test //caffe2/test/cpp/monitor:monitor
CI
Reviewed By: kiukchung
Differential Revision: D32266032
fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a