Summary:
This adds a C++ event handler corresponding to the Python one mentioned in the RFC.
This also changes the counters to be push driven instead of polled. There are two window types: "fixed count", which is based on the number of logged events, and "interval", which is based on time windows. There's currently no active ticker for interval windows, so an interval stat only emits when new events arrive and needs a regular stream of events to keep producing output. A follow-up diff can add support for things like HHWheel or a simple ticker.
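To illustrate the two window types, here is a minimal sketch. All names (`FixedCountSum`, `IntervalSum`, `EmitFn`) are hypothetical and are not the API added by this diff; the sketch is single-threaded, only aggregates a sum, and takes the timestamp as a parameter so the behavior is easy to follow.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical names for illustration only; not the torch::monitor API.
using Clock = std::chrono::steady_clock;
using EmitFn = std::function<void(int64_t sum, int64_t count)>;

// "Fixed count" window: emits once every `window_size` logged values.
class FixedCountSum {
 public:
  FixedCountSum(int64_t window_size, EmitFn emit)
      : window_size_(window_size), emit_(std::move(emit)) {
    assert(window_size_ > 0);
  }

  void add(int64_t v) {
    sum_ += v;
    if (++count_ == window_size_) {
      emit_(sum_, count_);  // push the finished window to the handler
      sum_ = 0;
      count_ = 0;
    }
  }

 private:
  int64_t window_size_;
  EmitFn emit_;
  int64_t sum_ = 0;
  int64_t count_ = 0;
};

// "Interval" window: there is no ticker, so a finished window is only
// emitted when a later add() observes that the interval has elapsed.
class IntervalSum {
 public:
  IntervalSum(std::chrono::milliseconds interval, EmitFn emit,
              Clock::time_point start = Clock::now())
      : interval_(interval), emit_(std::move(emit)), window_start_(start) {}

  void add(int64_t v, Clock::time_point now = Clock::now()) {
    if (now - window_start_ >= interval_) {
      if (count_ > 0) {
        emit_(sum_, count_);  // close the previous window first
      }
      sum_ = 0;
      count_ = 0;
      window_start_ = now;
    }
    sum_ += v;
    ++count_;
  }

 private:
  std::chrono::milliseconds interval_;
  EmitFn emit_;
  Clock::time_point window_start_;
  int64_t sum_ = 0;
  int64_t count_ = 0;
};
```

In this sketch, if no further `add()` call arrives after the interval elapses, the last window is never emitted, which is the limitation a ticker would address.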
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783
Test Plan: buck test //caffe2/test/cpp/monitor:monitor
Reviewed By: kiukchung
Differential Revision: D32606547
fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074
This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30
This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.
This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.
Changes:
* Added `window_size` to the stats. If specified, the stat is always computed over `window_size` values. If the current window doesn't have enough values yet, the stat reports the values from the previous window.
* This doesn't include the push metrics yet (will be coming).
After more discussion, it looks like the best way to handle this is a hybrid in which each metric sets how frequently it is logged. A metric with a fixed `window_size` is logged each time it reaches the window size. This allows performant counters as well as lower-frequency push counters (window_size=1).
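The `window_size` behavior described above can be sketched roughly as follows. `WindowedStat` and `WindowValues` are hypothetical names, not the `Stat` class from this PR, and the sketch tracks only a sum and a count, without locking.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; not the torch::monitor Stat class.
struct WindowValues {
  int64_t sum = 0;
  int64_t count = 0;
};

class WindowedStat {
 public:
  explicit WindowedStat(int64_t window_size) : window_size_(window_size) {
    assert(window_size_ > 0);
  }

  // Returns true when this call completed a window, i.e. the point at
  // which a push-style metric would be logged.
  bool add(int64_t v) {
    current_.sum += v;
    if (++current_.count < window_size_) {
      return false;
    }
    previous_ = current_;      // publish the finished window
    current_ = WindowValues{}; // start accumulating the next one
    return true;
  }

  // Reports the last completed window. While the current window is
  // still filling, readers keep seeing the previous window's values
  // (all zeros before the first window completes).
  WindowValues get() const {
    return previous_;
  }

 private:
  int64_t window_size_;
  WindowValues current_;
  WindowValues previous_;
};
```

With `window_size` set to 1, every `add()` completes a window, so each value would be logged individually.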
Performance considerations:
* Updating a stat acquires a lock on that Stat object. This should be performant unless many threads write to the same stat. With a single writer thread the lock is uncontended, which typically stays on the futex fast path, so it should be quite fast.
* Adding, removing, or fetching all stats takes a global lock on the stat list. This shouldn't be an issue since these operations happen infrequently.
* Fetching stats reads one stat at a time instead of holding a global lock over all of them. This means the exported values are linearizable per stat but not serializable across multiple stats, which I don't expect to be an issue.
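The locking scheme in the list above could look roughly like this. `LockedStat` and `StatRegistry` are hypothetical names used only for illustration, and the real implementation may differ: each stat has its own mutex, the registry mutex guards only the list, and a fetch copies the list under the registry lock before reading each stat under that stat's own lock.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the locking scheme; not the torch::monitor code.
class LockedStat {
 public:
  explicit LockedStat(std::string name) : name_(std::move(name)) {}

  // Updates only contend with other users of this same stat.
  void add(int64_t v) {
    std::lock_guard<std::mutex> guard(mu_);
    sum_ += v;
    ++count_;
  }

  // Returns a consistent {sum, count} pair for this one stat.
  std::pair<int64_t, int64_t> snapshot() const {
    std::lock_guard<std::mutex> guard(mu_);
    return {sum_, count_};
  }

  const std::string& name() const {
    return name_;
  }

 private:
  const std::string name_;
  mutable std::mutex mu_;
  int64_t sum_ = 0;
  int64_t count_ = 0;
};

class StatRegistry {
 public:
  void add(std::shared_ptr<LockedStat> stat) {
    std::lock_guard<std::mutex> guard(mu_);
    stats_.push_back(std::move(stat));
  }

  void remove(const std::shared_ptr<LockedStat>& stat) {
    std::lock_guard<std::mutex> guard(mu_);
    stats_.erase(std::remove(stats_.begin(), stats_.end(), stat),
                 stats_.end());
  }

  // The registry lock is held only while copying the list. Each stat is
  // then read under its own lock, so values from different stats may
  // come from slightly different points in time.
  std::vector<std::pair<std::string, int64_t>> fetchSums() const {
    std::vector<std::shared_ptr<LockedStat>> copy;
    {
      std::lock_guard<std::mutex> guard(mu_);
      copy = stats_;
    }
    std::vector<std::pair<std::string, int64_t>> out;
    out.reserve(copy.size());
    for (const auto& stat : copy) {
      out.emplace_back(stat->name(), stat->snapshot().first);
    }
    return out;
  }

 private:
  mutable std::mutex mu_;
  std::vector<std::shared_ptr<LockedStat>> stats_;
};
```

Copying the list of `shared_ptr`s keeps the global lock short and lets a stat be removed concurrently without invalidating an in-progress fetch.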
Next steps:
1. Add StatCollector interface for push style metrics
1. Add pybind interfaces to expose to Python
1. Add default metric providers
1. Integrate into Kineto trace view
Test Plan:
buck test //caffe2/test/cpp/monitor:monitor
CI
Reviewed By: kiukchung
Differential Revision: D32266032
fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a