pytorch/torch
Tristan Rice 758d7dea9c torch.monitor - Initial C++ Stats (#68074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074

This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30

This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.

This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.

Changes:
* Added `window_size` to the stats. If specified, the stat is always computed over exactly `window_size` values; if there aren't enough values within the current window, the previously computed stats are reported.
* This doesn't include push metrics yet (they will be coming).
  After more discussion, the best way to handle this appears to be a hybrid where each metric can set how frequently it is logged. Fixed-window metrics are logged each time they fill their window, which allows both high-performance counters and lower-frequency push counters (window_size=1).

Performance considerations:
* Updating a stat acquires a lock on that Stat object. This should be performant unless many threads are writing to the same stat; a single-threaded writer will typically hit an uncontended futex, so it should be quite fast.
* Adding/removing/fetching all stats acquires a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently.
* Fetching stats accesses one stat at a time instead of holding a global lock. This means the exported values are linearizable per stat but not serializable across multiple stats; I don't expect this to be an issue.

Next steps:
1. Add a StatCollector interface for push-style metrics
2. Add pybind interfaces to expose the stats to Python
3. Add default metric providers
4. Integrate into the Kineto trace view

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

CI

Reviewed By: kiukchung

Differential Revision: D32266032

fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a
2021-11-18 21:46:23 -08:00
_C [c10d] Fix object-based collectives for debug mode (#68223) 2021-11-13 04:18:31 -08:00
_masked Strided masked reduction: mean (2nd try) (#67088) 2021-11-01 16:12:07 -07:00
ao [quant][embedding qat] eager mode QAT for Embeddings (#66429) 2021-11-18 05:57:11 -08:00
autograd Stop warning spamming about vmap in gradcheck (#68586) 2021-11-18 07:00:36 -08:00
backends Add an option to disable reduced precision reductions for FP16 GEMM (#67946) 2021-11-09 17:27:20 -08:00
contrib
cpu Add fp16/fp32 autocasting to JIT/TorchScript (#63939) 2021-10-27 12:11:36 -07:00
csrc torch.monitor - Initial C++ Stats (#68074) 2021-11-18 21:46:23 -08:00
cuda Update __init__.py (#67900) 2021-11-08 08:56:38 -08:00
distributed [reland] simplify init_from_local_shards API (#68021) 2021-11-17 23:20:37 -08:00
distributions Implement Entropy methods for Binomial and Multinomial distributions (#67609) 2021-11-11 09:16:28 -08:00
fft
for_onnx
futures
fx [const_fold] Fix call_module const folding (#68614) 2021-11-18 20:56:06 -08:00
jit Update Freezing Logic and add new passes (#68024) 2021-11-09 13:21:52 -08:00
legacy
lib [NOOP][clangformat][codemod] Enable CLANGFORMAT for some folders in caffe2/* (#67746) 2021-11-03 12:23:14 -07:00
linalg Revert D32283178: Add linalg.solve_triangular 2021-11-18 14:46:10 -08:00
multiprocessing
nn [BC-breaking] Change dtype of softmax to support TorchScript and MyPy (#68336) 2021-11-18 11:26:14 -08:00
onnx Added antialias flag to interpolate (CPU only, bilinear) (#65142) 2021-11-17 09:10:15 -08:00
optim Adds an optimizer instance variable to ChainedScheduler (#68010) 2021-11-10 01:31:47 -08:00
package
profiler [Reland] Python tracer. (#68325) 2021-11-15 23:32:49 -08:00
quantization
sparse
special
testing Add native_dropout (#63937) 2021-11-18 19:41:10 -08:00
utils Fix DLPack CUDA stream convention (#67618) 2021-11-18 08:36:05 -08:00
__config__.py
__future__.py
__init__.py Add set_deterministic_debug_mode and get_deterministic_debug_mode (#67778) 2021-11-11 12:48:29 -08:00
_appdirs.py
_classes.py
_deploy.py [deploy] fix TypedStorage serialization (#67499) 2021-10-28 22:33:04 -07:00
_jit_internal.py [package] fix torchscript classes in package (#68028) 2021-11-16 10:01:40 -08:00
_linalg_utils.py
_lobpcg.py torch.lobpcg.backward: do not save non-Variable types with ctx.save_for_backward. (#67994) 2021-11-08 10:02:09 -08:00
_lowrank.py
_namedtensor_internals.py
_ops.py
_python_dispatcher.py
_six.py
_sources.py Disallow annotations on instance attributes outside __init__ (#67051) 2021-10-25 16:20:47 -07:00
_storage_docs.py
_tensor_docs.py [numpy] Alias arctan2 to atan2 (#67010) 2021-11-16 09:41:09 -08:00
_tensor_str.py
_tensor.py Sparse CSR: add convert_indices_from_csr_to_coo (#66774) 2021-11-17 22:28:30 -08:00
_torch_docs.py Revert D32283178: Add linalg.solve_triangular 2021-11-18 14:46:10 -08:00
_utils_internal.py
_utils.py
_VF.py
_vmap_internals.py More aggressively market functorch.vmap when torch.vmap gets called (#67347) 2021-11-12 16:10:16 -08:00
abi-check.cpp
autocast_mode.py Add fp16/fp32 autocasting to JIT/TorchScript (#63939) 2021-10-27 12:11:36 -07:00
CMakeLists.txt codegen: Split up source, header and Declarations.yaml generation (#67497) 2021-11-03 13:20:54 -07:00
custom_class_detail.h [NOOP][clangformat][codemod] Enable CLANGFORMAT for some folders in caffe2/* (#67746) 2021-11-03 12:23:14 -07:00
custom_class.h [NOOP][clangformat][codemod] Enable CLANGFORMAT (#67854) 2021-11-04 14:07:57 -07:00
deploy.h
extension.h
functional.py [lint] small pass to make lint clean (#68367) 2021-11-16 10:27:00 -08:00
hub.py making import_module private and deprecating public method (#67990) 2021-11-09 07:27:57 -08:00
library.h [NOOP][clangformat][codemod] Enable CLANGFORMAT for some folders in caffe2/* (#67746) 2021-11-03 12:23:14 -07:00
overrides.py Add native_dropout (#63937) 2021-11-18 19:41:10 -08:00
py.typed
quasirandom.py
random.py
README.txt
script.h [NOOP][clangformat][codemod] Enable CLANGFORMAT for some folders in caffe2/* (#67746) 2021-11-03 12:23:14 -07:00
serialization.py Throw error when saving storages that view same data with different type (#66949) 2021-11-16 08:44:44 -08:00
storage.py
torch_version.py
types.py

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.