pytorch/torch/csrc/autograd

Autograd

Autograd is a hotspot for PyTorch performance, so most of the heavy lifting is implemented in C++. This means we have to do some shuffling between Python and C++, and in general we want data to be in a form that is convenient to manipulate from C++.

Our general model is that for any key data type that autograd manipulates, there are two implementations: a C++ type and a Python object type. For example, consider variables in autograd: we have both Variable in variable.h (the C++ type) and THPVariable in python_variable.h (the Python type). (By the way, THP stands for TorcH Python, not to be confused with THPP, TorcH C++.) Variable contains the payload of a variable, while THPVariable just contains a shared_ptr reference to Variable, as well as references to other Python objects which the Python runtime needs to know about. A lot of the data accessor implementations in python_variable.cpp simply reach through to the underlying Variable and return the appropriate value.
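
The split described above can be sketched as a pair of types: a C++ payload class and a thin wrapper that holds a shared_ptr to it and forwards accessors. This is a minimal, self-contained analogue (all names here are hypothetical, and the wrapper is a plain struct rather than a real CPython object):

```cpp
#include <memory>
#include <string>

// Stand-in for the C++ type (Variable): it owns the actual payload.
struct VariablePayload {
  std::string name;
  bool requires_grad = false;
};

// Stand-in for the Python object type (THPVariable): it holds only a
// shared_ptr to the payload, and its accessors reach through to it,
// mirroring how python_variable.cpp forwards to the underlying Variable.
struct PyWrapper {
  std::shared_ptr<VariablePayload> cdata;

  bool requires_grad() const { return cdata->requires_grad; }
  const std::string& name() const { return cdata->name; }
};
```

Because the wrapper holds a shared_ptr rather than the payload itself, multiple wrappers (or other C++ code) can share the same underlying object, and the payload outlives any individual wrapper.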

The most complicated application of this principle is Function, which also supports users implementing custom behavior in Python. We have the following classes:

  • Node in function.h, the C++ type.
  • THPFunction in python_function.h, the Python object type. In python_function.cpp, you can see the boilerplate that tells the Python interpreter about this object.
  • PyNode in python_function.h, a subclass of Node which forwards apply to a Python THPFunction. (NOT a Python object, despite its name!)
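
The forwarding role of PyNode can be illustrated with a small sketch: a C++ base class with a virtual apply, and a subclass that hands the call off to stored "Python-side" behavior. This is a simplified analogue, not the real classes; in particular, the std::function here stands in for the call into a THPFunction:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Stand-in for autograd's variable_list.
using variable_list = std::vector<double>;

// Analogue of Node in function.h: the C++ base type with a virtual apply.
struct NodeLike {
  virtual ~NodeLike() = default;
  virtual variable_list apply(variable_list&& grads) = 0;
};

// Analogue of PyNode: a C++ subclass of Node (not a Python object!) whose
// apply forwards to Python-implemented behavior. Here a callback stands in
// for the call into THPFunction.
struct PyNodeLike : NodeLike {
  std::function<variable_list(variable_list)> py_fn;

  variable_list apply(variable_list&& grads) override {
    return py_fn(std::move(grads));  // forward the call "into Python"
  }
};
```

The point of the pattern is that the rest of the C++ engine only ever sees a Node and calls apply on it; whether the behavior lives in C++ or Python is hidden behind that virtual call.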

Outside of PyNode, the C++ objects largely avoid referencing Python objects. The exceptions are pyobj in Variable and pyobj in Node, which cache the associated Python wrapper to ensure its uniqueness (if one exists), and PyNode itself, whose whole point is to let C++ call into Python.
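
The pyobj caching pattern can be sketched as follows. This is a hypothetical, simplified version: a fake object struct stands in for PyObject, and a plain function stands in for the real wrapper-creation code (which would allocate a THP* object and use Py_INCREF). The cached pointer is what guarantees at most one wrapper per C++ object:

```cpp
// Stand-in for PyObject; only the refcount matters for this sketch.
struct FakePyObject {
  int refcount = 1;
};

// A C++ object that caches a pointer to its (at most one) Python wrapper,
// analogous to the pyobj field in Node and Variable.
struct NodeWithPyObj {
  FakePyObject* pyobj = nullptr;  // null until a wrapper is first requested
};

// Return the existing wrapper if one exists; otherwise create and cache it.
FakePyObject* wrapper_for(NodeWithPyObj& n) {
  if (n.pyobj == nullptr) {
    n.pyobj = new FakePyObject();  // real code would allocate a THP* object
  } else {
    ++n.pyobj->refcount;           // real code would Py_INCREF
  }
  return n.pyobj;
}
```

Calling wrapper_for twice on the same object returns the same pointer, so Python code always sees a single wrapper identity for a given C++ object.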