pytorch/torch/csrc/autograd
soulitzer 8bda95228f [autograd] Avoid creating and recording event when unnecessary (#157503)
Today, we always create and record an event in two places:
1) Upon seeing the first producer, we record an event on the producer stream, and we wait for this event in two places: (1) when the engine goes to run the consumer, the consumer stream waits for this event, and (2) prior to doing accumulation, the accumulation stream waits for this event.

2) After doing accumulation, we record an event on the accumulation stream and wait for this event in a single place: when the engine goes to run the consumer.

We do not actually need to record the first event when the producer stream is the same as both the consumer stream and the accumulation stream, nor the second event when the accumulation stream is the same as the consumer stream.

Removing the unnecessary event creation and recording should save a few microseconds per instance avoided.
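
As a rough illustration of the pattern being optimized (a hand-written sketch, not the engine's code; the helper name is made up), cross-stream ordering is done by recording a CUDA event on the producer stream and blocking the consumer stream on it, and the saving comes from skipping the event entirely when the two streams are the same:

```cpp
#include <ATen/cuda/CUDAEvent.h>
#include <c10/cuda/CUDAStream.h>

// Sketch only: make `consumer` wait for all work queued on `producer` so far.
void wait_for_producer(
    const c10::cuda::CUDAStream& producer,
    const c10::cuda::CUDAStream& consumer) {
  if (producer == consumer) {
    return; // same stream: work is already ordered, no event needed
  }
  at::cuda::CUDAEvent event;  // created only when actually required
  event.record(producer);     // marks the point where the producer's work ends
  event.block(consumer);      // consumer waits for that point before running
}
```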

Fixes https://github.com/pytorch/pytorch/issues/157407

----

Manual test plan:
- [x] @eqy to confirm perf is restored
- [x] Running the repro originally reported before/after the patch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157503
Approved by: https://github.com/eqy
ghstack dependencies: #155715
2025-07-09 03:36:14 +00:00
functions [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319) 2025-06-23 02:57:50 +00:00
utils [ddp] propagate use_python_reducer to C++ reducer (#152735) 2025-05-16 01:38:03 +00:00
anomaly_mode.cpp
anomaly_mode.h [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319) 2025-06-23 02:57:50 +00:00
autograd_meta.cpp [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319) 2025-06-23 02:57:50 +00:00
autograd_not_implemented_fallback.cpp
autograd_not_implemented_fallback.h
autograd.cpp
autograd.h
cpp_hook.cpp [ca] introduce RuntimeState to support c++ hooks via graph breaks (#149987) 2025-03-27 05:05:34 +00:00
cpp_hook.h [ca] introduce RuntimeState to support c++ hooks via graph breaks (#149987) 2025-03-27 05:05:34 +00:00
custom_function.cpp Add missing in-place on view check to custom autograd.Function (#153094) 2025-05-12 14:42:46 +00:00
custom_function.h [reland][ca] side-effect free inital trace: compiled_args (#148376) 2025-03-11 01:57:36 +00:00
edge.h
engine.cpp [autograd] Avoid creating and recording event when unnecessary (#157503) 2025-07-09 03:36:14 +00:00
engine.h
forward_grad.cpp
forward_grad.h [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319) 2025-06-23 02:57:50 +00:00
function_hook.h [ca] suggest to disable compiled autograd for trace-time NotImplementedErrors (#156509) 2025-06-21 18:33:46 +00:00
function.cpp
function.h [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319) 2025-06-23 02:57:50 +00:00
FunctionsManual.cpp Enable Half dtype for logcumsumexp_backward (#157512) 2025-07-03 18:13:38 +00:00
FunctionsManual.h Fix 'dllimport attribute ignored on inline function' (#157670) 2025-07-07 16:57:48 +00:00
grad_mode.h
graph_task.h
InferenceMode.h
init.cpp Add is_hidden_event method to KinetoEvent Python interface (#155214) 2025-07-02 16:29:21 +00:00
input_buffer.cpp [autograd] Avoid creating and recording event when unnecessary (#157503) 2025-07-09 03:36:14 +00:00
input_buffer.h Rewrite autograd producer consumer stream sync logic (#151079) 2025-05-16 15:42:22 +00:00
input_metadata.cpp used guard_or_false instead of guard_size_oblivious inside maybe_reduce (#154172) 2025-05-26 21:59:52 +00:00
input_metadata.h
jit_decomp_interface.cpp
jit_decomp_interface.h [Lint] Update clang-format to 19.1.4 (#153889) 2025-05-20 14:12:46 +00:00
profiler_kineto.cpp Add is_hidden_event method to KinetoEvent Python interface (#155214) 2025-07-02 16:29:21 +00:00
profiler_kineto.h Add is_hidden_event method to KinetoEvent Python interface (#155214) 2025-07-02 16:29:21 +00:00
profiler_legacy.cpp
profiler_legacy.h
profiler_python.cpp Fix profiler on cpython-3.13 (#153848) 2025-05-19 21:20:53 +00:00
profiler_python.h
profiler.h
python_anomaly_mode.cpp
python_anomaly_mode.h
python_autograd.h
python_cpp_function.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
python_cpp_function.h
python_engine.cpp Fix clang-tidy bugprone* warnings (#148529) 2025-06-23 23:09:56 +00:00
python_engine.h
python_enum_tag.h
python_fft_functions.h
python_function.cpp Fix clang-tidy bugprone* warnings (#148529) 2025-06-23 23:09:56 +00:00
python_function.h [reland][ca] side-effect free inital trace: compiled_args (#148376) 2025-03-11 01:57:36 +00:00
python_hook.cpp [reland][ca] side-effect free inital trace: compiled_args (#148376) 2025-03-11 01:57:36 +00:00
python_hook.h [reland][ca] side-effect free inital trace: compiled_args (#148376) 2025-03-11 01:57:36 +00:00
python_legacy_variable.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
python_legacy_variable.h
python_linalg_functions.h
python_nested_functions_manual.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
python_nested_functions.h
python_nn_functions.h
python_saved_variable_hooks.cpp
python_saved_variable_hooks.h
python_sparse_functions.h
python_special_functions.h
python_torch_functions_manual.cpp [aotd] Support mutations of the same input in fw and bw (#155354) 2025-06-26 14:05:54 +00:00
python_torch_functions.h Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
python_variable_indexing.cpp fix warning spam for list indexing (#155815) 2025-06-12 23:07:24 +00:00
python_variable_indexing.h
python_variable.cpp Fix clang-tidy bugprone* warnings (#148529) 2025-06-23 23:09:56 +00:00
python_variable.h
README.md Rename torch::autograd::Function to torch::autograd::Node 2019-07-23 20:52:22 -07:00
record_function_ops.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
record_function_ops.h
saved_variable_hooks.h
saved_variable.cpp [aotd] Support saved tensors hooks in aot_autograd (#150032) 2025-05-22 14:09:38 +00:00
saved_variable.h
symbolic.h
TraceTypeManual.cpp [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319) 2025-06-23 02:57:50 +00:00
variable_info.cpp
variable_info.h
variable.cpp
variable.h Revert "Enable Leak Sanitizer (#154584)" 2025-06-23 10:08:40 +00:00
VariableTypeManual.cpp Revert "Enable Leak Sanitizer (#154584)" 2025-06-23 10:08:40 +00:00
VariableTypeUtils.h

Autograd

Autograd is a hotspot for PyTorch performance, so most of the heavy lifting is implemented in C++. This implies that we have to do some shuffling between Python and C++; and in general, we want data to be in a form that is convenient to manipulate from C++.

Our general model is that for any key data type that autograd manipulates, there are two implementations: a C++ type and a Python object type. For example, consider variables in autograd: we have both Variable in variable.h (the C++ type) and THPVariable in python_variable.h (the Python type). (By the way, THP stands for TorcH Python, not to be confused with THPP, TorcH C++.) Variable contains the payload of a variable, while THPVariable just contains a shared_ptr reference to Variable, as well as references to other Python objects which the Python runtime needs to know about. A lot of data accessor implementations in python_variable.cpp simply reach through to the underlying Variable and return the appropriate value.
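
To make the "reach through" pattern concrete, here is a heavily simplified sketch; the struct and accessor names are hypothetical, and the real THPVariable in python_variable.h stores more than this. The Python object carries the underlying C++ tensor, and a getter simply forwards to it and boxes the result for Python:

```cpp
#include <Python.h>
#include <ATen/core/Tensor.h>

// Hypothetical, stripped-down stand-in for THPVariable.
struct SimpleTHPVariable {
  PyObject_HEAD
  at::Tensor cdata; // the C++ Variable payload this Python object wraps
};

// An accessor in the style of python_variable.cpp: reach through to the C++
// object and return the value boxed as a Python object.
static PyObject* SimpleTHPVariable_requires_grad(PyObject* self, void* /*closure*/) {
  auto* var = reinterpret_cast<SimpleTHPVariable*>(self);
  return PyBool_FromLong(var->cdata.requires_grad());
}
```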

The most complicated application of this principle is Function, which also supports users implementing custom behavior in Python. We have the following classes:

  • Node in function.h, the C++ type.
  • THPFunction in python_function.h, the Python object type. In python_function.cpp, you can see the boilerplate that tells the Python interpreter about this object.
  • PyNode in python_function.h, a subclass of Node which forwards apply to a Python THPFunction. (NOT a Python object, despite its name!)
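
For intuition, a stripped-down version of that idea might look like the sketch below. The names are invented and the argument conversion is elided; the real PyNode in python_function.h also handles packing and unpacking of tensors, error propagation, and more. The key point is that a C++ Node can hold a Python callable and defer its apply to it under the GIL:

```cpp
#include <torch/csrc/autograd/function.h>
#include <pybind11/pybind11.h>

// Illustrative only (not the real PyNode): a Node subclass whose apply()
// forwards to a Python callable, the way PyNode forwards to its THPFunction.
struct PyForwardingNode : torch::autograd::Node {
  PyObject* py_backward{nullptr}; // Python callable implementing backward

  torch::autograd::variable_list apply(
      torch::autograd::variable_list&& grads) override {
    // The C++ engine runs without the GIL; acquire it before touching Python.
    pybind11::gil_scoped_acquire gil;
    // Real code converts `grads` to Python tensors, calls into Python, and
    // converts the results back; that plumbing is elided here.
    auto fn = pybind11::reinterpret_borrow<pybind11::object>(py_backward);
    fn(); // call into Python
    return {};
  }
};
```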

Outside of PyNode, the C++ objects largely avoid referencing Python objects. There are a few exceptions: pyobj in Variable; pyobj in Node, which ensures uniqueness of the associated Python wrapper (if it exists); and PyNode, whose whole point is to let C++ call into Python.