pytorch/torch
Darshan Sanghani 33dd4f187d [pytorch/et] Allow ET to save additional resources for completing a trace like generated kernels and index tensor data (#143430)
The resources directory lets the ET observer dump additional data, such as generated Triton kernels, while capturing the ET.

This allows us to use the ET trace to replay PT2 workloads and gain visibility into data such as generated kernels and their usage in a model, index tensor data, and so on.

We also added a few ways to enable ET and ET resources through environment variables.

Setting `ENABLE_PYTORCH_EXECUTION_TRACE` will enable default Execution Tracing in PyTorch.

Additionally, setting `ENABLE_PYTORCH_EXECUTION_TRACE_EXTRAS` will enable ET to collect extra resources from the run, such as Triton kernels.
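
For illustration, a minimal sketch of enabling both flags from Python before running a PT2 workload; the enabling value `"1"` and the placement before the workload are assumptions, not specified by this change:

```python
import os

# Set the flags before the profiled workload runs; "1" as the enabling
# value is an assumption for illustration.
os.environ["ENABLE_PYTORCH_EXECUTION_TRACE"] = "1"         # default Execution Tracing
os.environ["ENABLE_PYTORCH_EXECUTION_TRACE_EXTRAS"] = "1"  # also collect extras, e.g. Triton kernels

import torch

# Any PT2 workload; torch.compile is what produces the generated
# kernels that the extras flag is meant to capture.
model = torch.compile(torch.nn.Linear(8, 8))
out = model(torch.randn(4, 8))
```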

Differential Revision: [D58707846](https://our.internmc.facebook.com/intern/diff/D58707846/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143430
Approved by: https://github.com/shengfukevin, https://github.com/sraikund16
2024-12-20 21:20:32 +00:00
_awaits
_C [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124) 2024-12-20 19:32:03 +00:00
_C_flatbuffer
_custom_op
_decomp [Inductor][CPU] disable bernoulli_p decomposition (#143460) 2024-12-19 11:21:35 +00:00
_dispatch Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_dynamo pgo: Log feature use (#142819) 2024-12-20 20:22:20 +00:00
_export [ts converter] use Dim.AUTO for ts -> export converter (#138273) 2024-12-20 07:48:24 +00:00
_functorch remove allow-untyped-defs for torch/_functorch/batch_norm_replacement.py (#143438) 2024-12-18 09:01:06 +00:00
_higher_order_ops [user triton] Raise an exception when encountering nested @triton.autotune decorators or @triton.heuristics (#143519) 2024-12-20 06:38:45 +00:00
_inductor [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124) 2024-12-20 19:32:03 +00:00
_lazy remove allow-untyped-defs from torch/_lazy/config.py (#143603) 2024-12-20 05:34:19 +00:00
_library Revert "[export] don't decompose custom triton op when exporting (#142426)" 2024-12-19 21:21:38 +00:00
_logging Add "inductor_pre_grad_graph" logging (#142717) (#143126) 2024-12-13 21:48:25 +00:00
_numpy Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_prims Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_prims_common Pass allow_rhs_unbacked to the stride test in metadata test too (#143040) 2024-12-19 09:37:50 +00:00
_refs Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_strobelight Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_subclasses Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_vendor
accelerator [BE][accelerator] formalize API name {current,set}_device_{idx => index} (#140542) 2024-12-12 10:53:48 +00:00
amp
ao remove allow-untyped-defs from torch/ao/quantization/experimental/APoT_tensor.py (#143601) 2024-12-20 05:26:09 +00:00
autograd Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
backends [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124) 2024-12-20 19:32:03 +00:00
compiler [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
contrib
cpu
csrc [c10d][fr] flight recorder improvements (#143446) 2024-12-20 20:41:30 +00:00
cuda [ROCm] Fix unit test: matmul_offline_mgpu_tunableop (#143507) 2024-12-19 19:48:20 +00:00
distributed remove allow-untyped-defs from torch/distributed/elastic/multiprocessing/errors/handlers.py (#143605) 2024-12-20 05:25:01 +00:00
distributions Remove some unused type ignores (round 1) (#142325) 2024-12-09 18:23:46 +00:00
export [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
fft
func
futures
fx Revert "refactor tensorify restart logic to use sources (#141517)" (#143623) 2024-12-20 15:38:34 +00:00
jit Add warning to torch.jit.load (#143403) 2024-12-18 00:17:41 +00:00
legacy
lib
linalg
masked remove allow-untyped-defs for torch/masked/maskedtensor/creation.py (#143321) 2024-12-17 16:44:50 +00:00
monitor
mps [MPS] Add CompileShader method (#141478) 2024-12-11 02:00:51 +00:00
mtia (MTIA) Move "empty_cache" API (#143402) 2024-12-20 17:39:06 +00:00
multiprocessing
nested NJT linear_backward should not return inner tensor as-is (#143333) 2024-12-18 00:15:18 +00:00
nn Rewrite _reparametrize_module to use contextmanager (#138203) 2024-12-20 12:02:27 +00:00
onnx [Codemod][AddExplicitStrictExportArg] caffe2/torch/onnx/_internal/exporter (#143542) 2024-12-20 00:54:52 +00:00
optim Add support for differentiable LR in SGD + test v2.0 (#143510) 2024-12-19 21:04:44 +00:00
package
profiler [pytorch/et] Allow ET to save additional resources for completing a trace like generated kernels and index tensor data (#143430) 2024-12-20 21:20:32 +00:00
quantization
signal
sparse
special
testing [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124) 2024-12-20 19:32:03 +00:00
utils Add config.save.use_pinned_memory_for_d2h to serialization config (#143342) 2024-12-20 21:01:18 +00:00
xpu
__config__.py remove allow-untyped-defs for torch/__config__.py (#143320) 2024-12-17 00:16:09 +00:00
__future__.py
__init__.py [dynamo, 3.13t] raise error if torch.compile is attempted in 3.13t (nogil) (#143404) 2024-12-19 18:10:01 +00:00
_appdirs.py Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py dynamo tracing perf: no import on hot path: 47.62 -> 47.26 (#143065) 2024-12-20 20:06:42 +00:00
_jit_internal.py Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_linalg_utils.py
_lobpcg.py Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_lowrank.py Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_meta_registrations.py [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124) 2024-12-20 19:32:03 +00:00
_namedtensor_internals.py
_ops.py Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py
_tensor_str.py Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
_tensor.py __cuda_array_interface__: Use "<V2" for bfloat16. (#143042) 2024-12-14 06:27:52 +00:00
_thread_safe_fork.py
_torch_docs.py Add torch.cat tensors type promotion description (#141339) 2024-12-14 01:36:41 +00:00
_utils_internal.py [reland] Kill capture_pre_autograd_graph API (#143426) 2024-12-18 12:07:09 +00:00
_utils.py Reraise worker errors as runtime errors in more cases when the original exception can't be constructed (#140911) 2024-12-14 03:11:36 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py Remove unused Python variables in torch/[_-a]* (#133492) 2024-12-12 17:39:14 +00:00
abi-check.cpp
CMakeLists.txt export AOTI_TORCH_EXPORT on Windows. (#140030) 2024-12-20 11:42:09 +00:00
custom_class_detail.h
custom_class.h
extension.h
functional.py
hub.py
library.h
library.py make it clearer (in docs) one can double decorate with torch.library.impl_* APIs (#137608) 2024-12-17 15:13:58 +00:00
overrides.py [dim_order] raised runtime error when tensor has ambiguous dim order (#141632) 2024-12-08 23:16:57 +00:00
py.typed
quasirandom.py
random.py
README.txt
return_types.py
script.h
serialization.py Add config.save.use_pinned_memory_for_d2h to serialization config (#143342) 2024-12-20 21:01:18 +00:00
storage.py
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.