pytorch/torch
Michael Lazos 253059356f [Cutlass] Implement EVT example tensor creation (#150904)
This PR implements a translation layer from inductor IR to "example tensors", the arguments expected by the EVT tracer. These tensors store the name, shape, stride, and dtype of each tensor, which lets an AST-based Python parse generate the EVT C++.

updates to example tensor creation

Previously merged:
* https://github.com/pytorch/pytorch/pull/150903
* https://github.com/pytorch/pytorch/pull/150346
* https://github.com/pytorch/pytorch/pull/150345
* https://github.com/pytorch/pytorch/pull/150344

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150904
Approved by: https://github.com/eellison
2025-04-23 03:26:56 +00:00
_awaits
_C c10d/Store: add nonblocking mode to queue_pop (#151485) 2025-04-18 02:14:50 +00:00
_C_flatbuffer
_custom_op
_decomp Make torch._chunk_cat support non-contiguous inputs (#151263) 2025-04-16 04:18:46 +00:00
_dispatch [BE][PYFMT] migrate PYFMT for torch._dynamo to ruff format (#144549) 2025-02-28 03:03:53 +00:00
_dynamo Fix circular imports (#151939) 2025-04-23 02:53:32 +00:00
_export [export] Enable symint inputs for AdditionalInputs and ShapesCollection (#151842) 2025-04-22 22:29:18 +00:00
_functorch [standalone_compile] Dynamic shape handling (#151788) 2025-04-22 20:17:24 +00:00
_higher_order_ops [HOP] Reworked DispatchKey.Autograd (#151107) 2025-04-15 19:55:46 +00:00
_inductor [Cutlass] Implement EVT example tensor creation (#150904) 2025-04-23 03:26:56 +00:00
_lazy
_library Rename register_fake_profile to unsafe_generate_fake_kernels (#151797) 2025-04-21 23:08:15 +00:00
_logging [export] Beef up guard_added logs (#149465) 2025-03-20 23:02:07 +00:00
_numpy
_prims Support torch.compile rng selective activation checkpointing with cudagraph (#146878) 2025-02-28 00:47:03 +00:00
_prims_common Revert "[dynamic shapes] guard_or_false for _reshape_view_helper, utils._infer_size for wildcard dims (#150127)" 2025-04-22 05:05:50 +00:00
_refs Revert "[dynamic shapes] guard_or_false for _reshape_view_helper, utils._infer_size for wildcard dims (#150127)" 2025-04-22 05:05:50 +00:00
_strobelight Enable strobelight profiling specific compile frame ids using COMPILE_STROBELIGHT_FRAME_FILTER (#147549) 2025-02-22 03:44:53 +00:00
_subclasses Back out "Do not propagate real tensor in extern kernel" (#151813) 2025-04-21 22:54:03 +00:00
_vendor
accelerator Delegate torch.accelerator.device_count to torch.xxx.device_count for multi-process usage (#149924) 2025-04-10 02:37:37 +00:00
amp [Intel GPU] skip a cuda api call in amp to save some host overhead on xpu (#151111) 2025-04-13 06:37:07 +00:00
ao [BE][Easy]: Simplify reversed call in graph matcher (#151674) 2025-04-19 14:14:31 +00:00
autograd Fix torch.autograd.backward inputs validation (#150975) 2025-04-17 02:11:13 +00:00
backends Expose is_available API for torch.backends.mkldnn (#147432) 2025-04-10 05:05:37 +00:00
compiler [MegaCache] Rename the PGO artifact when used between different jobs (#151482) 2025-04-17 17:09:29 +00:00
contrib
cpu [CPU Stream] Add noop for CPU stream record_event() and wait_event() (#145935) 2025-02-20 18:50:55 +00:00
csrc Updates NCCLConfig with QOS variable (#151821) 2025-04-23 00:03:49 +00:00
cuda [ROCm][TunableOp] Support submatrices in offline tuning (#151138) 2025-04-19 04:14:27 +00:00
distributed logging start of torch elastic workers. (#150849) 2025-04-22 22:35:06 +00:00
distributions add generalized pareto distribution (GPD) (#135968) 2025-04-17 18:51:02 +00:00
export [export] Enable symint inputs for AdditionalInputs and ShapesCollection (#151842) 2025-04-22 22:29:18 +00:00
fft
func
futures PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
fx [dynamic shapes] bound_sympy for size-oblivious min/max reasoning (#151242) 2025-04-23 02:14:05 +00:00
jit Fix torchscript issues with reference quantized modules (#150870) 2025-04-10 20:14:45 +00:00
legacy
lib [1/N] Use internal linkage in torch/csrc C++ files. (#150930) 2025-04-11 02:19:31 +00:00
linalg Implement gradient for the residuals of torch.linalg.lstsq (#148526) 2025-03-10 12:35:09 +00:00
masked [BE][Easy]: Dedupe a TypeAlias in PrimsCommon (#151565) 2025-04-17 19:59:41 +00:00
monitor
mps [MPS] Make torch.mps.compile_shader public (#148972) 2025-03-11 20:20:58 +00:00
mtia [MTIA] Add _mtia_maybeExchangeDevice to MTIA module (#149340) 2025-03-18 15:15:12 +00:00
multiprocessing
nested [aotd] Guess tangents stride as output strides (#144579) 2025-03-20 15:41:36 +00:00
nn Optimize register_full_backward_hook description when all input no grad (#151785) 2025-04-22 17:57:31 +00:00
onnx [ONNX] Update decomposition logic to loop over onnx registry (#151826) 2025-04-22 19:40:52 +00:00
optim Optimize typing in lr_scheduler.py (#151219) 2025-04-15 01:00:13 +00:00
package
profiler [profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124) 2025-04-15 16:11:49 +00:00
quantization
signal
sparse Fix spelling (#149277) 2025-03-20 01:02:32 +00:00
special
testing [MPS] Extend index_put to half precision floats (#151869) 2025-04-22 22:00:08 +00:00
utils [ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs (#146264) 2025-04-22 21:55:40 +00:00
xpu xpu: torch.xpu.get_arch_list() to return [] if xpu not compiled (#147431) 2025-02-24 01:35:54 +00:00
__config__.py
__future__.py
__init__.py [profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124) 2025-04-15 16:11:49 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py Back out "Do not propagate real tensor in extern kernel" (#151813) 2025-04-21 22:54:03 +00:00
_jit_internal.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_linalg_utils.py
_lobpcg.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_lowrank.py
_meta_registrations.py [dtensor] add op support for torch._grouped_mm (#151072) 2025-04-12 07:07:44 +00:00
_namedtensor_internals.py
_ops.py Introduce unsafe way to mark functions as cacheable (#151603) 2025-04-21 17:37:38 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py [Docs] Clarify behavior when integer dtype is used with requires_grad=True in tensor.to() (#150913) 2025-04-10 02:52:58 +00:00
_tensor_str.py add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
_tensor.py Revert "Fix non-bitwise type annotations for Tensor operators (see #145838) (#146845)" 2025-02-18 19:01:27 +00:00
_thread_safe_fork.py
_torch_docs.py Added to docs for out_dtype arg in torch gemms (#151704) 2025-04-21 20:09:17 +00:00
_utils_internal.py [profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124) 2025-04-15 16:11:49 +00:00
_utils.py Allow torch.load under FakeTensorMode to load FakeTensors with correct devices (for plain Tensors) (#147786) 2025-03-06 12:04:32 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py Add sparse tensors constructed via legacy constructor to _sparse_tensors_to_validate (#147759) 2025-02-25 23:51:12 +00:00
CMakeLists.txt Add new dependences for gen_pyi.py (#150391) 2025-04-03 14:18:18 +00:00
custom_class_detail.h
custom_class.h Remove unneeded Clang-tidy suppression (#148246) 2025-03-01 16:51:54 +00:00
extension.h
functional.py Optimize cdist param description (#151178) 2025-04-14 13:53:10 +00:00
hub.py [BE][CI][Easy] bump ruff to 0.9.0: long statements in docstrings (#146509) 2025-02-24 19:56:08 +00:00
library.h Overload Library::def rather than templating it (#151626) 2025-04-18 22:51:16 +00:00
library.py fix spammy library deinit errors when user passes an invalid TORCH_LOGS argument (#151678) 2025-04-22 20:13:52 +00:00
overrides.py [CUDA][cuBLAS] Aten GEMM overload for FP32 output from FP16/BF16 inputs (#150812) 2025-04-18 01:53:26 +00:00
py.typed
quasirandom.py
random.py
README.md Rename README.txt to README.md (#149811) 2025-03-24 22:33:33 +00:00
return_types.py
script.h
serialization.py Move get accelerator to use build time flags when possible (#146098) 2025-03-10 13:17:58 +00:00
storage.py add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]


TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.
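
The difference between going through public functions and reaching into the struct can be sketched as follows. This is a minimal illustration only: `detail::TensorImpl`, `tensor_ndim`, `tensor_size`, and both `numel_*` helpers are hypothetical names invented for this sketch and are not part of TH, THC, or torch/csrc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a struct defined in an internal ".hpp" header.
// This is NOT the real THTensor layout.
namespace detail {
struct TensorImpl {
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
};
}  // namespace detail

// Hypothetical stand-in for the public ".h" surface: callers are expected
// to use these functions instead of touching the struct's fields.
using Tensor = detail::TensorImpl;

inline int64_t tensor_ndim(const Tensor& t) {
  return static_cast<int64_t>(t.sizes.size());
}

inline int64_t tensor_size(const Tensor& t, int64_t dim) {
  assert(dim >= 0 && dim < tensor_ndim(t));
  return t.sizes[static_cast<std::size_t>(dim)];
}

// Preferred style: depends only on the public functions, so it keeps
// working if the internal layout of the struct changes.
inline int64_t numel_via_public_api(const Tensor& t) {
  int64_t n = 1;
  for (int64_t d = 0; d < tensor_ndim(t); ++d) {
    n *= tensor_size(t, d);
  }
  return n;
}

// Abstraction violation: reads the `sizes` field directly, so this site
// must be revisited whenever the struct is refactored.
// See Note [TH abstraction violation]
inline int64_t numel_via_internals(const Tensor& t) {
  int64_t n = 1;
  for (int64_t s : t.sizes) {
    n *= s;
  }
  return n;
}
```

Both helpers compute the same value today. Only the second one is coupled to the struct layout, which is why such sites carry a pointer back to this note.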