pytorch/torch
fulvius31 e3e45d90d8 [Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#147019)
Modified TorchInductor’s autotuning flow so that each `best_config` JSON file also includes Triton’s “base32” (or base64) cache key.
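
For illustration, reading one of these augmented `best_config` files might look like the following; the field name used here (`triton_cache_hash`), the file name, and the values are illustrative assumptions, not necessarily the exact schema added by this PR:

```python
# Illustrative sketch only: load a best_config produced by the autotuner and read
# the recorded Triton cache key. The file name, field name ("triton_cache_hash"),
# and values below are assumptions for the example.
import json

with open("triton_poi_fused_add_0.best_config") as f:  # hypothetical file
    best_config = json.load(f)

# Typical contents might look like:
# {"XBLOCK": 128, "num_warps": 4, "num_stages": 1,
#  "triton_cache_hash": "c5skkizmemnp3wmpvjjtbfoge..."}
print(best_config.get("triton_cache_hash"))
```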

**Motivation**

Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config.
The impact is minimal, since it is only an extra field in `.best_config`, and it can help with advanced performance tuning or kernel-level debugging.

Also, since Triton already stores the cubin/hsaco in its cache, developers and researchers can avoid setting `store_cubin = True`: they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the `best_config` with the right Triton cache directory for the "best" kernel, as sketched below.
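
A minimal sketch of that lookup, assuming the key is stored under a `triton_cache_hash` field and Triton's default on-disk cache layout (`~/.triton/cache/<key>/`, overridable via the `TRITON_CACHE_DIR` environment variable):

```python
# Sketch, not a PyTorch API: map a best_config file back to the Triton cache
# directory that holds the compiled cubin/hsaco and intermediate IRs.
import json
import os


def triton_cache_dir_for(best_config_path: str) -> str:
    with open(best_config_path) as f:
        cache_key = json.load(f)["triton_cache_hash"]  # illustrative field name
    cache_root = os.environ.get(
        "TRITON_CACHE_DIR", os.path.expanduser("~/.triton/cache")
    )
    return os.path.join(cache_root, cache_key)


# Example usage: list the artifacts Triton stored for the "best" kernel.
# kernel_dir = triton_cache_dir_for("triton_poi_fused_add_0.best_config")
# print(os.listdir(kernel_dir))
```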

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147019
Approved by: https://github.com/davidberard98
2025-03-04 12:16:38 +00:00
_awaits
_C Use Python 3.9 typing (#148157) 2025-03-04 03:09:55 +00:00
_C_flatbuffer
_custom_op
_decomp [Inductor] Avoid tensor slice overflow for large step (#147433) 2025-03-02 16:07:15 +00:00
_dispatch [BE][PYFMT] migrate PYFMT for torch._dynamo to ruff format (#144549) 2025-02-28 03:03:53 +00:00
_dynamo Introduce delayed compile via eager_then_compile stance (#147983) 2025-03-04 07:46:31 +00:00
_export [export] Sync aoti schema to schema.py (#148017) 2025-02-27 21:46:11 +00:00
_functorch [invoke_subgraph] Run joint passes on the hop graphs (#139325) 2025-03-03 23:38:14 +00:00
_higher_order_ops [user-triton] handle inline_asm_case (#148043) 2025-02-28 20:52:51 +00:00
_inductor [Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#147019) 2025-03-04 12:16:38 +00:00
_lazy
_library Fix the tiny doc descriptions (#147319) 2025-02-25 17:10:16 +00:00
_logging [cutlass backend] turn autotuning logs off by default + rename log to autotuning log (#147922) 2025-02-26 21:02:04 +00:00
_numpy
_prims Support torch.compile rng selective activation checkpointing with cudagraph (#146878) 2025-02-28 00:47:03 +00:00
_prims_common PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
_refs Revert "optimize the decomposition of aten.native_group_norm (#144733)" 2025-02-27 20:57:25 +00:00
_strobelight Enable strobelight profiling specific compile frame ids using COMPILE_STROBELIGHT_FRAME_FILTER (#147549) 2025-02-22 03:44:53 +00:00
_subclasses [dynamic shapes][export] ignore when real-tensor fallback fails (#147779) 2025-03-03 19:09:56 +00:00
_vendor
accelerator
amp [autocast][pytorch] Support autocast for MTIA (#145627) 2025-01-25 03:24:59 +00:00
ao [Intel GPU] qlinear_pointwise.binary[_tensor] XPU support (#135337) 2025-02-21 02:09:28 +00:00
autograd PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
backends PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
compiler Significantly speed up save_cache_artifacts (#148227) 2025-03-03 17:28:41 +00:00
contrib
cpu [CPU Stream] Add noop for CPU stream record_event() and wait_event() (#145935) 2025-02-20 18:50:55 +00:00
csrc [Intel GPU] Enable SDPA on XPU (#147614) 2025-03-04 01:40:45 +00:00
cuda Remove outdated CUDA version check (#148142) 2025-03-04 03:33:44 +00:00
distributed Use Python 3.9 typing (#148157) 2025-03-04 03:09:55 +00:00
distributions [BE][PYFMT] migrate PYFMT for torch.{distributed,distributions} to ruff format (#144547) 2025-02-28 07:35:56 +00:00
export [export] Remove report from draft-export output (#147558) 2025-02-22 00:54:29 +00:00
fft
func Add torch.func.debug_unwrap (#146528) 2025-02-06 18:48:09 +00:00
futures PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
fx [fx] Optimize TracerBase.create_arg and Graph._gen_python_code (#148292) 2025-03-04 02:42:23 +00:00
jit scriptfunction: Make sure we have valid __name__ and __qualname__ (#147906) 2025-02-28 23:25:47 +00:00
legacy
lib [codemod] Fix missing field initializer in caffe2/torch/lib/libshm/manager.cpp +1 (#148393) 2025-03-04 04:20:04 +00:00
linalg
masked
monitor add WaitCounter type interface and get rid of type errors (#146175) 2025-02-01 23:24:52 +00:00
mps
mtia PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
multiprocessing
nested Revert "[cuDNN][SDPA][Nested Tensor] Experimental cuDNN Nested Tensor SDPA Support (forward only) (#141178)" 2025-02-22 17:28:12 +00:00
nn Optimize param prepend class reference torch.nn.Module (#148304) 2025-03-04 08:46:14 +00:00
onnx [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
optim [Easy][optim] Add LBFGS params optional desc (#147579) 2025-02-21 19:38:10 +00:00
package Remove code for Python < 3.9 (#147097) 2025-02-14 03:22:49 +00:00
profiler execution trace export supports gzip format (#146179) 2025-02-01 01:25:25 +00:00
quantization
signal
sparse PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
special
testing Remove outdated CUDA version check (#148142) 2025-03-04 03:33:44 +00:00
utils doc/xpu: align description of SyclExtension with CPP/CUDA (#147988) 2025-03-04 04:17:36 +00:00
xpu xpu: torch.xpu.get_arch_list() to return [] if xpu not compiled (#147431) 2025-02-24 01:35:54 +00:00
__config__.py
__future__.py
__init__.py Add cuda 11.8 guard for cufile preload (#148184) 2025-03-01 01:01:04 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py Introduce delayed compile via eager_then_compile stance (#147983) 2025-03-04 07:46:31 +00:00
_jit_internal.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_linalg_utils.py
_lobpcg.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_lowrank.py
_meta_registrations.py [Intel GPU] Enable SDPA on XPU (#147614) 2025-03-04 01:40:45 +00:00
_namedtensor_internals.py
_ops.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py Add link to non_blocking/pinmem tutorial in Tensor.to docstrings (#145651) 2025-02-18 20:38:01 +00:00
_tensor_str.py [BE][CI] bump ruff to 0.9.0: string quote styles (#144569) 2025-02-24 19:56:09 +00:00
_tensor.py Revert "Fix non-bitwise type annotations for Tensor operators (see #145838) (#146845)" 2025-02-18 19:01:27 +00:00
_thread_safe_fork.py
_torch_docs.py Fix torch.max optional args dim, keepdim description (#147177) 2025-02-20 08:18:09 +00:00
_utils_internal.py [ROCm] OCP FP8 Support for new GPUs (#146632) 2025-02-24 22:47:52 +00:00
_utils.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py Add sparse tensors constructed via legacy constructor to _sparse_tensors_to_validate (#147759) 2025-02-25 23:51:12 +00:00
abi-check.cpp
CMakeLists.txt Set USE_CUFILE=1 by default and add pypi package to binary build matrix (#145748) 2025-02-11 15:49:01 +00:00
custom_class_detail.h
custom_class.h Remove unneeded Clang-tidy suppression (#148246) 2025-03-01 16:51:54 +00:00
extension.h
functional.py Re-add stft option to align window for center = false (#146379) 2025-02-06 14:07:13 +00:00
hub.py [BE][CI][Easy] bump ruff to 0.9.0: long statements in docstrings (#146509) 2025-02-24 19:56:08 +00:00
library.h Remove trivial dispatch_key_allowlist_check function (#146169) 2025-01-31 19:59:40 +00:00
library.py [opcheck] Improve error reporting; allow atol/rtol overrides (#146488) 2025-02-05 21:25:06 +00:00
overrides.py Use Python 3.9 typing (#148157) 2025-03-04 03:09:55 +00:00
py.typed
quasirandom.py
random.py
README.txt
return_types.py
script.h
serialization.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
storage.py add the torch.float8_e8m0fnu dtype to PyTorch (#147466) 2025-02-20 13:55:42 +00:00
torch_version.py [BE]: Enable ruff SLOT checks (#146276) 2025-02-04 19:18:23 +00:00
types.py Improve typing in torch/types.py (#145237) 2025-01-28 05:29:12 +00:00
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.