pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Michael Lazos 253059356f [Cutlass] Implement EVT example tensor creation (#150904) This PR implements a translation layer from inductor IR to "example tensors", the expected arguments of the EVT tracer. These tensors store the name, shape, stride, and dtype of the tensor and allow an AST-based Python parse to generate the EVT C++. Updates to example tensor creation. Previously merged: * https://github.com/pytorch/pytorch/pull/150903 * https://github.com/pytorch/pytorch/pull/150346 * https://github.com/pytorch/pytorch/pull/150345 * https://github.com/pytorch/pytorch/pull/150344 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150904 Approved by: https://github.com/eellison		2025-04-23 03:26:56 +00:00
..
_awaits
_C	c10d/Store: add nonblocking mode to queue_pop (#151485 )	2025-04-18 02:14:50 +00:00
_C_flatbuffer
_custom_op
_decomp	Make torch._chunk_cat support non-contiguous inputs (#151263 )	2025-04-16 04:18:46 +00:00
_dispatch	[BE][PYFMT] migrate PYFMT for `torch._dynamo` to `ruff format` (#144549 )	2025-02-28 03:03:53 +00:00
_dynamo	Fix circular imports (#151939 )	2025-04-23 02:53:32 +00:00
_export	[export] Enable symint inputs for AdditionalInputs and ShapesCollection (#151842 )	2025-04-22 22:29:18 +00:00
_functorch	[standalone_compile] Dynamic shape handling (#151788 )	2025-04-22 20:17:24 +00:00
_higher_order_ops	[HOP] Reworked DispatchKey.Autograd (#151107 )	2025-04-15 19:55:46 +00:00
_inductor	[Cutlass] Implement EVT example tensor creation (#150904 )	2025-04-23 03:26:56 +00:00
_lazy
_library	Rename register_fake_profile to unsafe_generate_fake_kernels (#151797 )	2025-04-21 23:08:15 +00:00
_logging	[export] Beef up guard_added logs (#149465 )	2025-03-20 23:02:07 +00:00
_numpy
_prims	Support torch.compile rng selective activation checkpointing with cudagraph (#146878 )	2025-02-28 00:47:03 +00:00
_prims_common	Revert "[dynamic shapes] guard_or_false for _reshape_view_helper, utils._infer_size for wildcard dims (#150127 )"	2025-04-22 05:05:50 +00:00
_refs	Revert "[dynamic shapes] guard_or_false for _reshape_view_helper, utils._infer_size for wildcard dims (#150127 )"	2025-04-22 05:05:50 +00:00
_strobelight	Enable strobelight profiling specific compile frame ids using COMPILE_STROBELIGHT_FRAME_FILTER (#147549 )	2025-02-22 03:44:53 +00:00
_subclasses	Back out "Do not propagate real tensor in extern kernel" (#151813 )	2025-04-21 22:54:03 +00:00
_vendor
accelerator	Delegate torch.accelerator.device_count to torch.xxx.device_count for multi-process usage (#149924 )	2025-04-10 02:37:37 +00:00
amp	[Intel GPU] skip a cuda api call in amp to save some host overhead on xpu (#151111 )	2025-04-13 06:37:07 +00:00
ao	[BE][Easy]: Simplify reversed call in graph matcher (#151674 )	2025-04-19 14:14:31 +00:00
autograd	Fix `torch.autograd.backward` `inputs` validation (#150975 )	2025-04-17 02:11:13 +00:00
backends	Expose is_available API for torch.backends.mkldnn (#147432 )	2025-04-10 05:05:37 +00:00
compiler	[MegaCache] Rename the PGO artifact when used between different jobs (#151482 )	2025-04-17 17:09:29 +00:00
contrib
cpu	[CPU Stream] Add noop for CPU stream record_event() and wait_event() (#145935 )	2025-02-20 18:50:55 +00:00
csrc	Updates NCCLConfig with QOS variable (#151821 )	2025-04-23 00:03:49 +00:00
cuda	[ROCm][TunableOp] Support submatrices in offline tuning (#151138 )	2025-04-19 04:14:27 +00:00
distributed	logging start of torch elastic workers. (#150849 )	2025-04-22 22:35:06 +00:00
distributions	add generalized pareto distribution (GPD) (#135968 )	2025-04-17 18:51:02 +00:00
export	[export] Enable symint inputs for AdditionalInputs and ShapesCollection (#151842 )	2025-04-22 22:29:18 +00:00
fft
func
futures	PEP585: More UP006 fixes (#146392 )	2025-02-20 06:18:13 +00:00
fx	[dynamic shapes] bound_sympy for size-oblivious min/max reasoning (#151242 )	2025-04-23 02:14:05 +00:00
jit	Fix torchscript issues with reference quantized modules (#150870 )	2025-04-10 20:14:45 +00:00
legacy
lib	[1/N] Use internal linkage in torch/csrc C++ files. (#150930 )	2025-04-11 02:19:31 +00:00
linalg	Implement gradient for the `residuals` of `torch.linalg.lstsq` (#148526 )	2025-03-10 12:35:09 +00:00
masked	[BE][Easy]: Dedupe a TypeAlias in PrimsCommon (#151565 )	2025-04-17 19:59:41 +00:00
monitor
mps	[MPS] Make `torch.mps.compile_shader` public (#148972 )	2025-03-11 20:20:58 +00:00
mtia	[MTIA] Add _mtia_maybeExchangeDevice to MTIA module (#149340 )	2025-03-18 15:15:12 +00:00
multiprocessing
nested	[aotd] Guess tangents stride as output strides (#144579 )	2025-03-20 15:41:36 +00:00
nn	Optimize register_full_backward_hook description when all input no grad (#151785 )	2025-04-22 17:57:31 +00:00
onnx	[ONNX] Update decomposition logic to loop over onnx registry (#151826 )	2025-04-22 19:40:52 +00:00
optim	Optimize typing in `lr_scheduler.py` (#151219 )	2025-04-15 01:00:13 +00:00
package
profiler	[profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124 )	2025-04-15 16:11:49 +00:00
quantization
signal
sparse	Fix spelling (#149277 )	2025-03-20 01:02:32 +00:00
special
testing	[MPS] Extend index_put to half precision floats (#151869 )	2025-04-22 22:00:08 +00:00
utils	[ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs (#146264 )	2025-04-22 21:55:40 +00:00
xpu	xpu: torch.xpu.get_arch_list() to return [] if xpu not compiled (#147431 )	2025-02-24 01:35:54 +00:00
__config__.py
__future__.py
__init__.py	[profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124 )	2025-04-15 16:11:49 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py	Back out "Do not propagate real tensor in extern kernel" (#151813 )	2025-04-21 22:54:03 +00:00
_jit_internal.py	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 )	2025-02-27 20:46:16 +00:00
_linalg_utils.py
_lobpcg.py	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 )	2025-02-27 20:46:16 +00:00
_lowrank.py
_meta_registrations.py	[dtensor] add op support for torch._grouped_mm (#151072 )	2025-04-12 07:07:44 +00:00
_namedtensor_internals.py
_ops.py	Introduce unsafe way to mark functions as cacheable (#151603 )	2025-04-21 17:37:38 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py	[Docs] Clarify behavior when integer dtype is used with requires_grad=True in `tensor.to()` (#150913 )	2025-04-10 02:52:58 +00:00
_tensor_str.py	add `torch.float4_e2m1fn_x2` to PyTorch (#148791 )	2025-03-27 17:32:20 +00:00
_tensor.py	Revert "Fix non-bitwise type annotations for Tensor operators (see #145838 ) (#146845 )"	2025-02-18 19:01:27 +00:00
_thread_safe_fork.py
_torch_docs.py	Added to docs for out_dtype arg in torch gemms (#151704 )	2025-04-21 20:09:17 +00:00
_utils_internal.py	[profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124 )	2025-04-15 16:11:49 +00:00
_utils.py	Allow torch.load under FakeTensorMode to load FakeTensors with correct devices (for plain Tensors) (#147786 )	2025-03-06 12:04:32 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py	Add sparse tensors constructed via legacy constructor to _sparse_tensors_to_validate (#147759 )	2025-02-25 23:51:12 +00:00
CMakeLists.txt	Add new dependences for gen_pyi.py (#150391 )	2025-04-03 14:18:18 +00:00
custom_class_detail.h
custom_class.h	Remove unneeded Clang-tidy suppression (#148246 )	2025-03-01 16:51:54 +00:00
extension.h
functional.py	Optimize `cdist` param description (#151178 )	2025-04-14 13:53:10 +00:00
hub.py	[BE][CI][Easy] bump `ruff` to 0.9.0: long statements in docstrings (#146509 )	2025-02-24 19:56:08 +00:00
library.h	Overload Library::def rather than templating it (#151626 )	2025-04-18 22:51:16 +00:00
library.py	fix spammy library deinit errors when user passes an invalid TORCH_LOGS argument (#151678 )	2025-04-22 20:13:52 +00:00
overrides.py	[CUDA][cuBLAS] Aten GEMM overload for FP32 output from FP16/BF16 inputs (#150812 )	2025-04-18 01:53:26 +00:00
py.typed
quasirandom.py
random.py
README.md	Rename README.txt to README.md (#149811 )	2025-03-24 22:33:33 +00:00
return_types.py
script.h
serialization.py	Move get accelerator to use build time flags when possible (#146098 )	2025-03-10 13:17:58 +00:00
storage.py	add `torch.float4_e2m1fn_x2` to PyTorch (#148791 )	2025-03-27 17:32:20 +00:00
torch_version.py
types.py
version.py.tpl

README.md

Note [TH abstraction violation]


TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers double as *internal implementation detail*
headers, whose contents should largely not be used by external clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.
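
As a rough illustration of the distinction this note draws, the sketch below
separates a public, function-only interface from the internal struct layout.
Every name in it (`ExampleTensor`, `example_tensor_size`, and so on) is
hypothetical and chosen for illustration; none of it is the TH API, and the
two "headers" are collapsed into one file so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// ---- Public interface (the role a THTensor.h-style header plays) ----
// Hypothetical names; clients see only an opaque type and functions.
struct ExampleTensor;
ExampleTensor* example_tensor_new(std::vector<int64_t> sizes);
int example_tensor_ndim(const ExampleTensor* t);
int64_t example_tensor_size(const ExampleTensor* t, int dim);
void example_tensor_free(ExampleTensor* t);

// ---- Internal detail (the role a THTensor.hpp-style header plays) ----
// The struct layout lives here and is free to change during a refactor.
struct ExampleTensor {
  std::vector<int64_t> sizes_;
};

ExampleTensor* example_tensor_new(std::vector<int64_t> sizes) {
  return new ExampleTensor{std::move(sizes)};
}

int example_tensor_ndim(const ExampleTensor* t) {
  return static_cast<int>(t->sizes_.size());
}

int64_t example_tensor_size(const ExampleTensor* t, int dim) {
  // Reject out-of-range dimensions in debug builds.
  assert(dim >= 0 && dim < example_tensor_ndim(t));
  return t->sizes_[static_cast<std::size_t>(dim)];
}

void example_tensor_free(ExampleTensor* t) {
  delete t;
}

// ---- Client code ----
// Respects the abstraction: it never touches t->sizes_ directly, so it
// keeps working if the internal layout is replaced.
int64_t example_numel(const ExampleTensor* t) {
  int64_t n = 1;
  for (int d = 0; d < example_tensor_ndim(t); ++d) {
    n *= example_tensor_size(t, d);
  }
  return n;
}
```

A call site that read `t->sizes_` directly would be the kind of abstraction
violation this note marks: it compiles today but has to be revisited whenever
the struct changes.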