pytorch/torch
Shunting Zhang 901b02cf16 [Inductor] fix alignment assumption for fallback (#150777)
Right now, Inductor only handles fallback kernels correctly when they produce aligned output.
When Inductor creates the layout for a fallback kernel's output, it does not add the tensor's offset to the layout [link](2a1e2b88ed/torch/_inductor/ir.py (L6935-L6941)), so unaligned output is treated as aligned. Adding the offset directly to the layout does not work either: that changes the index expressions in the generated kernel, and since Triton already accounts for the offset when passing in the data_ptr, we could end up applying the offset twice.
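
For intuition, here is a minimal illustration in plain PyTorch (not Inductor internals) of how a fallback output can be unaligned: a view with a nonzero storage offset starts partway into its buffer, so its data pointer misses the alignment boundary that generated Triton code may otherwise assume.

```python
import torch

# Fresh allocations are typically 16-byte aligned...
base = torch.randn(17)
# ...but a view with storage_offset == 1 shifts the data pointer
# by 4 bytes (one float32 element) into the buffer.
view = base[1:]

print(base.data_ptr() % 16)  # usually 0
print(view.data_ptr() % 16)  # usually 4: not 16-byte aligned
```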

To solve this issue, we instead track the names of the unaligned buffers.
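
A minimal sketch of that bookkeeping, under assumed names (`unaligned_buffers`, `record_fallback_output`, and `can_assume_aligned` are illustrative, not the actual Inductor data structures):

```python
import torch

ALIGNMENT = 16  # byte alignment that codegen would otherwise assume

# Names of fallback-kernel outputs whose data pointer may be unaligned.
unaligned_buffers: set[str] = set()

def record_fallback_output(buf_name: str, out: torch.Tensor) -> None:
    # Rather than folding the storage offset into the layout (which would
    # change the index expressions in the generated kernel), just remember
    # which buffers may start at an unaligned address.
    if (out.storage_offset() * out.element_size()) % ALIGNMENT != 0:
        unaligned_buffers.add(buf_name)

def can_assume_aligned(buf_name: str) -> bool:
    # Consulted at codegen time before emitting an alignment hint
    # for a kernel argument.
    return buf_name not in unaligned_buffers
```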

This can potentially fix the internal issues we are debugging here: https://fb.workplace.com/groups/1075192433118967/permalink/1618308128807392/

Differential Revision: [D72600784](https://our.internmc.facebook.com/intern/diff/D72600784)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150777
Approved by: https://github.com/eellison, https://github.com/jansel
2025-04-08 18:49:44 +00:00
_awaits
_C Revert "[fx] Move Node._prepend/Node._remove_from_list to C++ (#148261)" (#150542) 2025-04-03 21:15:38 +00:00
_C_flatbuffer
_custom_op
_decomp Fix torch.matmul related out dtype check (#148174) 2025-04-08 17:00:28 +00:00
_dispatch [BE][PYFMT] migrate PYFMT for torch._dynamo to ruff format (#144549) 2025-02-28 03:03:53 +00:00
_dynamo [dynamo] reconstruct functions decorated in the compiled region properly (#150645) 2025-04-08 17:32:46 +00:00
_export [export] raise when Dim.DYNAMIC 0/1 specializes (#150716) 2025-04-07 18:58:42 +00:00
_functorch Make CompileEventLogger more defensive w.r.t to AOTAutogradCache and FXGraphCache (#150423) 2025-04-04 01:55:13 +00:00
_higher_order_ops [invoke_subgraph] Preserve node meta (#150782) 2025-04-08 16:57:39 +00:00
_inductor [Inductor] fix alignment assumption for fallback (#150777) 2025-04-08 18:49:44 +00:00
_lazy
_library [custom_ops][perf] Move expensive pytree traversals of tensors to C++ (#148555) 2025-04-01 18:45:48 +00:00
_logging [export] Beef up guard_added logs (#149465) 2025-03-20 23:02:07 +00:00
_numpy
_prims Support torch.compile rng selective activation checkpointing with cudagraph (#146878) 2025-02-28 00:47:03 +00:00
_prims_common PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
_refs Fix torch.matmul related out dtype check (#148174) 2025-04-08 17:00:28 +00:00
_strobelight Enable strobelight profiling specific compile frame ids using COMPILE_STROBELIGHT_FRAME_FILTER (#147549) 2025-02-22 03:44:53 +00:00
_subclasses [aoti] Fix cannot determine truth value of Relation error when propagating unbacked symint in lowering (#150570) 2025-04-03 20:06:15 +00:00
_vendor
accelerator [Accelerator][Chore] Use existing acc when raising an error (#150829) 2025-04-08 16:05:06 +00:00
amp [MPS] grad scaler (#150255) 2025-04-06 17:06:55 +00:00
ao [Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/torch/ao (#150826) 2025-04-08 18:49:22 +00:00
autograd Compare device name of profiler dynamically (#150396) 2025-04-02 06:06:06 +00:00
backends [ROCm] change preferred blas lib defaults (#150212) 2025-03-29 03:33:07 +00:00
compiler [dynamo] add reason field to torch.compiler.disable (#150341) 2025-04-02 04:26:48 +00:00
contrib
cpu [CPU Stream] Add noop for CPU stream record_event() and wait_event() (#145935) 2025-02-20 18:50:55 +00:00
csrc Revert "Fix the Problems About Defining Static Variable in Inline Function (#147095)" 2025-04-08 17:10:36 +00:00
cuda Remove redundant code in cuda/__init__.py (#150529) 2025-04-08 15:03:21 +00:00
distributed Support having no metadata file for HuggingFaceStorageReader (#150701) 2025-04-07 22:10:39 +00:00
distributions [typing] Add type hints to __init__ methods in torch.distributions. (#144197) 2025-04-06 17:50:35 +00:00
export [export] specialize for aten.to (#149235) 2025-04-03 05:20:10 +00:00
fft
func
futures PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
fx [MTIA] Map names to operand indices when folding submodules (#150692) 2025-04-06 03:11:14 +00:00
jit scriptfunction: Make sure we have valid __name__ and __qualname__ (#147906) 2025-02-28 23:25:47 +00:00
legacy
lib [codemod] Fix missing field initializer in caffe2/torch/lib/libshm/manager.cpp +1 (#148393) 2025-03-04 04:20:04 +00:00
linalg Implement gradient for the residuals of torch.linalg.lstsq (#148526) 2025-03-10 12:35:09 +00:00
masked Use variadic length tuple for torch.masked.DimOrDims (#149870) 2025-03-31 07:06:58 +00:00
monitor
mps [MPS] Make torch.mps.compile_shader public (#148972) 2025-03-11 20:20:58 +00:00
mtia [MTIA] Add _mtia_maybeExchangeDevice to MTIA module (#149340) 2025-03-18 15:15:12 +00:00
multiprocessing
nested [aotd] Guess tangents stride as output strides (#144579) 2025-03-20 15:41:36 +00:00
nn Do not depend on numpy during the import (#150816) 2025-04-08 18:12:53 +00:00
onnx [export] refactor _Dim into Dim (#149891) 2025-03-28 06:19:03 +00:00
optim [MPS] grad scaler (#150255) 2025-04-06 17:06:55 +00:00
package Remove code for Python < 3.9 (#147097) 2025-02-14 03:22:49 +00:00
profiler [BE][Ez]: Use itertools.chain.from_iterable when possible (#148190) 2025-03-06 20:37:06 +00:00
quantization
signal
sparse Fix spelling (#149277) 2025-03-20 01:02:32 +00:00
special
testing add batching rule for torch.Tensor.scatter_add_ (#150543) 2025-04-08 18:00:10 +00:00
utils Remove torch functions that do not support device arguments from _device_constructor (#150290) 2025-04-08 15:13:55 +00:00
xpu xpu: torch.xpu.get_arch_list() to return [] if xpu not compiled (#147431) 2025-02-24 01:35:54 +00:00
__config__.py
__future__.py
__init__.py Fix #149806 : Fix path lookup in _preload_cuda_deps (#149808) 2025-03-25 23:03:47 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py [invoke_subgraph] Lazy backward (#150666) 2025-04-07 22:44:43 +00:00
_jit_internal.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_linalg_utils.py
_lobpcg.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_lowrank.py
_meta_registrations.py Fix torch.matmul related out dtype check (#148174) 2025-04-08 17:00:28 +00:00
_namedtensor_internals.py
_ops.py Add Any return annotation to __getattr__ methods that return a union of types. (#150204) 2025-04-02 05:25:07 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py Add type hints to _tensor_docs.add_docstr_all (#150715) 2025-04-06 22:25:34 +00:00
_tensor_str.py add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
_tensor.py Revert "Fix non-bitwise type annotations for Tensor operators (see #145838) (#146845)" 2025-02-18 19:01:27 +00:00
_thread_safe_fork.py
_torch_docs.py Optimize torch.equal description (#149618) 2025-03-21 03:44:49 +00:00
_utils_internal.py [ROCm] OCP FP8 Support for new GPUs (#146632) 2025-02-24 22:47:52 +00:00
_utils.py Allow torch.load under FakeTensorMode to load FakeTensors with correct devices (for plain Tensors) (#147786) 2025-03-06 12:04:32 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py Add sparse tensors constructed via legacy constructor to _sparse_tensors_to_validate (#147759) 2025-02-25 23:51:12 +00:00
CMakeLists.txt Add new dependences for gen_pyi.py (#150391) 2025-04-03 14:18:18 +00:00
custom_class_detail.h
custom_class.h Remove unneeded Clang-tidy suppression (#148246) 2025-03-01 16:51:54 +00:00
extension.h
functional.py Fix invalid nested int guarding in broadcast_shapes() (#145957) 2025-03-11 00:53:13 +00:00
hub.py [BE][CI][Easy] bump ruff to 0.9.0: long statements in docstrings (#146509) 2025-02-24 19:56:08 +00:00
library.h [pytorch] add experimental TORCH_LIBRARY_THREAD_UNSAFE_LAZY_INIT (#150537) 2025-04-03 22:36:17 +00:00
library.py [Docs] Make torch.Library's kind have no default value to be consistent with the code (#149390) 2025-03-21 04:42:10 +00:00
overrides.py Use Python 3.9 typing (#148157) 2025-03-04 03:09:55 +00:00
py.typed
quasirandom.py
random.py
README.md Rename README.txt to README.md (#149811) 2025-03-24 22:33:33 +00:00
return_types.py
script.h
serialization.py Move get accelerator to use build time flags when possible (#146098) 2025-03-10 13:17:58 +00:00
storage.py add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]

TH/THC provide some `.hpp` headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.