pytorch/torch
Shunting Zhang 901b02cf16 [Inductor] fix alignment assumption for fallback (#150777)
Right now, Inductor only handles fallback kernels correctly when they produce aligned output.
When Inductor creates the layout for a fallback kernel's output, it does not add the tensor's offset to the layout [link](2a1e2b88ed/torch/_inductor/ir.py (L6935-L6941)), so unaligned output is treated as aligned. Adding the offset directly to the layout does not work either: that changes the index expressions in the generated kernel, and since Triton already accounts for the offset when passing in the data_ptr, we could end up applying the offset twice.
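
For intuition, here is a minimal illustration in plain PyTorch (not Inductor internals) of how a fallback output can be unaligned: a view with a nonzero storage offset starts partway into its buffer, so its data pointer misses the alignment boundary that generated Triton code may otherwise assume.

```python
import torch

# Fresh allocations are typically 16-byte aligned...
base = torch.randn(17)
# ...but a view with storage_offset == 1 shifts the data pointer
# by 4 bytes (one float32 element) into the buffer.
view = base[1:]

print(base.data_ptr() % 16)  # usually 0
print(view.data_ptr() % 16)  # usually 4: not 16-byte aligned
```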

To solve this issue, we instead track the names of the unaligned buffers.
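
A minimal sketch of that bookkeeping, under assumed names (`unaligned_buffers`, `record_fallback_output`, and `can_assume_aligned` are illustrative, not the actual Inductor data structures):

```python
import torch

ALIGNMENT = 16  # byte alignment that codegen would otherwise assume

# Names of fallback-kernel outputs whose data pointer may be unaligned.
unaligned_buffers: set[str] = set()

def record_fallback_output(buf_name: str, out: torch.Tensor) -> None:
    # Rather than folding the storage offset into the layout (which would
    # change the index expressions in the generated kernel), just remember
    # which buffers may start at an unaligned address.
    if (out.storage_offset() * out.element_size()) % ALIGNMENT != 0:
        unaligned_buffers.add(buf_name)

def can_assume_aligned(buf_name: str) -> bool:
    # Consulted at codegen time before emitting an alignment hint
    # for a kernel argument.
    return buf_name not in unaligned_buffers
```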

This can potentially fix the internal issues we are debugging here: https://fb.workplace.com/groups/1075192433118967/permalink/1618308128807392/

Differential Revision: [D72600784](https://our.internmc.facebook.com/intern/diff/D72600784)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150777
Approved by: https://github.com/eellison, https://github.com/jansel
2025-04-08 18:49:44 +00:00
_awaits
_C Revert "[fx] Move Node._prepend/Node._remove_from_list to C++ (#148261)" (#150542) 2025-04-03 21:15:38 +00:00
_C_flatbuffer
_custom_op
_decomp Fix torch.matmul related out dtype check (#148174) 2025-04-08 17:00:28 +00:00
_dispatch [BE][PYFMT] migrate PYFMT for torch._dynamo to ruff format (#144549) 2025-02-28 03:03:53 +00:00
_dynamo [dynamo] reconstruct functions decorated in the compiled region properly (#150645) 2025-04-08 17:32:46 +00:00
_export [export] raise when Dim.DYNAMIC 0/1 specializes (#150716) 2025-04-07 18:58:42 +00:00
_functorch Make CompileEventLogger more defensive w.r.t to AOTAutogradCache and FXGraphCache (#150423) 2025-04-04 01:55:13 +00:00
_higher_order_ops [invoke_subgraph] Preserve node meta (#150782) 2025-04-08 16:57:39 +00:00
_inductor [Inductor] fix alignment assumption for fallback (#150777) 2025-04-08 18:49:44 +00:00
_lazy
_library [custom_ops][perf] Move expensive pytree traversals of tensors to C++ (#148555) 2025-04-01 18:45:48 +00:00
_logging [export] Beef up guard_added logs (#149465) 2025-03-20 23:02:07 +00:00
_numpy
_prims Support torch.compile rng selective activation checkpointing with cudagraph (#146878) 2025-02-28 00:47:03 +00:00
_prims_common PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
_refs Fix torch.matmul related out dtype check (#148174) 2025-04-08 17:00:28 +00:00
_strobelight Enable strobelight profiling specific compile frame ids using COMPILE_STROBELIGHT_FRAME_FILTER (#147549) 2025-02-22 03:44:53 +00:00
_subclasses [aoti] Fix cannot determine truth value of Relation error when propagating unbacked symint in lowering (#150570) 2025-04-03 20:06:15 +00:00
_vendor
accelerator [Accelerator][Chore] Use existing acc when raising an error (#150829) 2025-04-08 16:05:06 +00:00
amp [MPS] grad scaler (#150255) 2025-04-06 17:06:55 +00:00
ao [Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/torch/ao (#150826) 2025-04-08 18:49:22 +00:00
autograd Compare device name of profiler dynamically (#150396) 2025-04-02 06:06:06 +00:00
backends [ROCm] change preferred blas lib defaults (#150212) 2025-03-29 03:33:07 +00:00
compiler [dynamo] add reason field to torch.compiler.disable (#150341) 2025-04-02 04:26:48 +00:00
contrib
cpu [CPU Stream] Add noop for CPU stream record_event() and wait_event() (#145935) 2025-02-20 18:50:55 +00:00
csrc Revert "Fix the Problems About Defining Static Variable in Inline Function (#147095)" 2025-04-08 17:10:36 +00:00
cuda Remove redundant code in cuda/__init__.py (#150529) 2025-04-08 15:03:21 +00:00
distributed Support having no metadata file for HuggingFaceStorageReader (#150701) 2025-04-07 22:10:39 +00:00
distributions [typing] Add type hints to __init__ methods in torch.distributions. (#144197) 2025-04-06 17:50:35 +00:00
export [export] specialize for aten.to (#149235) 2025-04-03 05:20:10 +00:00
fft
func
futures PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
fx [MTIA] Map names to operand indices when folding submodules (#150692) 2025-04-06 03:11:14 +00:00
jit scriptfunction: Make sure we have valid __name__ and __qualname__ (#147906) 2025-02-28 23:25:47 +00:00
legacy
lib [codemod] Fix missing field initializer in caffe2/torch/lib/libshm/manager.cpp +1 (#148393) 2025-03-04 04:20:04 +00:00
linalg Implement gradient for the residuals of torch.linalg.lstsq (#148526) 2025-03-10 12:35:09 +00:00
masked Use variadic length tuple for torch.masked.DimOrDims (#149870) 2025-03-31 07:06:58 +00:00
monitor
mps [MPS] Make torch.mps.compile_shader public (#148972) 2025-03-11 20:20:58 +00:00
mtia [MTIA] Add _mtia_maybeExchangeDevice to MTIA module (#149340) 2025-03-18 15:15:12 +00:00
multiprocessing
nested [aotd] Guess tangents stride as output strides (#144579) 2025-03-20 15:41:36 +00:00
nn Do not depend on numpy during the import (#150816) 2025-04-08 18:12:53 +00:00
onnx [export] refactor _Dim into Dim (#149891) 2025-03-28 06:19:03 +00:00
optim [MPS] grad scaler (#150255) 2025-04-06 17:06:55 +00:00
package Remove code for Python < 3.9 (#147097) 2025-02-14 03:22:49 +00:00
profiler [BE][Ez]: Use itertools.chain.from_iterable when possible (#148190) 2025-03-06 20:37:06 +00:00
quantization
signal
sparse Fix spelling (#149277) 2025-03-20 01:02:32 +00:00
special
testing add batching rule for torch.Tensor.scatter_add_ (#150543) 2025-04-08 18:00:10 +00:00
utils Remove torch functions that do not support device arguments from _device_constructor (#150290) 2025-04-08 15:13:55 +00:00
xpu xpu: torch.xpu.get_arch_list() to return [] if xpu not compiled (#147431) 2025-02-24 01:35:54 +00:00
__config__.py
__future__.py
__init__.py Fix #149806 : Fix path lookup in _preload_cuda_deps (#149808) 2025-03-25 23:03:47 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py [invoke_subgraph] Lazy backward (#150666) 2025-04-07 22:44:43 +00:00
_jit_internal.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_linalg_utils.py
_lobpcg.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_lowrank.py
_meta_registrations.py Fix torch.matmul related out dtype check (#148174) 2025-04-08 17:00:28 +00:00
_namedtensor_internals.py
_ops.py Add Any return annotation to __getattr__ methods that return a union of types. (#150204) 2025-04-02 05:25:07 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py Add type hints to _tensor_docs.add_docstr_all (#150715) 2025-04-06 22:25:34 +00:00
_tensor_str.py add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
_tensor.py Revert "Fix non-bitwise type annotations for Tensor operators (see #145838) (#146845)" 2025-02-18 19:01:27 +00:00
_thread_safe_fork.py
_torch_docs.py Optimize torch.equal description (#149618) 2025-03-21 03:44:49 +00:00
_utils_internal.py [ROCm] OCP FP8 Support for new GPUs (#146632) 2025-02-24 22:47:52 +00:00
_utils.py Allow torch.load under FakeTensorMode to load FakeTensors with correct devices (for plain Tensors) (#147786) 2025-03-06 12:04:32 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py Add sparse tensors constructed via legacy constructor to _sparse_tensors_to_validate (#147759) 2025-02-25 23:51:12 +00:00
CMakeLists.txt Add new dependences for gen_pyi.py (#150391) 2025-04-03 14:18:18 +00:00
custom_class_detail.h
custom_class.h Remove unneeded Clang-tidy suppression (#148246) 2025-03-01 16:51:54 +00:00
extension.h
functional.py Fix invalid nested int guarding in broadcast_shapes() (#145957) 2025-03-11 00:53:13 +00:00
hub.py [BE][CI][Easy] bump ruff to 0.9.0: long statements in docstrings (#146509) 2025-02-24 19:56:08 +00:00
library.h [pytorch] add experimental TORCH_LIBRARY_THREAD_UNSAFE_LAZY_INIT (#150537) 2025-04-03 22:36:17 +00:00
library.py [Docs] Make torch.Library's kind have no default value to be consistent with the code (#149390) 2025-03-21 04:42:10 +00:00
overrides.py Use Python 3.9 typing (#148157) 2025-03-04 03:09:55 +00:00
py.typed
quasirandom.py
random.py
README.md Rename README.txt to README.md (#149811) 2025-03-24 22:33:33 +00:00
return_types.py
script.h
serialization.py Move get accelerator to use build time flags when possible (#146098) 2025-03-10 13:17:58 +00:00
storage.py add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]

TH/THC provide some `.hpp` headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.