pytorch/torch
rzou c41fbb4f78 Change arg_kwarg_vals propagation strategy (#148046)
Instead of always propagating arg_kwarg_vals in _COPY_META_FIELDS, we
special-case the pattern matcher to propagate arg_kwarg_vals when
it sees triton_kernel_wrapper_functional.

The strategy is:
1) Trace out the replacement graph with arg_kwarg_vals (which carry accurate eager-mode metadata).
2) Trace out the replacement graph with vals (which carry accurate Inductor metadata).
3) Propagate the arg_kwarg_vals from the first graph to the second.
4) Use the second graph as the replacement graph.

We use this strategy because we want to extend it to handle
auto_functionalized further up in the stack. A sketch of step 3 follows.
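
A minimal sketch of step 3, assuming the two replacement graphs were traced from the same function and therefore line up node-for-node; the helper name and standalone-pass shape are illustrative, not the actual pattern-matcher internals:

```python
import torch.fx as fx

def propagate_arg_kwarg_vals(
    eager_gm: fx.GraphModule, inductor_gm: fx.GraphModule
) -> fx.GraphModule:
    # Hypothetical helper. Both graphs come from tracing the same
    # replacement function, so their nodes correspond one-to-one
    # in iteration order.
    for eager_node, inductor_node in zip(
        eager_gm.graph.nodes, inductor_gm.graph.nodes
    ):
        if "arg_kwarg_vals" in eager_node.meta:
            # Carry the eager-mode metadata across; every other meta
            # field (e.g. node.meta["val"]) keeps its Inductor value.
            inductor_node.meta["arg_kwarg_vals"] = eager_node.meta[
                "arg_kwarg_vals"
            ]
    return inductor_gm
```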

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148046
Approved by: https://github.com/eellison
2025-04-02 13:17:52 +00:00
_awaits
_C [custom_ops][perf] Move expensive pytree traversals of tensors to C++ (#148555) 2025-04-01 18:45:48 +00:00
_C_flatbuffer
_custom_op
_decomp Remove aten.elu core ATen decomp because it is now core ATen (#149780) 2025-03-25 01:59:57 +00:00
_dispatch [BE][PYFMT] migrate PYFMT for torch._dynamo to ruff format (#144549) 2025-02-28 03:03:53 +00:00
_dynamo [dynamo] add dynamo disable reasons to codebase (#150440) 2025-04-02 04:26:48 +00:00
_export [pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257) 2025-04-01 10:40:43 +00:00
_functorch Unify on dynamo_compile as the overall wait counter (#150293) 2025-04-01 08:55:51 +00:00
_higher_order_ops [dynamo] add dynamo disable reasons to codebase (#150440) 2025-04-02 04:26:48 +00:00
_inductor Change arg_kwarg_vals propagation strategy (#148046) 2025-04-02 13:17:52 +00:00
_lazy
_library [custom_ops][perf] Move expensive pytree traversals of tensors to C++ (#148555) 2025-04-01 18:45:48 +00:00
_logging [export] Beef up guard_added logs (#149465) 2025-03-20 23:02:07 +00:00
_numpy
_prims Support torch.compile rng selective activation checkpointing with cudagraph (#146878) 2025-02-28 00:47:03 +00:00
_prims_common
_refs [export] fix stft decomp and making it consistent with cpp impl. (#149232) 2025-03-19 18:40:35 +00:00
_strobelight
_subclasses [invoke_subgraph] Do not cache fake tensors for AOTDispatcher first pass (#150450) 2025-04-02 02:31:54 +00:00
_vendor
accelerator Move get accelerator to use build time flags when possible (#146098) 2025-03-10 13:17:58 +00:00
amp [MAIA] [Autocast] Enable autocast on MAIA device (#148511) 2025-03-18 03:46:22 +00:00
ao [Quant][PT2E] add a lowering pass for x86 backend (#149708) 2025-04-01 17:32:41 +00:00
autograd Compare device name of profiler dynamically (#150396) 2025-04-02 06:06:06 +00:00
backends [ROCm] change preferred blas lib defaults (#150212) 2025-03-29 03:33:07 +00:00
compiler [dynamo] add reason field to torch.compiler.disable (#150341) 2025-04-02 04:26:48 +00:00
contrib
cpu
csrc cpp_wrapper: precompile a few more commonly used headers, and improve RAIIPyObject interface (#149350) 2025-04-02 09:54:27 +00:00
cuda [ROCm][TunableOp] Stricter unit tests for online and offline tuning (#150142) 2025-03-31 04:12:08 +00:00
distributed [dynamo] add dynamo disable reasons to codebase (#150440) 2025-04-02 04:26:48 +00:00
distributions [BE][PYFMT] migrate PYFMT for torch.{distributed,distributions} to ruff format (#144547) 2025-02-28 07:35:56 +00:00
export [dynamo] add dynamo disable reasons to codebase (#150440) 2025-04-02 04:26:48 +00:00
fft
func
futures
fx Change arg_kwarg_vals propagation strategy (#148046) 2025-04-02 13:17:52 +00:00
jit scriptfunction: Make sure we have valid __name__ and __qualname__ (#147906) 2025-02-28 23:25:47 +00:00
legacy
lib [codemod] Fix missing field initializer in caffe2/torch/lib/libshm/manager.cpp +1 (#148393) 2025-03-04 04:20:04 +00:00
linalg Implement gradient for the residuals of torch.linalg.lstsq (#148526) 2025-03-10 12:35:09 +00:00
masked Use variadic length tuple for torch.masked.DimOrDims (#149870) 2025-03-31 07:06:58 +00:00
monitor
mps [MPS] Make torch.mps.compile_shader public (#148972) 2025-03-11 20:20:58 +00:00
mtia [MTIA] Add _mtia_maybeExchangeDevice to MTIA module (#149340) 2025-03-18 15:15:12 +00:00
multiprocessing
nested [aotd] Guess tangents stride as output strides (#144579) 2025-03-20 15:41:36 +00:00
nn Fix documentation build errors caused by unsupported section titles (#150205) 2025-03-31 04:27:44 +00:00
onnx [export] refactor _Dim into Dim (#149891) 2025-03-28 06:19:03 +00:00
optim Convert Tensor lr to 0-dim as needed for the optimizer to normally work (#145674) 2025-03-17 23:07:05 +00:00
package
profiler [BE][Ez]: Use itertools.chain.from_iterable when possible (#148190) 2025-03-06 20:37:06 +00:00
quantization
signal
sparse Fix spelling (#149277) 2025-03-20 01:02:32 +00:00
special
testing [Reland] Launch kernel on current stream & remove record_stream entirely (#150398) 2025-04-01 16:46:07 +00:00
utils [ROCm][Windows] Fix torchvision build with ROCm 6.4 on windows (#150180) 2025-04-02 00:35:47 +00:00
xpu
__config__.py
__future__.py
__init__.py Fix #149806 : Fix path lookup in _preload_cuda_deps (#149808) 2025-03-25 23:03:47 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py [dynamic shapes] add backed_size_oblivious option (#148696) 2025-03-11 21:52:34 +00:00
_jit_internal.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_linalg_utils.py
_lobpcg.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
_lowrank.py
_meta_registrations.py enable torch.compile for torch._scaled_mm nvfp4 recipe (#150462) 2025-04-02 01:08:40 +00:00
_namedtensor_internals.py
_ops.py Add Any return annotation to __getattr__ methods that return a union of types. (#150204) 2025-04-02 05:25:07 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py
_tensor_str.py add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
_tensor.py
_thread_safe_fork.py
_torch_docs.py Optimize torch.equal description (#149618) 2025-03-21 03:44:49 +00:00
_utils_internal.py
_utils.py Allow torch.load under FakeTensorMode to load FakeTensors with correct devices (for plain Tensors) (#147786) 2025-03-06 12:04:32 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py
CMakeLists.txt
custom_class_detail.h
custom_class.h Remove unneeded Clang-tidy suppression (#148246) 2025-03-01 16:51:54 +00:00
extension.h
functional.py Fix invalid nested int guarding in broadcast_shapes() (#145957) 2025-03-11 00:53:13 +00:00
hub.py
library.h
library.py [Docs] Make torch.Library's kind have no default value to be consistent with the code (#149390) 2025-03-21 04:42:10 +00:00
overrides.py Use Python 3.9 typing (#148157) 2025-03-04 03:09:55 +00:00
py.typed
quasirandom.py
random.py
README.md Rename README.txt to README.md (#149811) 2025-03-24 22:33:33 +00:00
return_types.py
script.h
serialization.py Move get accelerator to use build time flags when possible (#146098) 2025-03-10 13:17:58 +00:00
storage.py add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.
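
For illustration, a sketch of the boundary this note describes; the per-type TH C API shown (e.g. `THFloatTensor_size`) is historical and the snippet is an assumption about the convention, not code from the tree:

```cpp
// Sketch only. THFloatTensor_size is a public accessor from the C API
// (THTensor.h); the struct-field access it stands in for is the kind of
// .hpp-level poking this note warns against. Historical TH names, shown
// purely for illustration.
#include <TH/TH.h>

int64_t first_dim_size(THFloatTensor* t) {
  // Good: manipulate the tensor through the public C API.
  return THFloatTensor_size(t, 0);
  // Bad (the abstraction violation): reading internal fields declared
  // in THTensor.hpp directly, e.g. something like t->sizes_[0].
}
```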