pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Colin Peppler fe285b9560 [aoti] fix corner case in unbacked replacements for atomically_apply_size_hint (#153768)	2025-05-22 02:05:37 +00:00

## PR

There are a few cases that my previous PR (#153220) didn't cover.

1. The LHS/RHS matters. Today, if you do `torch._check(lhs == rhs)`, it shows up as a deferred runtime assert with `Eq(lhs, rhs)`.
2. There can be transitive replacements, for example expr1 -> expr2 -> u0. `test_size_with_unbacked_add_expr_transitive` tests for this.
3. An unbacked symint expr may not have a replacement that's purely a symbol; for instance, it could be another expression. `test_size_with_unbacked_add_and_mul_expr` tests for this.

## Device assertion msg

```
/tmp/tmp07mu50tx/6y/c6ym2jzadwfigu3yexredb7qofviusz3p7ozcdjywvayhxgcqxkp.py:40: unknown: block: [8681,0,0], thread: [4,0,0] Assertion `index out of bounds: 0 <= tl.broadcast_to(tmp13, [XBLOCK]) < ks0` failed.
...
/tmp/tmp07mu50tx/6y/c6ym2jzadwfigu3yexredb7qofviusz3p7ozcdjywvayhxgcqxkp.py:40: unknown: block: [8681,0,0], thread: [6,0,0] Assertion `index out of bounds: 0 <= tl.broadcast_to(tmp13, [XBLOCK]) < ks0` failed.
```

## Autotuning code setup

This is the autotuning code for a concat kernel which takes input tensors (`in_buf`) and writes them to the output (`out_buf`).

It's important to note that the sizes of `in_buf0` and `in_buf1` don't match along dim=0. This is bad because all concat inputs must share the same size for each dim except for the concat dim (here that's dim=1).

```
in_buf0 = generate_example_value(size=(u1 + s0, 256))  # concrete size is (17900, 256)
in_buf1 = generate_example_value(size=(u0, 10))        # concrete size is (8192, 10)
...
out_buf = generate_example_value(size=(u1 + s0, 266))  # concrete size is (17900, 256+10)
triton_poi_fused_cat_1.run(in_buf0, in_buf1, ..., out_buf, xnumel=(u1 + s0) * 266 ...)
```

If we look into the kernel code, you'll see that `tmp9` loads `in_buf1` (our incorrectly shaped input tensor). There is also a mask to prevent OOB loads.

- `tmp6` makes sure we're only loading with the `xindex` from 256 to 264.
- `xmask` makes sure we're only loading with the `xindex` within `xnumel`.
- `tmp6 & xmask` together is essentially checking `0 ≤ x0 < u1 + s0` and `256 ≤ x1 < 264`.

The mask logic is correct. However, `in_buf1` has the shape `[8192, 10]`, which means any load where `8192 ≤ x0 < u1 + s0` will be an OOB load.

```
def triton_poi_fused_cat_1(in_buf0, in_buf1, ... out_buf, xnumel, XBLOCK):
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)
    xmask = xindex < xnumel
    x0 = (xindex % 264)
    x1 = xindex // 264
    ...
    tmp6 = x0 >= tl.full([1], value=256)
    tmp9 = tl.load(in_buf1 + (x1), tmp6 & xmask)
    # device assertion is thrown here
    tl.device_assert(((0 <= tl.broadcast_to(tmp13, [XBLOCK])) & (tl.broadcast_to(tmp13, [XBLOCK]) < ks0)) | ~(xmask & tmp6), "index out of bounds: 0 <= tl.broadcast_to(tmp13, [XBLOCK]) < ks0")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153768
Approved by: https://github.com/jingsh
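
The out-of-bounds behaviour described in the commit message can be modelled in plain Python. The sketch below is not the generated Triton kernel. It assumes a row-major layout, and the function name `count_oob_loads`, its parameters, and the offset arithmetic are hypothetical, since the commit message elides how the load offset is computed. It only shows that a mask derived from the output shape cannot protect an input whose row count is smaller than the output's.

```python
def count_oob_loads(out_rows, in1_rows, in0_cols, in1_cols):
    """Count masked loads from the second concat input that miss its storage.

    Hypothetical model: the output is (out_rows, in0_cols + in1_cols) and the
    second input is (in1_rows, in1_cols), both row-major.
    """
    out_cols = in0_cols + in1_cols
    xnumel = out_rows * out_cols          # one element per output slot
    in1_numel = in1_rows * in1_cols       # elements actually allocated
    oob = 0
    for xindex in range(xnumel):
        col = xindex % out_cols
        row = xindex // out_cols
        xmask = xindex < xnumel           # stays inside the output
        from_in1 = col >= in0_cols        # column belongs to the second input
        if xmask and from_in1:
            # Assumed offset into the second input for this output slot.
            offset = row * in1_cols + (col - in0_cols)
            if not (0 <= offset < in1_numel):
                oob += 1
    return oob


# Second input has 4 rows but the output has 6: rows 4 and 5 read past the end.
print(count_oob_loads(out_rows=6, in1_rows=4, in0_cols=3, in1_cols=2))  # -> 4
# Matching row counts: every masked load stays in bounds.
print(count_oob_loads(out_rows=4, in1_rows=4, in0_cols=3, in1_cols=2))  # -> 0
```

In this model the mask passes for every output row, so the only way to avoid the stray reads is for the example input to be generated with the same dim=0 size as the output, which is what consistent size hints for the unbacked expressions are meant to guarantee.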
ao/sparsity	[BE]: Update ruff to 0.11.8 (#153249 )	2025-05-12 18:30:52 +00:00
autograd
backends/xeon
benchmark_utils	PEP585 update - test (#145176 )	2025-01-22 04:48:28 +00:00
bottleneck_test
cpp	[nativert] Move GraphSignature to pytorch core (#152969 )	2025-05-20 21:49:56 +00:00
cpp_api_parity
cpp_extensions	Remove janky (though at times useful) dlclose test (#153975 )	2025-05-20 23:26:42 +00:00
custom_backend	[Cmake] Make PyTorch buildable by CMake-4.x (#150203 )	2025-03-29 01:39:13 +00:00
custom_operator	[Cmake] Make PyTorch buildable by CMake-4.x (#150203 )	2025-03-29 01:39:13 +00:00
distributed	Revert "[CI][CUDA] Move cu118 distributed pull jobs to cu126, move cu124-sm75 to cu126-sm75 (#151594 )"	2025-05-21 01:45:20 +00:00
distributions	Fix support of MixtureSameFamily [bugfix]. (#151317 )	2025-05-14 19:24:36 +00:00
dynamo	Add `flag _metrics_log_runtime` to disable runtime metric logging by default (#153506 )	2025-05-22 01:02:11 +00:00
dynamo_expected_failures	remove TestCustomOp.test_impl_device_cpu from dynamo expected failures (#154049 )	2025-05-21 23:20:30 +00:00
dynamo_skips	[dynamo] context manager/decorator for dynamo config patching during tracing (#150586 )	2025-04-23 09:12:13 +00:00
edge	Fix some CMake issues (#153686 )	2025-05-19 00:31:34 +00:00
error_messages
expect	Revert "[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100` (#149282 )"	2025-05-14 20:53:49 +00:00
export	[export] Remove unused constants (#153800 )	2025-05-20 03:15:27 +00:00
forward_backward_compatibility	API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+ (#150536 )	2025-05-14 23:36:53 +00:00
functorch	[map] add inductor support by lowering to while_loop (#150971 )	2025-05-21 22:19:47 +00:00
fx	[Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/ (#149595 )	2025-04-03 23:50:13 +00:00
higher_order_ops	[hop_schema] support gen_schema for invoke_subgraph (#152984 )	2025-05-21 18:55:46 +00:00
inductor	[aoti] fix corner case in unbacked replacements for atomically_apply_size_hint (#153768 )	2025-05-22 02:05:37 +00:00
inductor_expected_failures	[dynamo] Support Tensor subclass that has dynamic attributes or calls `Parameter.__torch_function__` (#149482 )	2025-04-02 20:56:43 +00:00
inductor_skips	[BE] Remove test_ops from FIXME_inductor_dont_reset_dynamo (#145307 )	2025-01-27 18:12:39 +00:00
jit	[JIT] Optimize DCE by storing a MemoryLocations for an entire set<Value*> (#153645 )	2025-05-19 21:04:59 +00:00
jit_hooks	[Cmake] Make PyTorch buildable by CMake-4.x (#150203 )	2025-03-29 01:39:13 +00:00
lazy
mobile	Fix some CMake issues (#153686 )	2025-05-19 00:31:34 +00:00
nn	[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101 )	2025-05-20 20:19:03 +00:00
onnx	[ONNX] Support float4 (#151069 )	2025-05-18 03:19:35 +00:00
optim	Add lr_lambda type check in MultiplicativeLR (#151973 )	2025-04-29 08:21:41 +00:00
package	Remove outdated test skipif conditions for Python3.9 (#146144 )	2025-01-31 19:01:04 +00:00
profiler	Add memory reporting for XPU to Memory Profiler (#152842 )	2025-05-21 01:19:19 +00:00
quantization	[Quant][X86] add an op to compute uint8 batch norm 2d (#152811 )	2025-05-16 06:13:40 +00:00
scripts
strobelight/examples	Enable strobelight profiling specific compile frame ids using COMPILE_STROBELIGHT_FRAME_FILTER (#147549 )	2025-02-22 03:44:53 +00:00
test_img
torch_np	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
typing	Revert "Fix non-bitwise type annotations for Tensor operators (see #145838 ) (#146845 )"	2025-02-18 19:01:27 +00:00
xpu	[Intel GPU] scalar tensor case handling in addmm, baddmm (#153051 )	2025-05-21 12:24:37 +00:00
_test_bazel.py
allowlist_for_publicAPI.json	Refactor `torch/utils/data/datapipes/gen_pyi.py` with `torchgen` (#150626 )	2025-05-17 06:21:41 +00:00
bench_mps_ops.py	[MPS][Testing] Benchmark reduction ops (#150452 )	2025-04-02 01:06:27 +00:00
conftest.py	Apply ruff fixes to tests (#146140 )	2025-02-04 05:41:01 +00:00
create_dummy_torchscript_model.py
delete.py
hi.py
HowToWriteTestsUsingFileCheck.md
linear.py
load_torchscript_model.py
minioptest_failures_dict.json
mkl_verbose.py
mkldnn_verbose.py
pytest_shard_custom.py
run_doctests.sh
run_test.py	Support independent builds for cpp extension tests + apply to libtorch_agnostic tests (#153264 )	2025-05-20 19:18:09 +00:00
simulate_nccl_errors.py	[BE]: Update ruff to 0.11.8 (#153249 )	2025-05-12 18:30:52 +00:00
slow_tests.json	Update slow tests (#153815 )	2025-05-19 11:15:25 +00:00
test_accelerator.py	[Easy] Fix the function signature of torch.Event (#151221 )	2025-04-26 13:51:56 +00:00
test_ao_sparsity.py
test_appending_byte_serializer.py	Check integrity of bytes in AppendingByteSerializer (#152139 )	2025-04-26 18:10:58 +00:00
test_autocast.py	Enable TemporaryFileName tests on Windows (#146311 )	2025-02-07 06:06:18 +00:00
test_autograd_fallback.py
test_autograd.py	Fix test_side_stream_backward_overlap flakiness (#153963 )	2025-05-20 21:02:56 +00:00
test_autoload.py
test_binary_ufuncs.py	Fix lerp weight type promotion (#141117 )	2025-01-24 01:18:20 +00:00
test_bundled_images.py
test_bundled_inputs.py
test_ci_sanity_check_fail.py
test_comparison_utils.py
test_compile_benchmark_util.py
test_complex.py
test_content_store.py	torch.utils._content_store: fix error in hash_storage on XPU (#147785 )	2025-02-26 23:57:59 +00:00
test_cpp_api_parity.py	Enable C++ API parity tests on AArch64 (#145370 )	2025-01-30 22:42:49 +00:00
test_cpp_extensions_aot.py	Make python_agnostic cpp extension tests standalone (#153274 )	2025-05-20 19:18:09 +00:00
test_cpp_extensions_jit.py	xpu: get xpu arch flags at runtime in cpp_extensions (#152192 )	2025-05-09 05:43:50 +00:00
test_cpp_extensions_mtia_backend.py	Revert "Generalize poison fork logic for each device backend (#144664 )"	2025-04-10 21:02:14 +00:00
test_cpp_extensions_open_device_registration.py	[Openreg][PrivateUse1] Improve openreg module capabilities (#151000 )	2025-04-12 17:21:35 +00:00
test_cpp_extensions_stream_and_event.py	[Easy] Add more check for elapsedTime of torch.xxx.Event and torch.Event (#151404 )	2025-04-25 20:15:04 +00:00
test_cuda_expandable_segments.py
test_cuda_multigpu.py	[CUDA] try to abate some flakiness in `test_stream_event_nogil` (#148796 )	2025-03-12 19:12:50 +00:00
test_cuda_nvml_based_avail.py
test_cuda_primary_ctx.py	Remove outdated skipIfRocmVersionLessThan decorations (#148941 )	2025-03-11 18:37:40 +00:00
test_cuda_sanitizer.py
test_cuda_trace.py
test_cuda.py	make use_mem_pool threadlocal (#153356 )	2025-05-13 00:16:07 +00:00
test_custom_ops.py	Inductor respects exact strides on custom ops by default (#150511 )	2025-05-03 00:02:24 +00:00
test_dataloader.py	Enable more nightly tests on s390x (#148452 )	2025-03-18 16:09:39 +00:00
test_datapipe.py	Remove unactivated test (#146233 )	2025-02-04 05:26:04 +00:00
test_decomp.py	Update ruff linter for PEP585 (#147540 )	2025-02-22 04:45:17 +00:00
test_deploy.py
test_determination.py
test_dispatch.py	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 )	2025-02-24 19:56:09 +00:00
test_dlpack.py
test_dynamic_shapes.py	Support using SymInt shapes for torch.baddbmm no-broadcast case (#153112 )	2025-05-08 21:34:24 +00:00
test_expanded_weights.py
test_extension_utils.py	Move privateuse1 test out of test_utils and make them serial (#145380 )	2025-01-23 00:31:39 +00:00
test_fake_tensor.py	Revert "Fix fake tensor caching when output has unbacked (#153034 )"	2025-05-20 06:02:38 +00:00
test_file_check.py
test_flop_counter.py	Build RowwiseScaledMM.cu for SM89 (#145676 )	2025-02-01 11:44:58 +00:00
test_foreach.py	Synchronize in foreach tests after profiling (#152857 )	2025-05-06 00:56:48 +00:00
test_function_schema.py
test_functional_autograd_benchmark.py	Enable Windows tests (#146666 )	2025-02-08 00:55:20 +00:00
test_functional_optim.py
test_functionalization_of_rng_ops.py
test_functionalization.py
test_futures.py
test_fx_experimental.py	PEP585: Add noqa to necessary tests (#146391 )	2025-02-12 15:29:50 +00:00
test_fx_passes.py
test_fx_reinplace_pass.py
test_fx.py	[BE]: Update ruff to 0.11.8 (#153249 )	2025-05-12 18:30:52 +00:00
test_hop_infra.py	Support torch.compile rng selective activation checkpointing with cudagraph (#146878 )	2025-02-28 00:47:03 +00:00
test_hub.py
test_import_stats.py
test_indexing.py	[ROCm] Improve backwards indexing when stride is not one (#147630 )	2025-03-11 19:02:48 +00:00
test_itt.py
test_jit_autocast.py	PEP585 update - test (#145176 )	2025-01-22 04:48:28 +00:00
test_jit_disabled.py
test_jit_fuser_legacy.py
test_jit_fuser_te.py	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 )	2025-02-24 19:56:09 +00:00
test_jit_fuser.py
test_jit_legacy.py
test_jit_llga_fuser.py
test_jit_profiling.py
test_jit_simple.py
test_jit_string.py	PEP585 update - test (#145176 )	2025-01-22 04:48:28 +00:00
test_jit.py	[BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257 )	2025-03-18 00:46:07 +00:00
test_jiterator.py
test_kernel_launch_checks.py
test_legacy_vmap.py
test_license.py	Fix license check for setuptools>=77 (#151158 )	2025-04-12 13:41:12 +00:00
test_linalg.py	`torch.tensordot`: performance improvements when contracting to a scalar. (#145936 )	2025-05-13 10:57:30 +00:00
test_logging.py
test_masked.py
test_maskedtensor.py
test_matmul_cuda.py	[CUDA][cuBLAS][cuBLASLt] avoid polluting prefer cuBLAS/Lt setting across tests (#153655 )	2025-05-20 16:18:35 +00:00
test_meta.py	[BE] Migrate dtype_abbrs into one location (#152229 )	2025-04-28 03:52:47 +00:00
test_metal.py
test_mkl_verbose.py
test_mkldnn_fusion.py
test_mkldnn_verbose.py
test_mkldnn.py	Support fp8 output of _scaled_mm for CPU (#153600 )	2025-05-22 01:15:39 +00:00
test_mobile_optimizer.py
test_model_exports_to_core_aten.py	[Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/ (#149595 )	2025-04-03 23:50:13 +00:00
test_module_tracker.py
test_modules.py	Disable slow gradcheck for nn.Transformer ModuleInfo (#145531 )	2025-01-25 00:58:03 +00:00
test_monitor.py
test_mps.py	[MPS] Fix float64 scalar tensor handling (#153582 )	2025-05-15 05:15:14 +00:00
test_multiprocessing_spawn.py	Remove NO_MULTIPROCESSING_SPAWN checks (#146705 )	2025-02-28 05:53:19 +00:00
test_multiprocessing.py	Remove NO_MULTIPROCESSING_SPAWN checks (#146705 )	2025-02-28 05:53:19 +00:00
test_namedtensor.py
test_namedtuple_return_api.py
test_native_functions.py
test_native_mha.py
test_nestedtensor.py	Rewrite autograd producer consumer stream sync logic (#151079 )	2025-05-16 15:42:22 +00:00
test_nn.py	[CUDA][cuDNN] Fix handling of `CPU` side input and target length tensors in `CTCLoss` (#152745 )	2025-05-07 22:01:18 +00:00
test_nnapi.py
test_numba_integration.py	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
test_numpy_interop.py
test_openmp.py
test_openreg.py	[OpenReg] Add _lazy_init and rng_state support for OpenReg (#151914 )	2025-05-04 09:42:08 +00:00
test_ops_fwd_gradients.py
test_ops_gradients.py	Enable more nightly tests on s390x (#148452 )	2025-03-18 16:09:39 +00:00
test_ops_jit.py
test_ops.py	[ROCm] unkip test_non_standard_bool except for failings ops (#152956 )	2025-05-13 15:55:42 +00:00
test_optim.py	Fix test/test_optim.py error message. (#153076 )	2025-05-07 22:46:05 +00:00
test_out_dtype_op.py
test_overrides.py	[dynamo] Remove `traceable_tensor_subclasses`-related code (#151062 )	2025-04-15 03:55:35 +00:00
test_package.py
test_per_overload_api.py
test_prims.py
test_proxy_tensor.py	Support C++ statically_known_true (#151346 )	2025-04-18 06:42:12 +00:00
test_pruning_op.py
test_public_bindings.py	Remove `public_allowlist` from `TestPublicBindings.test_correct_module_names` and ensure private_allowlist-ed things are actually private (#145620 )	2025-01-27 17:30:02 +00:00
test_python_dispatch.py	Make DispatchKeySet serializable; add `__eq__` (#152732 )	2025-05-03 14:40:06 +00:00
test_pytree.py	[pytree] Register normal class to register_dataclass (#147752 )	2025-04-01 23:28:20 +00:00
test_quantization.py	[BE]: Update ruff to 0.11.8 (#153249 )	2025-05-12 18:30:52 +00:00
test_reductions.py	Treat dim=[] same as dim=None (#153570 )	2025-05-20 22:44:29 +00:00
test_scatter_gather_ops.py	Reland fast gather and index implementation (#151917 )	2025-04-23 19:13:13 +00:00
test_schema_check.py
test_segment_reductions.py
test_serialization.py	Make torch.serialization.skip_data work with torch.load (#148018 )	2025-03-06 12:04:46 +00:00
test_set_default_mobile_cpu_allocator.py
test_shape_ops.py	[Quant] flip: throw runtime error for QUInt4x2 and QUInt2x4 input (#147430 )	2025-02-25 03:47:40 +00:00
test_show_pickle.py
test_sort_and_select.py	Fix linter F821 error (#146665 )	2025-02-08 07:19:37 +00:00
test_sparse_csr.py	[ROCm] improve sparse addmm, enable complex (#153262 )	2025-05-19 22:23:18 +00:00
test_sparse_semi_structured.py	API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+ (#150536 )	2025-05-14 23:36:53 +00:00
test_sparse.py	[ROCm] improve sparse addmm, enable complex (#153262 )	2025-05-19 22:23:18 +00:00
test_spectral_ops.py	Re-add stft option to align window for center = false (#146379 )	2025-02-06 14:07:13 +00:00
test_stateless.py
test_static_runtime.py
test_subclass.py
test_sympy_utils.py	[Inductor] Expand Identity ops prior to block pattern matching (#146000 )	2025-02-08 18:11:53 +00:00
test_tensor_creation_ops.py	[Inductor] Add input value checking to randint meta function (#147191 )	2025-02-25 02:18:16 +00:00
test_tensorboard.py
test_tensorexpr_pybind.py
test_tensorexpr.py
test_testing.py	[Torch] Fix crash when comparing fp8 tensors that have more than 1 dimension (#153508 )	2025-05-15 08:41:46 +00:00
test_throughput_benchmark.py	Fix Throughputbenchmark issue (#144669 )	2025-01-26 03:37:20 +00:00
test_torch.py	convert guard_size_oblivious to runtime check in infer_size_impl (#148872 )	2025-05-13 00:32:28 +00:00
test_transformers_privateuse1.py	[OpenReg] Move SDPA to OpenReg from open_registration_extension.cpp (#153309 )	2025-05-13 03:49:19 +00:00
test_transformers.py	Revert "[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100` (#149282 )"	2025-05-14 20:53:49 +00:00
test_type_hints.py
test_type_info.py
test_type_promotion.py
test_typing.py
test_unary_ufuncs.py	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
test_utils_config_module.py	Add check that envvar configs are boolean (#145454 )	2025-02-05 19:40:10 +00:00
test_utils_filelock.py
test_utils.py	[utils] add try_import method for importing optional modules (#145528 )	2025-01-25 00:14:07 +00:00
test_view_ops.py	Fix overflow in checkInBoundsForStorage (#147352 )	2025-02-27 15:48:50 +00:00
test_vulkan.py
test_weak.py	Consistently use load_torchbind_test_lib in tests (#148082 )	2025-03-03 19:37:28 +00:00
test_xnnpack_integration.py	[BE][Ez]: ISC001 Auto concatenate implicit one line strings (#146408 )	2025-02-04 19:07:04 +00:00
test_xpu.py	Record the XPU and XCCL build settings in the compiled binary (#147161 )	2025-05-20 09:21:39 +00:00