pytorch/torch/_inductor
Laith Sakka 39df901b2a introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)
When a tensor has unbacked symbols, it can be general enough to represent both contiguous and non-contiguous tensors. In that case we cannot really evaluate is_contiguous. In many places in the code base we check is_contiguous to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors; in those cases we probably want to use the definitely_contiguous API instead.

This PR applies definitely_contiguous to reshape and also to tensor metadata computation: the metadata now has an attribute that marks the tensor as contiguous only when it is always contiguous, i.e. we store it only when definitely_contiguous is true.
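The pattern the commit describes can be sketched as follows. This is an illustrative mock, not PyTorch's actual implementation: the DataDependentError class, the dict-based tensor metadata, and the reshape helper are all hypothetical stand-ins for the real symbolic-shapes machinery.

```python
# Hypothetical sketch of the "definitely_contiguous" pattern: a predicate
# that never raises on undecidable (unbacked-symbol) inputs, so callers can
# safely gate a fast path on it and fall back to the general path otherwise.

class DataDependentError(Exception):
    """Raised when a predicate cannot be decided symbolically."""

def is_contiguous(tensor_meta):
    # May raise when contiguity depends on unbacked (runtime-only) symbols.
    if tensor_meta.get("contiguity") is None:
        raise DataDependentError("contiguity depends on unbacked symbols")
    return tensor_meta["contiguity"]

def definitely_contiguous(tensor_meta):
    # True only when contiguity is provable; never raises, never guards.
    try:
        return is_contiguous(tensor_meta)
    except DataDependentError:
        return False

def reshape(tensor_meta, new_shape):
    if definitely_contiguous(tensor_meta):
        return ("fast_view", new_shape)    # fast path: reinterpret strides
    return ("general_reshape", new_shape)  # general path: correct either way
```

When contiguity is unknown, the general path is taken; it is merely slower, never wrong, which is why swapping is_contiguous for definitely_contiguous at these call sites is safe.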

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
2025-05-28 03:41:26 +00:00
autoheuristic
codegen introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432) 2025-05-28 03:41:26 +00:00
compile_worker torch.compile: Supress stdout / stderr output from subprocesses when local (#153837) 2025-05-22 05:49:43 +00:00
fx_passes [EASY] used guard_or_false instead of guard_sizes_oblivious in pointless_view (#154154) 2025-05-26 21:59:21 +00:00
kernel [Cutlass] Support float8_e4m3fn GEMM (#153890) 2025-05-22 08:37:33 +00:00
package [export] Move PT2ArchiveWriter/Reader to torch/export (#153795) 2025-05-23 19:04:36 +00:00
runtime [AOTI] Add a multi_arch_kernel_binary option (#154413) 2025-05-28 01:20:38 +00:00
__autotune_main__.py Improve subproc autotuning implementation (#149700) 2025-03-28 01:06:39 +00:00
__init__.py Add optional device index to AOTIModelPackageLoader (#152093) 2025-05-04 11:40:12 +00:00
analyze_preserves_zero_mask.py Revert two recent prologue prs (#151013) 2025-04-10 23:48:41 +00:00
aoti_eager.py
async_compile.py Pass inductor config for static cuda launcher to workers (#153382) 2025-05-14 20:01:32 +00:00
autotune_process.py [inductor][cutlass backend] Add 2 stage autotuning aka prescreening (#153335) 2025-05-23 17:12:25 +00:00
bounds.py [inductor] Refactor op handlers part 5 (#146257) 2025-02-08 18:00:30 +00:00
choices.py Reland "Introduce new template heuristic for triton autotune configs" (#147452) 2025-03-26 15:47:06 +00:00
codecache.py [AOTI] Support multi-arch when using package_cpp_only (#154414) 2025-05-28 01:20:38 +00:00
comm_analysis.py
comm_lowering.py Fix an issue where functional collectives don't force fx stride on inputs when compiled (#146467) 2025-02-10 19:15:49 +00:00
comms.py Make assertion about pass callable print the bad pass (#152654) 2025-05-05 18:07:43 +00:00
compile_fx_async.py Use correct boxed_forward_device_index when running CompiledFxGraph.post_compile (#148130) 2025-03-23 02:57:58 +00:00
compile_fx_ext.py Revert "Re-enable FakeTensor caching for SymInts (#152662)" 2025-05-26 17:13:22 +00:00
compile_fx_subproc.py async fx compile (#146135) 2025-03-19 14:07:51 +00:00
compile_fx.py Update provenance tracking doc (#154062) 2025-05-23 17:09:52 +00:00
compiler_bisector.py Add a couple config options to compiler bisector (#148450) 2025-03-04 23:23:21 +00:00
config.py [AOTI] Add a multi_arch_kernel_binary option (#154413) 2025-05-28 01:20:38 +00:00
constant_folding.py Fix constant folding cloning constants (#152273) 2025-05-01 17:34:39 +00:00
cpp_builder.py [AOTI] Support multi-arch when using package_cpp_only (#154414) 2025-05-28 01:20:38 +00:00
cpu_vec_isa.py Allow to set custom PYTHONPATH for torch.inductor (#152832) 2025-05-15 06:35:41 +00:00
cudagraph_trees.py [BE]: Update ruff to 0.11.8 (#153249) 2025-05-12 18:30:52 +00:00
cudagraph_utils.py [CUDAGraph] support meta tensor (#150478) 2025-04-02 07:21:50 +00:00
custom_graph_pass.py
debug.py Rename the provenance tracing artifact name for kernel <-> post_grad nodes mapping (#154046) 2025-05-22 19:20:56 +00:00
decomposition.py Revert "Improve torch.ops typing (#153558)" 2025-05-19 23:32:36 +00:00
dependencies.py [Graph Partition] Support symbol inputs (#149458) 2025-03-26 17:21:30 +00:00
dtype_propagation.py Remove libdevice ops in inductor (#151562) 2025-04-17 22:18:00 +00:00
exc.py
extern_node_serializer.py Back out "[AOTI] Always use oss schema for ExternKernelNodes serialization" (#151026) 2025-04-10 22:36:35 +00:00
freezing_utils.py PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
freezing.py [cudagraphs] Fix issue in collecting static_input_idxs (#152287) 2025-04-30 03:24:05 +00:00
fuzzer.py [AOTI][reland] Add an option to specify custom op C shim (#153968) 2025-05-21 15:57:57 +00:00
fx_utils.py Scheduler Flops refactor (#152708) 2025-05-09 19:01:43 +00:00
graph.py cpp_wrapper: build non-performance-sensitive code at O1 (#148773) 2025-05-23 00:51:20 +00:00
hooks.py
index_propagation.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
inductor_prims.py [inductor] lowering for fractional_max_pool3d (#148630) 2025-05-22 16:06:29 +00:00
ir.py Revert "[Inductor] Improve typing, and prepare for ABI-compatible AOTI C-shim dispatching (#154371)" 2025-05-27 20:39:09 +00:00
jagged_lowerings.py
loop_body.py [ez] fix typo in comment (#151755) 2025-04-21 14:52:39 +00:00
lowering.py [Inductor] Allow passing in custom lowering dict to register_lowering() (#154344) 2025-05-27 01:35:26 +00:00
memory.py [Graph Partition] reorder for minimal number of partitions (#151968) 2025-04-29 17:17:16 +00:00
metrics.py [Inductor] Support parallel reduction for GroupNorm (#144020) 2025-03-01 17:11:50 +00:00
mkldnn_ir.py Revert "[Inductor] Improve typing, and prepare for ABI-compatible AOTI C-shim dispatching (#154371)" 2025-05-27 20:39:09 +00:00
mkldnn_lowerings.py [BE]: Update ruff to 0.11.8 (#153249) 2025-05-12 18:30:52 +00:00
mock_cache.py
ops_handler.py Remove libdevice ops in inductor (#151562) 2025-04-17 22:18:00 +00:00
optimize_indexing.py
output_code.py codecache: Remove cpp_prefix.h duplication per build, then precompile it (#144293) 2025-05-16 17:41:36 +00:00
pattern_matcher.py Rename node.meta["arg_kwarg_vals"] to node.meta["eager_input_vals"] (#148092) 2025-04-02 13:18:04 +00:00
quantized_lowerings.py Add AOTI shim for _weight_int4pack_mm_cpu_tensor (#149031) 2025-03-18 01:33:13 +00:00
remote_cache.py [Indcutor Remote Cache] Raise an exception if redis module is required but not available (#151779) 2025-04-26 11:21:54 +00:00
scheduler.py update mutation renames (#153895) 2025-05-22 14:54:39 +00:00
script.ld
select_algorithm.py Make inductor UT to be generic (#154196) 2025-05-24 02:47:46 +00:00
sizevars.py [aoti] fix corner case in unbacked replacements for atomically_apply_size_hint (#153768) 2025-05-22 02:05:37 +00:00
standalone_compile.py Add logging for guard miss failure (#153125) 2025-05-09 16:51:04 +00:00
subgraph_lowering.py [inductor] Refactor op handlers part 5 (#146257) 2025-02-08 18:00:30 +00:00
template_heuristics.py [Inductor] Add Additional Configs for persistent+TMA version of Triton mm and addmm (#150587) 2025-04-23 18:21:35 +00:00
test_case.py [Inductor] be able to disable cache for test (#141195) 2025-01-24 19:15:55 +00:00
test_operators.py [CI] Fix GPUTests.test_scheduler_vertical_fusion1 (#151166) 2025-04-13 00:41:51 +00:00
triton_bundler.py Keep raw cubin file around in case it gets deleted underneath us (#153064) 2025-05-08 14:29:19 +00:00
utils.py [aoti] Initial Metal support (#153959) 2025-05-23 05:45:35 +00:00
virtualized.py [inductor] Add a helper for convert index_dtype to torch dtype (#149531) 2025-03-20 21:33:29 +00:00
wrapper_benchmark.py [Inductor][NCU] Add kernel name filtering, and allow custom metrics (#150872) 2025-05-04 20:49:19 +00:00