pytorch/torch/_inductor
Nikita Shulga dee016ceb7 [MPSInductor] Add store_reduce method (#150457)
That restricts the store operation to the 0th thread, which should be much better, shouldn't it?
(Though I don't observe it in the benchmark.)
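
The idea behind the guard can be sketched as follows. This is a hypothetical illustration, not the actual MPSInductor codegen: in a threadgroup reduction, every thread ends up holding the same reduced value, so only thread 0 needs to perform the store. The `reduce_and_store` helper and its loop over thread IDs are invented here to simulate that behavior.

```python
# Hypothetical sketch (not the real MPSInductor code): after a threadgroup
# reduction, all threads hold the same result, so the store is guarded to
# run only on thread 0 instead of being issued redundantly by every thread.

def reduce_and_store(values, out, num_threads=8):
    """Simulate a threadgroup: every thread computes the reduction,
    but only thread 0 writes the result to the output buffer."""
    total = sum(values)  # each simulated thread would hold this value
    stores = 0
    for tid in range(num_threads):
        if tid == 0:  # the guard that store_reduce introduces
            out[0] = total
            stores += 1
    return stores

out = [0]
num_stores = reduce_and_store([1, 2, 3, 4], out)
```

Whether this reduces observable latency depends on how the hardware coalesces the redundant stores, which may explain why the benchmark shows no difference.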

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150457
Approved by: https://github.com/jansel, https://github.com/dcci
ghstack dependencies: #150452
2025-04-02 05:12:49 +00:00
autoheuristic PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
codegen [MPSInductor] Add store_reduce method (#150457) 2025-04-02 05:12:49 +00:00
compile_worker Improve subproc autotuning implementation (#149700) 2025-03-28 01:06:39 +00:00
fx_passes [Inductor] Hide reinplace_fsdp_all_gather pass behind skip_fsdp_hooks config (#150436) 2025-04-01 22:56:06 +00:00
kernel [AMD] [TRITON] [INDUCTOR] Add tl.assume to enable bufferops on AMD (#150373) 2025-04-01 23:29:39 +00:00
package [AOTI] Add num_runners to AOTIModelPackageLoader (#149364) 2025-03-19 02:28:06 +00:00
runtime Unify on dynamo_compile as the overall wait counter (#150293) 2025-04-01 08:55:51 +00:00
__autotune_main__.py Improve subproc autotuning implementation (#149700) 2025-03-28 01:06:39 +00:00
__init__.py Fix for AOTI + CUDAGraphs when calling from Python (#148601) 2025-03-08 02:44:14 +00:00
analyze_preserves_zero_mask.py Dont exclude constant_pad_nd in prologue fusion (#149947) 2025-03-27 22:26:30 +00:00
aoti_eager.py PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
async_compile.py Store statically launchable CachingAutotuners inside CompiledFXGraph.triton_bundle (#149054) 2025-03-30 17:51:11 +00:00
autotune_process.py [pytorch][triton] Warp specialization support in TritonTemplate for torchinductor (#148503) (#150122) 2025-03-29 03:36:50 +00:00
bounds.py [inductor] Refactor op handlers part 5 (#146257) 2025-02-08 18:00:30 +00:00
choices.py Reland "Introduce new template heuristic for triton autotune configs" (#147452) 2025-03-26 15:47:06 +00:00
codecache.py Store statically launchable CachingAutotuners inside CompiledFXGraph.triton_bundle (#149054) 2025-03-30 17:51:11 +00:00
comm_analysis.py
comm_lowering.py Fix an issue where functional collectives don't force fx stride on inputs when compiled (#146467) 2025-02-10 19:15:49 +00:00
comms.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
compile_fx_async.py Use correct boxed_forward_device_index when running CompiledFxGraph.post_compile (#148130) 2025-03-23 02:57:58 +00:00
compile_fx_ext.py async fx compile (#146135) 2025-03-19 14:07:51 +00:00
compile_fx_subproc.py async fx compile (#146135) 2025-03-19 14:07:51 +00:00
compile_fx.py Unify on dynamo_compile as the overall wait counter (#150293) 2025-04-01 08:55:51 +00:00
compiler_bisector.py Add a couple config options to compiler bisector (#148450) 2025-03-04 23:23:21 +00:00
config.py Revert "Enable TMA persistent GEMM Template by default (#149427)" 2025-03-31 15:58:34 +00:00
constant_folding.py skip torchbind in constant folding (#148993) 2025-03-12 18:08:08 +00:00
cpp_builder.py [inductor] Fix inductor windows linker error (#150256) 2025-04-01 18:30:55 +00:00
cpu_vec_isa.py Revert "Extend vec backend with BF16 SVE intrinsics (#143666)" 2025-03-24 18:13:50 +00:00
cudagraph_trees.py Cudagraph fix + comment cleanup (#149741) 2025-03-21 21:12:36 +00:00
cudagraph_utils.py [CUDAGraph] Graph Partition (#147648) 2025-03-13 16:00:21 +00:00
custom_graph_pass.py
debug.py Fix only logging ir_post_fusion with torch_compile_debug enabled (#148499) 2025-03-05 05:35:09 +00:00
decomposition.py Remove aten.elu core ATen decomp because it is now core ATen (#149780) 2025-03-25 01:59:57 +00:00
dependencies.py [Graph Partition] Support symbol inputs (#149458) 2025-03-26 17:21:30 +00:00
dtype_propagation.py [inductor] Add a helper for convert index_dtype to torch dtype (#149531) 2025-03-20 21:33:29 +00:00
exc.py PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
extern_node_serializer.py PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
freezing_utils.py PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
freezing.py Fix code cache + freezing compile-time regression (#145868) 2025-01-31 02:04:15 +00:00
fuzzer.py Don't look at TESTING_ONLY in fuzzer (#146870) 2025-03-11 05:32:25 +00:00
fx_utils.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
graph.py Move dump location to avoid dumping twice (#150219) 2025-03-30 03:35:38 +00:00
hooks.py PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
index_propagation.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
inductor_prims.py [inductor] support dilation in max_pool2d lowering (#148209) 2025-03-24 13:00:12 +00:00
ir.py Dont exclude constant_pad_nd in prologue fusion (#149947) 2025-03-27 22:26:30 +00:00
jagged_lowerings.py PEP585 update - torch/_inductor (#145198) 2025-01-21 21:04:33 +00:00
loop_body.py [inductor] online softmax (#127011) 2025-03-06 21:07:18 +00:00
lowering.py [inductor] No type promotion for slice_scatter (#150090) 2025-03-28 17:02:01 +00:00
memory.py [BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257) 2025-03-18 00:46:07 +00:00
metrics.py [Inductor] Support parallel reduction for GroupNorm (#144020) 2025-03-01 17:11:50 +00:00
mkldnn_ir.py [Inductor][CPP] rename shim_mkldnn.h/.cpp to shim_cpu.h/.cpp (#149372) 2025-03-21 03:42:12 +00:00
mkldnn_lowerings.py [Inductor-CPP] If all of the activation scale dims are 1, make it a 0D tensor (#147033) 2025-03-03 18:32:27 +00:00
mock_cache.py PEP585 update - torch/_inductor (#145198) 2025-01-21 21:04:33 +00:00
ops_handler.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
optimize_indexing.py PEP585 update - torch/_inductor (#145198) 2025-01-21 21:04:33 +00:00
output_code.py Store statically launchable CachingAutotuners inside CompiledFXGraph.triton_bundle (#149054) 2025-03-30 17:51:11 +00:00
pattern_matcher.py [inductor]lowering scan to while_loop (#148580) 2025-03-20 20:21:02 +00:00
quantized_lowerings.py Add AOTI shim for _weight_int4pack_mm_cpu_tensor (#149031) 2025-03-18 01:33:13 +00:00
remote_cache.py PEP585 update - torch/_inductor (#145198) 2025-01-21 21:04:33 +00:00
scheduler.py Ignore meta ops in inductor (#150137) 2025-03-28 03:01:57 +00:00
script.ld
select_algorithm.py [pytorch][triton] Warp specialization support in TritonTemplate for torchinductor (#148503) (#150122) 2025-03-29 03:36:50 +00:00
sizevars.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
subgraph_lowering.py [inductor] Refactor op handlers part 5 (#146257) 2025-02-08 18:00:30 +00:00
template_heuristics.py Reland "Introduce new template heuristic for triton autotune configs" (#147452) 2025-03-26 15:47:06 +00:00
test_case.py [Inductor] be able to disable cache for test (#141195) 2025-01-24 19:15:55 +00:00
test_operators.py
triton_bundler.py Store statically launchable CachingAutotuners inside CompiledFXGraph.triton_bundle (#149054) 2025-03-30 17:51:11 +00:00
utils.py Revert "Merge Triton ScaledMM as epilogue to MM template (#150045)" 2025-04-01 17:54:28 +00:00
virtualized.py [inductor] Add a helper for convert index_dtype to torch dtype (#149531) 2025-03-20 21:33:29 +00:00
wrapper_benchmark.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00