pytorch/torch/_inductor
Nikita Shulga dee016ceb7 [MPSInductor] Add store_reduce method (#150457)
That restricts the store operation to the 0th thread, which should be much better, shouldn't it?
(Though I don't observe it in the benchmark.)
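
The idea behind the guard can be sketched as follows. This is a hypothetical illustration, not the actual MPSInductor codegen: in a threadgroup reduction, every thread ends up holding the same reduced value, so only thread 0 needs to perform the store. The `reduce_and_store` helper and its loop over thread IDs are invented here to simulate that behavior.

```python
# Hypothetical sketch (not the real MPSInductor code): after a threadgroup
# reduction, all threads hold the same result, so the store is guarded to
# run only on thread 0 instead of being issued redundantly by every thread.

def reduce_and_store(values, out, num_threads=8):
    """Simulate a threadgroup: every thread computes the reduction,
    but only thread 0 writes the result to the output buffer."""
    total = sum(values)  # each simulated thread would hold this value
    stores = 0
    for tid in range(num_threads):
        if tid == 0:  # the guard that store_reduce introduces
            out[0] = total
            stores += 1
    return stores

out = [0]
num_stores = reduce_and_store([1, 2, 3, 4], out)
```

Whether this reduces observable latency depends on how the hardware coalesces the redundant stores, which may explain why the benchmark shows no difference.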

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150457
Approved by: https://github.com/jansel, https://github.com/dcci
ghstack dependencies: #150452
2025-04-02 05:12:49 +00:00
autoheuristic PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
codegen [MPSInductor] Add store_reduce method (#150457) 2025-04-02 05:12:49 +00:00
compile_worker Improve subproc autotuning implementation (#149700) 2025-03-28 01:06:39 +00:00
fx_passes [Inductor] Hide reinplace_fsdp_all_gather pass behind skip_fsdp_hooks config (#150436) 2025-04-01 22:56:06 +00:00
kernel [AMD] [TRITON] [INDUCTOR] Add tl.assume to enable bufferops on AMD (#150373) 2025-04-01 23:29:39 +00:00
package [AOTI] Add num_runners to AOTIModelPackageLoader (#149364) 2025-03-19 02:28:06 +00:00
runtime Unify on dynamo_compile as the overall wait counter (#150293) 2025-04-01 08:55:51 +00:00
__autotune_main__.py Improve subproc autotuning implementation (#149700) 2025-03-28 01:06:39 +00:00
__init__.py Fix for AOTI + CUDAGraphs when calling from Python (#148601) 2025-03-08 02:44:14 +00:00
analyze_preserves_zero_mask.py Dont exclude constant_pad_nd in prologue fusion (#149947) 2025-03-27 22:26:30 +00:00
aoti_eager.py PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
async_compile.py Store statically launchable CachingAutotuners inside CompiledFXGraph.triton_bundle (#149054) 2025-03-30 17:51:11 +00:00
autotune_process.py [pytorch][triton] Warp specialization support in TritonTemplate for torchinductor (#148503) (#150122) 2025-03-29 03:36:50 +00:00
bounds.py [inductor] Refactor op handlers part 5 (#146257) 2025-02-08 18:00:30 +00:00
choices.py Reland "Introduce new template heuristic for triton autotune configs" (#147452) 2025-03-26 15:47:06 +00:00
codecache.py Store statically launchable CachingAutotuners inside CompiledFXGraph.triton_bundle (#149054) 2025-03-30 17:51:11 +00:00
comm_analysis.py
comm_lowering.py Fix an issue where functional collectives don't force fx stride on inputs when compiled (#146467) 2025-02-10 19:15:49 +00:00
comms.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
compile_fx_async.py Use correct boxed_forward_device_index when running CompiledFxGraph.post_compile (#148130) 2025-03-23 02:57:58 +00:00
compile_fx_ext.py async fx compile (#146135) 2025-03-19 14:07:51 +00:00
compile_fx_subproc.py async fx compile (#146135) 2025-03-19 14:07:51 +00:00
compile_fx.py Unify on dynamo_compile as the overall wait counter (#150293) 2025-04-01 08:55:51 +00:00
compiler_bisector.py Add a couple config options to compiler bisector (#148450) 2025-03-04 23:23:21 +00:00
config.py Revert "Enable TMA persistent GEMM Template by default (#149427)" 2025-03-31 15:58:34 +00:00
constant_folding.py skip torchbind in constant folding (#148993) 2025-03-12 18:08:08 +00:00
cpp_builder.py [inductor] Fix inductor windows linker error (#150256) 2025-04-01 18:30:55 +00:00
cpu_vec_isa.py Revert "Extend vec backend with BF16 SVE intrinsics (#143666)" 2025-03-24 18:13:50 +00:00
cudagraph_trees.py Cudagraph fix + comment cleanup (#149741) 2025-03-21 21:12:36 +00:00
cudagraph_utils.py [CUDAGraph] Graph Partition (#147648) 2025-03-13 16:00:21 +00:00
custom_graph_pass.py
debug.py Fix only logging ir_post_fusion with torch_compile_debug enabled (#148499) 2025-03-05 05:35:09 +00:00
decomposition.py Remove aten.elu core ATen decomp because it is now core ATen (#149780) 2025-03-25 01:59:57 +00:00
dependencies.py [Graph Partition] Support symbol inputs (#149458) 2025-03-26 17:21:30 +00:00
dtype_propagation.py [inductor] Add a helper for convert index_dtype to torch dtype (#149531) 2025-03-20 21:33:29 +00:00
exc.py PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
extern_node_serializer.py PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
freezing_utils.py PEP585: More UP006 fixes (#146392) 2025-02-20 06:18:13 +00:00
freezing.py Fix code cache + freezing compile-time regression (#145868) 2025-01-31 02:04:15 +00:00
fuzzer.py Don't look at TESTING_ONLY in fuzzer (#146870) 2025-03-11 05:32:25 +00:00
fx_utils.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
graph.py Move dump location to avoid dumping twice (#150219) 2025-03-30 03:35:38 +00:00
hooks.py PEP585 update - torch/_inductor/[_-i]* (#145137) 2025-01-19 01:22:47 +00:00
index_propagation.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
inductor_prims.py [inductor] support dilation in max_pool2d lowering (#148209) 2025-03-24 13:00:12 +00:00
ir.py Dont exclude constant_pad_nd in prologue fusion (#149947) 2025-03-27 22:26:30 +00:00
jagged_lowerings.py PEP585 update - torch/_inductor (#145198) 2025-01-21 21:04:33 +00:00
loop_body.py [inductor] online softmax (#127011) 2025-03-06 21:07:18 +00:00
lowering.py [inductor] No type promotion for slice_scatter (#150090) 2025-03-28 17:02:01 +00:00
memory.py [BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257) 2025-03-18 00:46:07 +00:00
metrics.py [Inductor] Support parallel reduction for GroupNorm (#144020) 2025-03-01 17:11:50 +00:00
mkldnn_ir.py [Inductor][CPP] rename shim_mkldnn.h/.cpp to shim_cpu.h/.cpp (#149372) 2025-03-21 03:42:12 +00:00
mkldnn_lowerings.py [Inductor-CPP] If all of the activation scale dims are 1, make it a 0D tensor (#147033) 2025-03-03 18:32:27 +00:00
mock_cache.py PEP585 update - torch/_inductor (#145198) 2025-01-21 21:04:33 +00:00
ops_handler.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
optimize_indexing.py PEP585 update - torch/_inductor (#145198) 2025-01-21 21:04:33 +00:00
output_code.py Store statically launchable CachingAutotuners inside CompiledFXGraph.triton_bundle (#149054) 2025-03-30 17:51:11 +00:00
pattern_matcher.py [inductor]lowering scan to while_loop (#148580) 2025-03-20 20:21:02 +00:00
quantized_lowerings.py Add AOTI shim for _weight_int4pack_mm_cpu_tensor (#149031) 2025-03-18 01:33:13 +00:00
remote_cache.py PEP585 update - torch/_inductor (#145198) 2025-01-21 21:04:33 +00:00
scheduler.py Ignore meta ops in inductor (#150137) 2025-03-28 03:01:57 +00:00
script.ld
select_algorithm.py [pytorch][triton] Warp specialization support in TritonTemplate for torchinductor (#148503) (#150122) 2025-03-29 03:36:50 +00:00
sizevars.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
subgraph_lowering.py [inductor] Refactor op handlers part 5 (#146257) 2025-02-08 18:00:30 +00:00
template_heuristics.py Reland "Introduce new template heuristic for triton autotune configs" (#147452) 2025-03-26 15:47:06 +00:00
test_case.py [Inductor] be able to disable cache for test (#141195) 2025-01-24 19:15:55 +00:00
test_operators.py
triton_bundler.py Store statically launchable CachingAutotuners inside CompiledFXGraph.triton_bundle (#149054) 2025-03-30 17:51:11 +00:00
utils.py Revert "Merge Triton ScaledMM as epilogue to MM template (#150045)" 2025-04-01 17:54:28 +00:00
virtualized.py [inductor] Add a helper for convert index_dtype to torch dtype (#149531) 2025-03-20 21:33:29 +00:00
wrapper_benchmark.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00