pytorch/torch/_inductor/codegen
Colin Peppler f272e0ab4a [inductor] support unbacked symint divisors in vars_and_sizes (#130595)
Scenario:
```
>>> nodes
IterationRangesEntry(
    x2,
    divisor=192*u0 + 192576,
    length=s1,
    (xindex//(192*u0 + 192576)),
    {x0: 192, x1: u0 + 1003, x2: s1, x3: 192*s1*u0 + 192576*s1, x4: 192*u0 + 192576})
IterationRangesEntry(
    x1,
    divisor=192,
    length=u0 + 1003,
    ModularIndexing(xindex, 192, u0 + 1003),
    {x0: 192, x1: u0 + 1003, x2: s1, x3: 192*s1*u0 + 192576*s1, x4: 192*u0 + 192576})
IterationRangesEntry(
    x0,
    divisor=1,
    length=192,
    ModularIndexing(xindex, 1, 192),
    {x0: 192, x1: u0 + 1003, x2: s1, x3: 192*s1*u0 + 192576*s1, x4: 192*u0 + 192576})
```

Is using the fallback safe here? I think it is, because the divisor of one IterationRangesEntry should be the product of the lengths of the preceding IterationRangesEntry nodes. Unless one of the lengths itself involves a division by an unbacked symint?
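
The divisor relationship described above can be checked with a small plain-Python sketch. It is not the Inductor implementation: `divisors_from_lengths` and `decompose` are hypothetical helpers, and the concrete values chosen for `u0` and `s1` are assumptions standing in for the symbolic sizes in the scenario.

```python
from itertools import accumulate
from operator import mul


def divisors_from_lengths(lengths):
    """Divisor of each range (innermost first) = product of all preceding lengths."""
    return list(accumulate([1] + list(lengths[:-1]), mul))


def decompose(xindex, lengths):
    """Split a flat index into per-range indices using floor-div and modulo."""
    divisors = divisors_from_lengths(lengths)
    return [(xindex // d) % n for d, n in zip(divisors, lengths)]


# Assumed concrete values for the symbols in the scenario (illustration only).
u0, s1 = 5, 3
lengths = [192, u0 + 1003, s1]  # lengths of x0, x1, x2
divisors = divisors_from_lengths(lengths)

# The divisors match the ones printed for x0, x1, x2 in the scenario.
assert divisors == [1, 192, 192 * u0 + 192576]

# Recombining the per-range indices with the divisors recovers the flat index.
xindex = 123456
parts = decompose(xindex, lengths)
assert sum(p * d for p, d in zip(parts, divisors)) == xindex
```

With integer lengths each divisor is an exact product, so no division is needed to derive it. The open question in the commit message is whether that still holds once a length is itself an expression containing a division by an unbacked symint.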

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130595
Approved by: https://github.com/aakhundov, https://github.com/ezyang
2024-07-16 16:21:38 +00:00
aoti_runtime [Inductor] Enable ABI-compatible mode for cpp-wrapper JIT (#121309) 2024-03-07 14:22:06 +00:00
cuda [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
rocm [Inductor][ROCm] Composable Kernel backend for Inductor (#125453) 2024-06-25 20:54:14 +00:00
xpu Flip default value for mypy disallow_untyped_defs [2/11] (#127839) 2024-06-08 18:23:08 +00:00
__init__.py
aoti_hipify_utils.py Fix hipify regular expression for AOTI wrapper (#128912) 2024-06-21 18:00:40 +00:00
codegen_device_driver.py Add warpSize to Device properties (#128449) 2024-07-01 09:13:32 +00:00
common.py [Inductor] FlexAttention supports partial masking (#130415) (#130626) 2024-07-14 00:37:26 +00:00
cpp_gemm_template.py [Inductor][CPP] Remove redundant INT8-specific logic in the INT8 GEMM template (#129470) 2024-07-02 13:15:15 +00:00
cpp_micro_gemm.py [inductor] [cpp] use non-temporal tile load for A (#129455) 2024-07-15 04:07:29 +00:00
cpp_prefix.h [inductor][cpp] BF16 AMX micro-gemm support (#127195) 2024-06-21 07:21:47 +00:00
cpp_template_kernel.py [Inductor][CPP] Support more than one LocalBuffer (#129121) 2024-07-14 11:31:14 +00:00
cpp_template.py [inductor][cpp] refactor CppTemplateKernel to inherit CppKernel (#129101) 2024-06-22 12:50:37 +00:00
cpp_utils.py [Inductor][CPP] Support more than one LocalBuffer (#129121) 2024-07-14 11:31:14 +00:00
cpp_wrapper_cpu.py [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199) 2024-07-11 17:30:28 +00:00
cpp_wrapper_cuda.py [AOTI] Properly indent launchKernel calls in AOTInductor (#129616) 2024-06-28 19:16:18 +00:00
cpp.py [inductor][cpp] align dtype convert cache between vec and scalar kernels (#130677) 2024-07-16 13:25:05 +00:00
cuda_combined_scheduling.py [Inductor][ROCm] Composable Kernel backend for Inductor (#125453) 2024-06-25 20:54:14 +00:00
halide.py [halide-backend] Random number generation (#130211) 2024-07-15 05:03:24 +00:00
memory_planning.py Flip default value for mypy disallow_untyped_defs [2/11] (#127839) 2024-06-08 18:23:08 +00:00
multi_kernel.py [halide-backend] Add GPU support (#127506) 2024-06-29 14:06:21 +00:00
simd.py [inductor] support unbacked symint divisors in vars_and_sizes (#130595) 2024-07-16 16:21:38 +00:00
triton_foreach.py [inductor][refactor] Unify the use of generate_kernel_call (#128467) 2024-06-19 07:47:25 +00:00
triton_split_scan.py Flip default value for mypy disallow_untyped_defs [2/11] (#127839) 2024-06-08 18:23:08 +00:00
triton_utils.py Flip default value for mypy disallow_untyped_defs [2/11] (#127839) 2024-06-08 18:23:08 +00:00
triton.py Add support for inline_asm_elementwise in Inductor lowerings (#129846) 2024-07-03 02:34:03 +00:00
wrapper.py [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206) 2024-07-14 08:17:52 +00:00