pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Oguz Ulgen	47b5a6b05d	[Dynamo] Analyze triton kernels via tracing to determine mutations (#117300 ) This PR adds TTIR lexing and parsing in order to analyze which of the user defined triton kernel inputs are mutated. Differential Revision: [D53165999](https://our.internmc.facebook.com/intern/diff/D53165999) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117300 Approved by: https://github.com/jansel	2024-01-29 06:37:08 +00:00
Edward Z. Yang	9bce208dfb	Replace follow_imports = silent with normal (#118414 ) This is a lot of files changed! Don't panic! Here's how it works: * Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file. * When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded. * The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors. * Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list. * Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves. * torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state. * There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many. In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file. The codemod was done with this script authored by GPT-4: ``` import glob exclude_patterns = [ ... ] for pattern in exclude_patterns: for filepath in glob.glob(pattern, recursive=True): if filepath.endswith('.py'): with open(filepath, 'r+') as f: content = f.read() f.seek(0, 0) f.write('# mypy: ignore-errors\n\n' + content) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414 Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD	2024-01-27 02:44:11 +00:00
Alexander Grund	865945cc1f	Convert `requires_cuda` to full decorator (#118281 ) Don't require using it as `@requires_cuda()` -> `@requires_cuda` instead No need for the partial function invoked many times Split out this change from the initial large refactoring in #117741 to hopefully get merged before conflicts arise Pull Request resolved: https://github.com/pytorch/pytorch/pull/118281 Approved by: https://github.com/ezyang	2024-01-25 15:50:21 +00:00
Adnan Akhundov	247f9c3de4	Preserve strides of custom Triton kernel args (#116219 ) Summary: Currently, we [`clone`](`19207b9183/torch/_inductor/lowering.py (L5273)`) every `TensorBox` argument of custom Triton kernels while lowering them to the Inductor IR, during which the stride information of the kernel inputs is lost. This is problematic in the common case when the strides of a `torch.Tensor` argument are passed as scalars to a custom Triton kernel alongside the tensor itself (due to the underlying Triton code interpreting the tensors as raw pointers, so the contained stride semantics of the `torch.Tensor` is lost). In this PR, we add an extended version of the existing [`clone` lowering](`19207b9183/torch/_inductor/lowering.py (L2289)`)---`clone_preserve_reinterpret_view`---which carries over the `ir.ReinterpretVew` layers (if any) from the source `TensorBox` to the cloned one. The rationale behind adding a new function (and switching to it in the `triton_kernel_wrap` only for now) as opposed to extending the existing `clone` is keeping the semantics of the latter untouched, as it is a lowering of `torch.clone` (albeit incomplete, as the `memory_format` is currently ignored). Changing the existing `clone` would change the semantics which is not necessarily desirable in general. Open to suggestions, though. Test Plan: ``` $ python test/dynamo/test_functions.py -k test_triton_kernel_strided_input ... ---------------------------------------------------------------------- Ran 1 test in 5.568s OK ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/116219 Approved by: https://github.com/jansel	2023-12-21 22:46:32 +00:00
Oguz Ulgen	c55210b4f0	[Inductor] Deduplicate grid wrapper statements for user defined triton kernels (#115849 ) Noticed that on many MRS kernels the grid wrapper for autotuning is huge with a bunch of duplicates due to num_warps and num_stages not being needed for grid calculation. Lets deduplicate these entries. Previously, we would see wrapper like ``` def grid_wrapper_for_add_kernel_2d_autotuned_0(meta): if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1) if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1) if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1) if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1) ``` now it looks like ``` def grid_wrapper_for_add_kernel_2d_autotuned_0(meta): if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1) if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/115849 Approved by: https://github.com/jansel	2023-12-20 00:25:32 +00:00
PyTorch MergeBot	c539f7df10	Revert "[Inductor] Deduplicate grid wrapper statements for user defined triton kernels (#115849 )" This reverts commit `21b8127f1c`. Reverted https://github.com/pytorch/pytorch/pull/115849 on behalf of https://github.com/jeanschmidt due to Breaking internal tests, please check internal diff for more details ([comment](https://github.com/pytorch/pytorch/pull/115849#issuecomment-1863012933))	2023-12-19 15:47:55 +00:00
Oguz Ulgen	21b8127f1c	[Inductor] Deduplicate grid wrapper statements for user defined triton kernels (#115849 ) Noticed that on many MRS kernels the grid wrapper for autotuning is huge with a bunch of duplicates due to num_warps and num_stages not being needed for grid calculation. Lets deduplicate these entries. Previously, we would see wrapper like ``` def grid_wrapper_for_add_kernel_2d_autotuned_0(meta): if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1) if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1) if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1) if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1) ``` now it looks like ``` def grid_wrapper_for_add_kernel_2d_autotuned_0(meta): if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1) if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/115849 Approved by: https://github.com/jansel	2023-12-14 23:26:04 +00:00
Adnan Akhundov	0a063ad2c0	[inductor] Pass None and skip constexpr in custom Triton kernel calls from C++ (#114475 ) Summary: `None` arguments are codegened as `*i8` in the `triton_meta` of the generated or user-defined Triton kernels: `85aa372374/torch/_inductor/codegen/triton_utils.py (L33-L36)` Due to this, in contrary to the conventional Triton, we actually should pass `nullptr` to the Triton kernels in C++ wrapper codegen instead of passing nothing (as normally `None` doesn't make it to the generated PTX parameters, just like `tl.constexpr` args). This PR adds two things: 1. Proper C++ wrapper codegening (ABI and non-ABI) of `nullptr` and `c10::nullopt`, as the prior way codegening `c10::nullopt` as tensor breaks (also `c10` breaks in the ABI mode). 2. Skipping `tl.constexpr` args when calling the loaded-from-cubin compiled Triton kernel in the C++ wrapper codegen. As a side effect, this also resolves an issue with string arguments: now they are simply omitted in the C++ wrapper codegen. Test Plan: ``` $ python test/inductor/test_aot_inductor.py -k test_triton_kernel_with_none_input ... ---------------------------------------------------------------------- Ran 4 tests in 40.364s OK (skipped=2) ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/114475 Approved by: https://github.com/oulgen	2023-11-24 12:51:56 +00:00
PyTorch MergeBot	1e60174891	Revert "[dynamo] Add run_inductor_tests entrypoint (#113278 )" This reverts commit `b00311ce9e`. Reverted https://github.com/pytorch/pytorch/pull/113278 on behalf of https://github.com/huydhn due to Sorry for reverting your stack, but it is failing to list test internally with buck2 ([comment](https://github.com/pytorch/pytorch/pull/113278#issuecomment-1811646325))	2023-11-15 01:19:48 +00:00
Jason Ansel	b00311ce9e	[dynamo] Add run_inductor_tests entrypoint (#113278 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113278 Approved by: https://github.com/yanboliang	2023-11-11 08:54:43 +00:00
Oguz Ulgen	f6008be266	Move all triton related testing utils into shared file (#113008 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113008 Approved by: https://github.com/zou3519, https://github.com/jansel ghstack dependencies: #112752	2023-11-07 05:29:29 +00:00

11 Commits