Commit Graph

237 Commits

Scott Wolchok
2b323e61ad [PyTorch] AOTI: Use static_cast, not dynamic_cast (#112798)
dynamic_cast is for when we aren't certain about the type. We are certain (and will crash anyway if we're wrong).
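
For illustration, a minimal sketch of the pattern being preferred here, with hypothetical class names standing in for the AOTI runner types:

```
struct RunnerBase { virtual ~RunnerBase() = default; };
struct ConcreteRunner : RunnerBase {
  int run() { return 42; }
};

int call(RunnerBase* base) {
  // The container only ever stores ConcreteRunner here, so the concrete type
  // is known statically; static_cast skips the RTTI check dynamic_cast does,
  // and a wrong type would crash either way.
  return static_cast<ConcreteRunner*>(base)->run();
}
```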

Differential Revision: [D50812978](https://our.internmc.facebook.com/intern/diff/D50812978/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112798
Approved by: https://github.com/chenyang78, https://github.com/desertfire, https://github.com/jansel, https://github.com/khabinov
ghstack dependencies: #112116, #112174, #112405
2023-12-12 06:19:45 +00:00
Scott Wolchok
ca52195112 [PyTorch] AOTI: Avoid aoti_torch_data_ptr calls for constants at inference time (#112405)
Cache the result of aoti_torch_get_data_ptr when constants are updated, so no data-pointer lookups are needed at inference time.
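
A rough sketch of the idea, with stub names standing in for the real AOTI C ABI calls and container types:

```
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the AOTI C ABI; names are illustrative only.
using AtenTensorHandle = void*;
inline void* get_data_ptr_stub(AtenTensorHandle h) { return h; }

struct ConstantHolder {
  std::vector<AtenTensorHandle> constants;
  std::vector<void*> cached_ptrs;  // refreshed only when constants change

  void update_constants(std::vector<AtenTensorHandle> new_constants) {
    constants = std::move(new_constants);
    cached_ptrs.clear();
    for (AtenTensorHandle h : constants) {
      cached_ptrs.push_back(get_data_ptr_stub(h));  // pay the lookup once, here
    }
  }

  void* constant_ptr(std::size_t i) const {
    return cached_ptrs[i];  // inference reads the cached pointer, no ABI call
  }
};
```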

Differential Revision: [D50708982](https://our.internmc.facebook.com/intern/diff/D50708982/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112405
Approved by: https://github.com/chenyang78, https://github.com/desertfire, https://github.com/khabinov
ghstack dependencies: #112116, #112174
2023-12-12 06:19:45 +00:00
Scott Wolchok
24c67fe8cf [PyTorch] AOTI: Emit static constexpr int array vars when possible (#112174)
No need to populate a stack-based array for a shape/stride array when it's statically known.
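
To illustrate the difference in the emitted code, a hand-written sketch (the concrete values are made up):

```
#include <cstdint>

void emit_sizes_example() {
  // Before: a stack array filled at runtime even though the values never change.
  int64_t sizes_dynamic[2];
  sizes_dynamic[0] = 8;
  sizes_dynamic[1] = 16;

  // After: when the shape/stride values are statically known, emit a
  // static constexpr array instead; no per-call population is needed.
  static constexpr int64_t sizes_static[2] = {8, 16};

  (void)sizes_dynamic;
  (void)sizes_static;
}
```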

Differential Revision: [D50699889](https://our.internmc.facebook.com/intern/diff/D50699889/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112174
Approved by: https://github.com/chenyang78, https://github.com/desertfire, https://github.com/jansel
ghstack dependencies: #112116
2023-12-12 06:19:45 +00:00
Scott Wolchok
ff6f987adc [PyTorch] Replace cached thread_locals with stack allocation in AOTI (#112116)
This changes cached thread_local tensors to stack-allocated buffers. Since we were incidentally caching output in a thread_local, I had to add manual thread_local caching of outputs, which I implemented by caching a buffer and a Tensor whose storage is that buffer and then just memcpying the result into the cached buffer every time. Ideally, memory planning would be able to identify allocations that are the backing storage for outputs, but this should be good enough in the absence of planning.
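
A minimal sketch of the output-caching idea, assuming plain float buffers rather than the real Tensor/ABI types:

```
#include <cstddef>
#include <cstring>
#include <vector>

// Illustrative only: each thread keeps one cached output buffer, and every
// call memcpys the freshly computed result into it, so callers keep getting a
// stable pointer even though intermediates now live on the stack.
float* publish_output(const float* result, std::size_t numel) {
  thread_local std::vector<float> cached;
  if (cached.size() != numel) {
    cached.resize(numel);
  }
  std::memcpy(cached.data(), result, numel * sizeof(float));
  return cached.data();
}
```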

Differential Revision: [D50416438](https://our.internmc.facebook.com/intern/diff/D50416438/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112116
Approved by: https://github.com/jansel, https://github.com/desertfire
2023-12-12 06:19:45 +00:00
Bin Bao
2e6b809d6b [AOTI] Fix a missing declaration for the result of item() (#115175)
Differential Revision: [D51968539](https://our.internmc.facebook.com/intern/diff/D51968539)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115175
Approved by: https://github.com/chenyang78
2023-12-10 22:49:45 +00:00
Mu-Chu Lee
80527c0cf2 [AOTInductor] Double buffering for Weights (#114446)
Summary:
This adds a function to the model container that does weight swapping with double buffering.

Double buffering has two parts:
a) Write constants into the inactive buffer
b) Swap the active buffer

For (a), we write the constants into the buffer that's currently not in use, and record the information in both the constants map and the corresponding constant array to read from.
For (b), we obtain the lock, activate the constant map/constant array that is currently inactive, and flag the one that was in use as inactive.
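
A simplified sketch of the two-buffer scheme, not the actual model-container API:

```
#include <array>
#include <mutex>
#include <string>
#include <unordered_map>

// Simplified: two constant maps, one active for inference while the other
// receives newly written weights, then a locked swap of which one is active.
struct DoubleBufferedConstants {
  std::array<std::unordered_map<std::string, void*>, 2> maps;
  int active = 0;
  std::mutex mu;

  // (a) write constants into the buffer that is currently not in use
  void write_constant(const std::string& name, void* data) {
    maps[1 - active][name] = data;
  }

  // (b) obtain the lock and flip which buffer inference reads from
  void swap_buffers() {
    std::lock_guard<std::mutex> lock(mu);
    active = 1 - active;
  }

  void* read_constant(const std::string& name) {
    return maps[active].at(name);
  }
};
```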

Test Plan:
test/cpp/aot_inductor/test.cpp

Differential Revision: [D51543732](https://our.internmc.facebook.com/intern/diff/D51543732)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114446
Approved by: https://github.com/chenyang78, https://github.com/eellison
2023-12-05 22:31:56 +00:00
Yang Chen
4d8b9964e1 [aotinductor] support at::convolution for AOTInductor (#114961)
This PR adds support for at::convolution to AOTInductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114961
Approved by: https://github.com/desertfire
2023-12-03 07:52:28 +00:00
Bin Bao
8a90249bc2 [inductor] Update triton pin (#114772)
Differential Revision: [D51761353](https://our.internmc.facebook.com/intern/diff/D51761353)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114772
Approved by: https://github.com/shunting314, https://github.com/atalman
2023-12-02 19:13:56 +00:00
chilli
1f51f977ae misc visualization/utility improvements (#114984)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114984
Approved by: https://github.com/weifengpy
ghstack dependencies: #114520
2023-12-02 04:02:39 +00:00
Jez Ng
f1fd02503b Reland #113487 and #112527 (sdpa shim & fp8 AOTInductor support) (#114974)
This is a backout of #113747 which reverted the above two commits. Now that
#113997 has landed, this diff can be landed safely without breaking ABI compatibility.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114974
Approved by: https://github.com/chenyang78
2023-12-02 03:25:51 +00:00
Mu-Chu Lee
a9aad4ea21 [AOTInductor] Generate Triton header even if scheduler is not invoked. (#114972)
Summary:
Generate the Triton header for profiling.
If the Triton header isn't generated through the Scheduler, generate it
directly in the wrapper codegen.

Test Plan:
Test included in commit.
(test_aot_inductor.py:test_with_no_triton_profiler)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114972
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-12-02 02:03:38 +00:00
chunyuan
e3c42d3fb3 Inductor cpp wrapper: fix buffer free in non-AOT mode (#114741)
We found a performance regression when using the cpp wrapper in non-AOT mode due to the change in https://github.com/pytorch/pytorch/pull/110892.
That PR only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer-free change to only happen when `V.graph.aot_mode is True`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114741
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-11-30 16:46:55 +00:00
colinpeppler
5262484ece [easy][aotinductor] fix typos & add static typing (#114728)
```
// check all references
$ grep -rl 'cpp_kernel_overlad_name' *
ir.py
```

```
$ lintrunner --take MYPYINDUCTOR torch/_inductor/codegen/wrapper.py torch/_inductor/ir.py
ok No lint issues.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114728
Approved by: https://github.com/Skylion007, https://github.com/chenyang78
2023-11-30 02:10:56 +00:00
Jack Taylor
4a4c9fb0b8 [ROCm] Add ROCm AMDGPU support for inductor cpp codegen (#105141)
Follows from previous enablement attempt: https://github.com/pytorch/pytorch/pull/101797

Adds support for hsaco binaries in inductor's cpp_wrapper codegen and enables the CUDA tests in test_cpp_wrapper.

This PR also brings in additional required hipify mappings for the wrapper codegen file.

NOTE: we can unskip some of these tests once MI210 runners are enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105141
Approved by: https://github.com/jansel, https://github.com/malfet
2023-11-29 15:11:24 +00:00
Scott Wolchok
5b9add666f [PyTorch] AOTI: Emit CACHED_TORCH_TYPE only as needed (#113997)
Avoids potential compatibility issues where a new dtype is supported by the DSO but not the binary loading it.

Differential Revision: [D51434335](https://our.internmc.facebook.com/intern/diff/D51434335/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113997
Approved by: https://github.com/int3
2023-11-29 03:12:32 +00:00
Adnan Akhundov
0a063ad2c0 [inductor] Pass None and skip constexpr in custom Triton kernel calls from C++ (#114475)
Summary: `None` arguments are codegened as `*i8` in the `triton_meta` of the generated or user-defined Triton kernels:

85aa372374/torch/_inductor/codegen/triton_utils.py (L33-L36)

Due to this, contrary to conventional Triton, we should actually pass `nullptr` to the Triton kernels in the C++ wrapper codegen instead of passing nothing (normally `None` doesn't make it into the generated PTX parameters, just like `tl.constexpr` args).

This PR adds two things:

1. Proper C++ wrapper codegening (ABI and non-ABI) of `nullptr` and `c10::nullopt`, as the prior way of codegening `c10::nullopt` as a tensor breaks (and `c10` itself breaks in ABI mode).

2. Skipping `tl.constexpr` args when calling the loaded-from-cubin compiled Triton kernel in the C++ wrapper codegen. As a side effect, this also resolves an issue with string arguments: now they are simply omitted in the C++ wrapper codegen.
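
Roughly, the wrapper-side convention can be pictured like this (illustrative code, not the exact generated wrapper; `launch_stub` and the argument names are made up):

```
#include <cstdint>

// Hypothetical stand-in for the driver launch call.
inline void launch_stub(void** /*args*/) {}

// Illustrative only: the Python-side `None` tensor argument (declared as *i8
// in triton_meta) gets a real nullptr slot in the launch-argument array, while
// tl.constexpr arguments are skipped entirely because they are already baked
// into the compiled cubin.
void launch_example(void* in_ptr0, void* out_ptr0, int64_t xnumel) {
  void* none_arg = nullptr;              // stands in for the `None` argument
  void* kernel_args[] = {&in_ptr0, &none_arg, &out_ptr0, &xnumel};
  // constexpr args (e.g. XBLOCK) are deliberately absent from kernel_args.
  launch_stub(kernel_args);
}
```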

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_triton_kernel_with_none_input
...
----------------------------------------------------------------------
Ran 4 tests in 40.364s

OK (skipped=2)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114475
Approved by: https://github.com/oulgen
2023-11-24 12:51:56 +00:00
Yang Chen
ebeaec71bf [aotinductor] don't generate python profiling code in the cpp world (#114182)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114182
Approved by: https://github.com/aakhundov, https://github.com/desertfire
2023-11-21 21:11:58 +00:00
Oguz Ulgen
ef90508f75 [AOTI] Support ReinterpretView in abi mode (#114169)
https://github.com/pytorch/pytorch/pull/113967 added support for
ReinterpretView, but it turns out we codegen it differently in ABI-compat
mode. This PR adds support for ABI-compat mode as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114169
Approved by: https://github.com/aakhundov
2023-11-21 17:08:00 +00:00
Jez Ng
87925789ae Make V.graph properly typed (#114025)
Previously it lacked a type hint and so was treated as an Any type. This
resulted in a lot of untyped code downstream as V.graph is referenced in
many places in inductor code. I've typed it properly now as
GraphLowering, and fixed the numerous type errors this surfaced.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114025
Approved by: https://github.com/eellison
ghstack dependencies: #114013
2023-11-21 02:14:29 +00:00
Adnan Akhundov
ae00d9623e [inductor] Add ABI shim function for torch.scatter (#114027)
Summary: The scatter fallback calls `at::scatter` in the C++ wrapper codegen. This doesn't work in ABI-compatibility mode, which requires a shim function; this PR adds one.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_scatter_fallback
s...
----------------------------------------------------------------------
Ran 4 tests in 52.713s

OK (skipped=1)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114027
Approved by: https://github.com/chenyang78, https://github.com/desertfire
ghstack dependencies: #114024
2023-11-20 22:51:59 +00:00
Oguz Ulgen
e0c3936843 [Inductor] Support ReinterpretView in inductor codegen (#113967)
Adding support for ReinterpretView in inductor so that jagged MRS kernels can use native triton kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113967
Approved by: https://github.com/aakhundov
2023-11-18 18:19:32 +00:00
Bin Bao
1480c670a0 [AOTI] Delay the fallback kernel naming decision to the codegen time (#113660)
Summary: This prepares for a later change that makes AOTI's second pass perform codegen only.

Differential Revision: [D51382677](https://our.internmc.facebook.com/intern/diff/D51382677)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113660
Approved by: https://github.com/chenyang78
2023-11-16 23:07:30 +00:00
Wei Wei
b19cf868e8 Back out "Support fp8 in AOTInductor + support optional<> in C ABI (#112527)" (#113747)
Test Plan: sandcastle

Differential Revision: D51330618

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113747
Approved by: https://github.com/chenyang78, https://github.com/khabinov
2023-11-15 22:42:22 +00:00
Yang Chen
a144eb502a [aotinductor] add versions for the sdpa shim api (#113487)
In our first implementation of the sdpa shim API, we didn't consider
the case where the optional scale argument could be None. This went
unnoticed because we always got a default argument for the CUDA backend;
the issue was detected with the CPU backend.

This PR implements versioning for shim kernels. Currently, only the
sdpa API has multiple versions. We expect to maintain only a very small
number of ABI-compatible shim APIs with multiple versions.
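
A sketch of what such versioning can look like, assuming hypothetical function names and a placeholder default scale rather than the real shim ABI:

```
// Hypothetical shim names, not the real ABI: v1 keeps its original signature
// forever for existing DSOs, while v2 adds the optional scale as a nullable
// pointer so a backend can pass "no scale" explicitly.
extern "C" int example_sdpa_shim_v1(double scale) {
  (void)scale;  // old contract: scale is always provided
  return 0;
}

extern "C" int example_sdpa_shim_v2(const double* scale) {
  // new contract: nullptr means the optional scale was None
  double effective_scale = (scale != nullptr) ? *scale : 1.0;  // placeholder default
  (void)effective_scale;
  return 0;
}
```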

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113487
Approved by: https://github.com/int3, https://github.com/desertfire
2023-11-13 20:18:58 +00:00
Oguz Ulgen
6ea20f5dc5 [AOTI] Use expr_printer to print sympy expr (#113317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113317
Approved by: https://github.com/aakhundov, https://github.com/chenyang78
2023-11-13 20:14:04 +00:00
Jez Ng
7afb503e3c [inductor] Label align() with [[maybe_unused]] (#113502)
This squelches the "defined but not used" warning that occurs when
memory planning is disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113502
Approved by: https://github.com/jansel
2023-11-12 16:33:47 +00:00
Jez Ng
5e03af8295 [inductor] Enable floor_div indexing to work under ABI-compat mode (#113276)
Previously, floor_div operations were defined in
ATen/native/BinaryOps.h. Since this header was not included under
ABI-compat mode, trying to use those indexing operations would result in
compilation errors.

Technically, it is safe to use aten::native::floor_div_* functions in
ABI-compat mode as they are header-only; we could simply include
BinaryOps.h. However, there are other declarations in BinaryOps.h that
are not binary-compatible, so this is not ideal. Thus, I have moved those
functions into a separate file, and put them under c10/util, since they
don't really have tensor-specific logic.

c10 functions are not all header-only, so this isn't ideal either, but it
still seems like an improvement. Moreover, cpp_prefix.h -- used
when compiling cpp kernels -- already includes c10 header files, so
ABI-compatibility already depends on maintaining some c10 functions as
header-only.
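
A simplified, header-only helper in the spirit of the moved functions (not the exact c10 implementation):

```
#include <cstdint>

// Integer division that rounds toward negative infinity rather than toward
// zero, which is what indexing arithmetic needs for negative operands.
inline int64_t floor_div_example(int64_t a, int64_t b) {
  int64_t q = a / b;
  int64_t r = a % b;
  return (r != 0 && ((r < 0) != (b < 0))) ? q - 1 : q;
}
```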

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113276
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-11-11 02:51:29 +00:00
Oguz Ulgen
06dc2f162d [AOTI] Implement support for user defined kernels that use triton.autotune (#113229)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113229
Approved by: https://github.com/chenyang78
2023-11-10 22:40:51 +00:00
Jez Ng
a2c32b8bd0 [inductor] Make codegen/{common,wrapper,cuda/cutlass_utils}.py pass follow_imports typechecking (#113411)
SymIntType is referenced by wrapper.py, so I added its .pyi definition.
I also added SymBoolType along the way for completeness.

The `isinstance` checks in wrapper.py reference torch.Type, which seems
to cause mypy to choke. Not entirely sure why; I've just added
type-ignore comments for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113411
Approved by: https://github.com/Skylion007
ghstack dependencies: #113409, #113410
2023-11-10 19:58:08 +00:00
PyTorch MergeBot
2cd8c0565c Revert "[AOTI] Implement support for user defined kernels that use triton.autotune (#113229)"
This reverts commit 1488bafb27.

Reverted https://github.com/pytorch/pytorch/pull/113229 on behalf of https://github.com/PaliC due to breaking test_aot_inductor.py tests though a forward fix is coming ([comment](https://github.com/pytorch/pytorch/pull/113229#issuecomment-1806159396))
2023-11-10 17:46:14 +00:00
Oguz Ulgen
1488bafb27 [AOTI] Implement support for user defined kernels that use triton.autotune (#113229)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113229
Approved by: https://github.com/chenyang78
2023-11-10 01:39:00 +00:00
Jez Ng
297c26bb8e Support fp8 in AOTInductor + support optional<> in C ABI (#112527)
This was originally ipiszy's PR: https://github.com/pytorch/pytorch/pull/112358

It turns out that we need to add support for optional types in order to
support fp8 gemm (i.e. scaled_mm). Since our ABI-stable C interface
can't support optional<> directly, I am passing in optional types via
pointer instead.

`AtenTensorHandle`s are already pointers, so nothing needs to change
there. Only value types need to change.

We decided on this approach instead of adding an extra `bool` param to
the callee because this simplifies things. Having the same number of
arguments regardless of whether we are emitting Python / C++ /
ABI-compatible C++ makes codegen easier.

There are a number of existing ABI-compatible functions that have
optional-typed value parameters. Previously, they just assumed they
would never be passed a `nullopt` / `None` at runtime. Changing them to
use pointer types now would break ABI stability, so I have created an
exclude list for those functions.

Finally, I think the current implementation is kind of messy, and only
works for FallbackKernels, even though technically ExternKernels could
also have the same issue. It also doesn't support optional types nested
in lists. I've left FIXME comments for both issues.
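
A sketch of the calling convention described above, with an illustrative function name and parameter list rather than the actual shim:

```
#include <cstdint>

// An optional value-typed argument crosses the C ABI as a pointer, with
// nullptr standing for nullopt/None; tensor handles are already pointers and
// need no change. Names and the default value are assumptions for the sketch.
extern "C" int example_scaled_mm_shim(
    void* mat_a,                // AtenTensorHandle-like: already a pointer
    void* mat_b,
    const double* scale_out) {  // optional<double>: nullptr means None
  double scale = (scale_out != nullptr) ? *scale_out : 1.0;  // placeholder default
  (void)mat_a;
  (void)mat_b;
  (void)scale;
  return 0;
}
```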

Differential Revision: [D51084289](https://our.internmc.facebook.com/intern/diff/D51084289)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112527
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-11-08 22:56:48 +00:00
Oguz Ulgen
8ba11bf79d [AOTI] Support non auto-tuned triton kernels in aoti (#113090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113090
Approved by: https://github.com/aakhundov, https://github.com/chenyang78, https://github.com/desertfire
2023-11-08 07:48:15 +00:00
Oguz Ulgen
dbf44dffc9 [Inductor] Cache generated user defined triton kernels on tensor dtype and non tensor parameters (#112752)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112752
Approved by: https://github.com/jansel
2023-11-07 05:29:16 +00:00
Oguz Ulgen
13d62e28a3 [Inductor] Add Dynamic shape support to user defined triton kernels (#112523)
1) This PR moves the grid function codegen to the wrapper so that we can use
   IndentBuffers as opposed to manually adding tabs for indentation.
2) In inductor, it emits the grid function in the body of the kernel call so
   that it can use free symbols from dynamic shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112523
Approved by: https://github.com/Chillee
2023-11-02 23:58:50 +00:00
Jez Ng
ae85ba820f [inductor] Memory planning (#112178)
This was originally @jansel's PR:
https://github.com/pytorch/pytorch/pull/102625, which I've built upon.

This diff implements static memory planning. It's disabled by default
while we examine its performance.

We use a greedy-by-size approach. For dynamic shapes, the sizes of the
example inputs are used as estimates when making planning decisions. We
generate expressions to calculate the actual memory offsets and sizes at
runtime when the values of the dynamic shapes are known. In order to
simplify these calculations, we have organized the allocations into a
tree that branches on space (address offsets) and time (live ranges).
Finally, we need to align these offsets, so we have added an `align`
sympy Expr to express these calculations.
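
As a toy illustration of the offset/alignment arithmetic only (not the actual planner, which also branches on live ranges and emits sympy expressions for dynamic shapes):

```
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Greedy-by-size toy: place the largest allocations first, each at an aligned
// offset inside one backing pool, and return the total pool size.
constexpr int64_t kAlignment = 64;

inline int64_t align(int64_t nbytes) {
  return (nbytes + kAlignment - 1) & ~(kAlignment - 1);
}

int64_t plan_pool(std::vector<int64_t> sizes, std::vector<int64_t>& offsets) {
  std::sort(sizes.begin(), sizes.end(), std::greater<int64_t>());
  int64_t pool_bytes = 0;
  offsets.clear();
  for (int64_t s : sizes) {
    offsets.push_back(pool_bytes);  // this allocation starts here
    pool_bytes += align(s);         // the next one starts at an aligned offset
  }
  return pool_bytes;                // total size of the backing allocation
}
```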

Some limitations:

1. It is only enabled during inference for now. Enabling it for training
   increases peak memory usage as we allocate all the memory needed for
   training upfront, before freeing the memory allocated during
   inference. We can probably address this by doing planning for both
   the inference and training passes together.
2. It doesn't work with PyTorch Distributed, because kernels like
   AllGatherIntoTensor codegen strings which do memory operations. We
   can fix this down the line by having them emit MemoryPlanningLines
   instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112178
Approved by: https://github.com/desertfire, https://github.com/jansel
2023-11-02 07:39:13 +00:00
David Berard
8191fb3e06 [Reland2] [inductor][BE] split triton_meta and inductor_meta (#112351)
triton_meta is intended to be passed directly to triton. Previously we were also putting other metadata into triton_meta, but we should split out the other metadata into a separate dict to avoid possible conflicts in the future.

This PR splits out triton_meta and inductor_meta so we have a place to put additional metadata that isn't intended to be passed to triton.

Tests - wait for CI

Differential Revision: [D50864493](https://our.internmc.facebook.com/intern/diff/D50864493)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112351
Approved by: https://github.com/eellison
2023-11-02 00:40:12 +00:00
PyTorch MergeBot
74e6c877e9 Revert "[inductor] Memory planning (#112178)"
This reverts commit f64a97c6f8.

Reverted https://github.com/pytorch/pytorch/pull/112178 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems that ROCm will need to be fixed for the new test too f64a97c6f8 ([comment](https://github.com/pytorch/pytorch/pull/112178#issuecomment-1788195311))
2023-11-01 00:03:56 +00:00
Jez Ng
f64a97c6f8 [inductor] Memory planning (#112178)
This was originally @jansel's PR:
https://github.com/pytorch/pytorch/pull/102625, which I've built upon.

This diff implements static memory planning. It's disabled by default
while we examine its performance.

We use a greedy-by-size approach. For dynamic shapes, the sizes of the
example inputs are used as estimates when making planning decisions. We
generate expressions to calculate the actual memory offsets and sizes at
runtime when the values of the dynamic shapes are known. In order to
simplify these calculations, we have organized the allocations into a
tree that branches on space (address offsets) and time (live ranges).
Finally, we need to align these offsets, so we have added an `align`
sympy Expr to express these calculations.

Some limitations:

1. It is only enabled during inference for now. Enabling it for training
   increases peak memory usage as we allocate all the memory needed for
   training upfront, before freeing the memory allocated during
   inference. We can probably address this by doing planning for both
   the inference and training passes together.
2. It doesn't work with PyTorch Distributed, because kernels like
   AllGatherIntoTensor codegen strings which do memory operations. We
   can fix this down the line by having them emit MemoryPlanningLines
   instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112178
Approved by: https://github.com/desertfire, https://github.com/jansel
2023-10-31 20:02:30 +00:00
Yang Chen
94f3df27e4 [aotinductor] reland: return a copy of any constant (#112370)
When the model returns a constant, we cannot "release" its handle,
because the constant doesn't have any handle at all. Instead,
we should allocate a new tensor and then return a copy of the constant.
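
Illustratively, the fix amounts to copying into freshly allocated storage rather than handing out the constant's own storage (raw-byte sketch, not the real tensor API):

```
#include <cstddef>
#include <cstdlib>
#include <cstring>

// A constant has no caller-releasable handle, so allocate fresh storage,
// copy the constant's bytes into it, and return that copy for the caller to
// own and free; the constant itself is never released.
void* return_constant_copy(const void* constant_data, std::size_t nbytes) {
  void* copy = std::malloc(nbytes);
  if (copy != nullptr) {
    std::memcpy(copy, constant_data, nbytes);
  }
  return copy;
}
```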

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112370
Approved by: https://github.com/hl475, https://github.com/desertfire
2023-10-31 18:36:44 +00:00
chunyuan
f50ec341bc inductor cpp wrapper: add GIL release and acquire (#111888)
Support running inference from multiple instances (in different threads of the same process) as in https://github.com/pytorch/pytorch/issues/93524#issuecomment-1421816158.
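
A sketch of the pattern, assuming a pybind11 binding (this is not the generated wrapper code itself):

```
#include <pybind11/pybind11.h>

// Release the GIL around the long-running C++ inference call so other Python
// threads can run their own model instances concurrently, then reacquire it
// before touching Python objects again.
void run_inference_example() {
  {
    pybind11::gil_scoped_release no_gil;  // GIL released for the C++ work
    // ... call the compiled kernels here ...
  }                                       // GIL reacquired when no_gil is destroyed
}
```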

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111888
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2023-10-31 03:23:30 +00:00
Oguz Ulgen
1250032c2e [Inductor] Add triton.autotune support for user defined triton kernels with complex grids (#112290)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112290
Approved by: https://github.com/jansel
2023-10-30 17:48:27 +00:00
Oguz Ulgen
c14c4efc0e [Inductor] Add triton.autotune support for user defined triton kernels with constant/simple grids (#112228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112228
Approved by: https://github.com/jansel
2023-10-28 17:30:35 +00:00
PyTorch MergeBot
8d44999183 Revert "[Inductor] Add triton.autotune support for user defined triton kernels with constant/simple grids (#112228)"
This reverts commit dbb31a2984.

Reverted https://github.com/pytorch/pytorch/pull/112228 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing ROCm test in trunk dbb31a2984 ([comment](https://github.com/pytorch/pytorch/pull/112228#issuecomment-1783660326))
2023-10-28 01:51:32 +00:00
Oguz Ulgen
dbb31a2984 [Inductor] Add triton.autotune support for user defined triton kernels with constant/simple grids (#112228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112228
Approved by: https://github.com/jansel
2023-10-27 21:40:22 +00:00
Bin Bao
f66cc67562 [aotinductor] Fix duplicated unbacked symbol declarations (#111823)
Summary: For https://github.com/pytorch/pytorch/issues/111711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111823
Approved by: https://github.com/ezyang, https://github.com/aakhundov
2023-10-26 21:11:08 +00:00
angelayi
b126adcdee [aotinductor] Pass TorchIR to AOTInductor (#110020)
Updates `_export.aot_compile` to pass a Torch IR graph to inductor, allowing inductor to now run the pre_grad_passes and reuse more of inductor's code.
Also updates the API to only return the `so_path` and not return the exported program. The pytree call spec is now serialized and placed inside the generated model code. When calling the model, because there is no C++ pytree implementation linked yet, we can access the call specs through `get_call_spec()` and call pytree flatten/unflatten in Python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110020
Approved by: https://github.com/desertfire
2023-10-26 15:54:31 +00:00
Oguz Ulgen
a29a844938 [Inductor] Support top level constants in user defined triton kernels (#111970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111970
Approved by: https://github.com/jansel
ghstack dependencies: #111956
2023-10-25 02:43:51 +00:00
Oguz Ulgen
bb550b25c9 [Inductor] Support user defined triton kernels calling other triton kernels and activation functions (#111956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111956
Approved by: https://github.com/jansel
2023-10-25 02:39:43 +00:00
Oguz Ulgen
ddcf9c050b [Inductor] Support calling user defined kernels with different type of arguments (#111939)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111939
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #111770, #111808
2023-10-24 19:49:48 +00:00