This PR allows users to author CUDA kernels in Python.
```
import torch
from torch.cuda.jiterator import create_jit_fn
code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return -x * y + x - y + alpha; }"
jitted_fn = create_jit_fn(code_string, alpha=0)
a = torch.rand(3, device='cuda')
b = torch.rand(3, device='cuda')
result = jitted_fn(a, b, alpha=1.0)
```
Limitations:
- Only elementwise kernels are supported
- 1 to 8 tensor inputs (kernels with no tensor inputs, e.g. factory functions, are not supported)
- Input tensors must live on a CUDA device
- CPU Scalars are not supported
- kwargs must be pre-declared when calling create_jit_fn
- kwargs must be convertible to at::Scalar: one of float64, int64_t, or bool (complex is not supported for now); see the sketch after this list
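As an illustrative sketch (not taken from the PR itself), a kernel with more than one pre-declared kwarg might look like the following; the kernel body and the parameter names `alpha` and `beta` are hypothetical:
```
import torch
from torch.cuda.jiterator import create_jit_fn

# Hypothetical kernel with two scalar kwargs; both must be pre-declared with
# default values when the jitted function is created.
code_string = "template <typename T> T my_op(T x, T y, T alpha, T beta) { return alpha * x + beta * y; }"
jitted_fn = create_jit_fn(code_string, alpha=1.0, beta=1.0)

a = torch.rand(3, device='cuda')  # all tensor inputs must be CUDA tensors
b = torch.rand(3, device='cuda')
result = jitted_fn(a, b, alpha=0.5, beta=2.0)  # declared kwargs can be overridden per call
```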
TODOs:
- [x] consolidate union and c10::variant implementation
- [x] plug into existing op testing framework
- [ ] rename files, place files in the right folder
- [ ] place util functions in the right file
- [x] enforce assumptions in the Python interface, e.g. <8 inputs, kwarg types
- [x] Add user-facing documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76394
Approved by: https://github.com/mruberry
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67942
- Change "name" to "code" for consistency with linttool and LintMessage
format.
- Change "args" and "init_args" to "command" and "init_command" for
consistency with internal representation.
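For context, a rough sketch of what a linter entry might look like after this rename; the adapter path and patterns are illustrative, not copied from the actual config:
```
[[linter]]
code = 'NEWLINE'            # previously "name"
include_patterns = ['**']
command = [                 # previously "args"
    'python3',
    'tools/linter/adapters/newlines_linter.py',
    '--',
    '@{{PATHSFILE}}',
]
# "init_args" is likewise renamed to "init_command"
```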
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32250606
Pulled By: suo
fbshipit-source-id: 557fef731bab9adca7ab1e7cc41b996956076b05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67894
As title. Confirmed that the code base passes by running:
```
lintrunner --paths-cmd='git grep -Il ""' --take NEWLINE
```
and seeing that it passes.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32250604
Pulled By: suo
fbshipit-source-id: de9bcba635d21f8832bb25147b19b7b2e8802247