Commit Graph

28 Commits

Author SHA1 Message Date
bobrenjc93
096cb874d3 remove allow-untyped-defs from torch/_prims/executor.py (#144233)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144233
Approved by: https://github.com/Skylion007
2025-01-07 19:40:40 +00:00
Xuehai Pan
e7eeee473c [BE][Easy][14/19] enforce style for empty lines in import segments in torch/_[a-c]*/ and torch/_[e-h]*/ and torch/_[j-z]*/ (#129765)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129765
Approved by: https://github.com/ezyang
2024-07-31 10:42:50 +00:00
Aaron Orenstein
afe15d2d2f Flip default value for mypy disallow_untyped_defs [3/11] (#127840)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127840
Approved by: https://github.com/oulgen
2024-06-08 18:28:01 +00:00
Ivan Yashchuk
c913f3857f Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 22:29:32 +00:00
PyTorch MergeBot
891bb259f8 Revert "Remove dynamo+nvfuser (#105789)"
This reverts commit 6030151d37.

Reverted https://github.com/pytorch/pytorch/pull/105789 on behalf of https://github.com/DanilBaibak due to breaking a lot of tests on main. ([comment](https://github.com/pytorch/pytorch/pull/105789#issuecomment-1669710571))
2023-08-08 14:20:32 +00:00
Ivan Yashchuk
6030151d37 Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 13:29:31 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow-up to the pyupgrade series, converting more strings to f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Commands used:

```bash
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

`collect_env.py` was excluded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Justin Chu
8a688277a2 [BE] Enable ruff's UP rules and autoformat dynamo / functorch and refs (#105432)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105432
Approved by: https://github.com/ezyang
2023-07-19 13:48:44 +00:00
Ivan Yashchuk
d802fcfcd8 Add config to PrimTorch's nvFuser executor (#84482)
This PR adds `executor_parameters` keyword argument to `torch._prims.executor.execute`.

For now there are two knobs:
* `use_python_fusion_cache: bool = True`: whether to use `lru_cache` when constructing the fusion object.
* `allow_single_op_fusion: bool = True`: whether to allow fusions with a single callable.

Behavior can be controlled by passing a dict with custom values as the `executor_parameters` argument.
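A minimal usage sketch, assuming the `gm`/`a` setup from the benchmark snippets later in this log:

```py
# Hedged usage sketch of the new executor_parameters keyword.
from torch._prims.executor import execute

result = execute(
    gm, a, executor="nvfuser",
    executor_parameters={
        "use_python_fusion_cache": False,  # rebuild the Fusion object on each call
        "allow_single_op_fusion": False,   # reject fusions with a single callable
    },
)
```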
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84482
Approved by: https://github.com/jjsjann123, https://github.com/ngimel
2022-09-09 07:58:21 +00:00
Ivan Yashchuk
ea39146507 Add a common wrapper for make_fx to handle args and kwargs (#82965)
Added a helper function `wrapper_and_args_for_make_fx`, since the number of places where we use `make_fx` internally might grow.
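A sketch of the idea, assuming pytree flattening (the actual implementation may differ):

```py
# Sketch: make_fx traces over positional args only, so (args, kwargs) are
# flattened into one list and a wrapper restores them before calling fn.
from torch.utils._pytree import tree_flatten, tree_unflatten

def wrapper_and_args_for_make_fx(fn, args, kwargs):
    flat_args, spec = tree_flatten((args, kwargs))

    def wrapped(flat_args):
        fn_args, fn_kwargs = tree_unflatten(flat_args, spec)
        return fn(*fn_args, **fn_kwargs)

    return wrapped, flat_args
```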
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82965
Approved by: https://github.com/ngimel
2022-08-08 17:03:21 +00:00
Ivan Yashchuk
ec67c6abbe Add torch.ops.nvprims namespace for nvFuser-specific prims (#82155)
The new namespace `torch.ops.nvprims` is meant for the nvFuser-specific set of primitives. All `impl_nvfuser` attributes are removed from `torch.ops.prims` functions.

The `NvfuserPrimsMode()` context manager can be used to automatically rewrite `torch.ops.prims` calls to `torch.ops.nvprims` when possible.

The previous way to test whether a prim would be executable with nvFuser was to check `impl_nvfuser is not None`; now all functions in the `torch.ops.nvprims` namespace are supposed to have the `impl_nvfuser` attribute, and hence all are executable by nvFuser.
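A hedged sketch of what the check change means for callers (`prim_name` is illustrative):

```py
import torch

prim_name = "add"  # illustrative

# Old: probe the torch.ops.prims function for an nvFuser implementation.
prim = getattr(torch.ops.prims, prim_name)
was_executable = getattr(prim, "impl_nvfuser", None) is not None

# New: presence in torch.ops.nvprims is the signal; every function there
# carries impl_nvfuser.
is_executable = getattr(torch.ops.nvprims, prim_name, None) is not None
```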

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82155
Approved by: https://github.com/jjsjann123, https://github.com/ngimel
2022-08-04 16:51:56 +00:00
samdow
2ac24675cc get rid of push_torch_{dispatch, function}_mode (#78215)
Currently we have two ways of doing the same thing for torch dispatch and function modes: `with push_torch_dispatch_mode(X)` and `with X.push(...)` are both now equivalent to `with X()`.

This removes the first API (which is older and private, so we don't need to go through a deprecation cycle).

There is some risk that this might land-race with a PR that uses the old API, but in general it seems like most users are on the `with X()` API or `enable_torch_dispatch_mode(X())`, which isn't getting removed.

EDIT: kept the `with X.push(...)` API, since there were ~3 land races with it over the past day or so, but made it emit a warning asking users to use the other API.
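A hedged sketch of the migration, with `MyMode` standing in for any user-defined mode:

```py
from torch.utils._python_dispatch import TorchDispatchMode

class MyMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        # A real mode would do something interesting here; this one just forwards.
        return func(*args, **(kwargs or {}))

# Removed:   with push_torch_dispatch_mode(MyMode): ...
# Warns now: with MyMode.push(): ...
# Preferred:
with MyMode():
    pass
```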
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78215
Approved by: https://github.com/ezyang
2022-07-22 18:56:37 +00:00
Huy Do
12cb26509a Apply ufmt to torch internal (#81643)
This is a big-bang PR; merge conflicts are expected and will be addressed at merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81643
Approved by: https://github.com/ezyang
2022-07-22 02:19:50 +00:00
Ivan Yashchuk
a3d5d2ddf1 Add partitioned nvFuser executor with ATen fallbacks (#81043)
This PR introduces a new nvFuser executor for FX graphs containing different kinds of nodes, not just the `torch.ops.prims` supported by nvFuser. The FX graph is partitioned based on whether nodes are supported by nvFuser, and supported nodes are fused into subgraphs; this builds on Sherlock's work on the partitioner.

This new partition-based executor with ATen fallbacks is used by default with `executor="nvfuser"`, and the previous executor can be used with `executor="strictly_nvfuser"`; naming suggestions are welcome!
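A hedged usage sketch of the two modes (`fn` is illustrative):

```py
import torch
from torch._prims.executor import make_traced

def fn(a, b):
    return torch.sigmoid(a) + b

a = torch.randn(3, 3, device="cuda")
b = torch.randn(3, 3, device="cuda")

traced = make_traced(fn)
out = traced(a, b, executor="nvfuser")             # partitioned, ATen fallbacks
# out = traced(a, b, executor="strictly_nvfuser")  # whole graph must be nvFuser-supported
```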
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81043
Approved by: https://github.com/jjsjann123, https://github.com/SherlockNoMad
2022-07-20 19:51:20 +00:00
Ivan Yashchuk
9a12aa6cad Add cached nvFuser's fusion creation for torch._prims.executor (#80525)
In the current setup, a `Fusion` object was constructed from the `GraphModule` and args on every call of the `execute` function; that's expensive.

This PR uses `functools.lru_cache` to pay the `Fusion` creation cost once per `GraphModule` and set of args. Currently the shape, strides, and dtype of tensors are static; this can be changed later to make better use of nvFuser's internal caching mechanism (by specifying only ndim, contiguity, and dtype).
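A sketch of the caching pattern, with illustrative names (the real cache key and `Fusion` construction differ):

```py
from functools import lru_cache

@lru_cache(maxsize=1024)
def _fusion_for(gm, arg_props):
    # arg_props must be hashable, e.g. (shape, stride, dtype) per tensor input.
    ...  # build and return the nvFuser Fusion for this GraphModule

def execute(gm, *args):
    arg_props = tuple((tuple(t.shape), t.stride(), t.dtype) for t in args)
    fusion = _fusion_for(gm, arg_props)
    ...  # run the cached fusion on args
```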

On master:
```py
In [2]: a = torch.randn(3, 3, device='cuda')

In [3]: with TorchRefsMode.push():
   ...:     gm = make_fx(lambda x: torch.sigmoid(x))(a)
   ...:

In [4]: %%timeit
   ...: execute(gm, a, executor="nvfuser")
   ...: torch.cuda.synchronize()
175 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
This PR:
```py
In [2]: a = torch.randn(3, 3, device='cuda')

In [3]: with TorchRefsMode.push():
   ...:     gm = make_fx(lambda x: torch.sigmoid(x))(a)
   ...:

In [4]: %%timeit
   ...: execute(gm, a, executor="nvfuser")
   ...: torch.cuda.synchronize()
62.6 µs ± 9.99 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

In addition, this PR adds support for pytree inputs and extends the test for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80525
Approved by: https://github.com/kevinstephano, https://github.com/jjsjann123, https://github.com/SherlockNoMad
2022-07-05 17:00:45 +00:00
Natalia Gimelshein
9244547a1b small cleanup of executor (#79973)
per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79973
Approved by: https://github.com/mruberry
2022-06-22 00:35:51 +00:00
Natalia Gimelshein
c0ce4b0de9 make refs executor handle kwargs (#79858)
Mostly fixes #78923
I had to disable function patching in fx for functions with kw-only args; see https://github.com/pytorch/pytorch/compare/ngimel/make_fx_fix?expand=1#diff-090b22122be0779cd14afd2ebaf20d1e7c0bfe837e9eefa1d84e7521bb1defc6R446 (cc @jamesr66a).
It looks like it was doing weird things anyway: it was patching the signature of the wrapped function with arbitrary local vars from the wrapper. That can't be right, but I don't know what the intent was.
A lot of functions now fail with the nvFuser executor, and some still fail with ATen, although with different errors than before.
Edit: undid the change to _symbolic_script.py; it turns out unwrapping the function via `inspect` is not needed, and fx never sees kwargs.
cc @IvanYashchuk, @Chillee
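A hedged sketch of the kind of call this enables, assuming the traced callable now forwards kwargs:

```py
import torch
import torch._refs
from torch._prims.executor import make_traced

def foo(a, *, keepdim=False):
    return torch._refs.mean(a, keepdim=keepdim)

a = torch.randn(3, 3)
make_traced(foo)(a, keepdim=True, executor="aten")
```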

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79858
Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
2022-06-21 18:53:15 +00:00
Ivan Yashchuk
8895862744 Enable torch._refs.mean for nvFuser executor (#79444)
This PR fixes a bug with `broadcast_in_dim` that prevented reduction ops from being used before `broadcast_in_dim`.

With this PR it's possible to run
```py
import torch
import torch._refs
from torch._prims.executor import make_traced

def foo(a):
    return torch._refs.mean(a, keepdim=False)

a = torch.randn(3, 3, device='cuda')
make_traced(foo)(a, executor="nvfuser")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79444
Approved by: https://github.com/mruberry, https://github.com/jjsjann123
2022-06-14 19:42:07 +00:00
Edward Z. Yang
587efdb5fa Replace TensorMeta with FakeTensor
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78836

Approved by: https://github.com/albanD, https://github.com/mruberry
2022-06-05 11:51:27 +00:00
Ivan Yashchuk
df748b60f7 Allow pytrees as output for make_traced and nvfuser executor (#78802)
This PR lifts the restriction that the output of a function traced with `make_traced` and executed with nvFuser must be a single tensor. Now it's possible to return a "pytree", a nested data structure of tensors (see https://github.com/pytorch/pytorch/blob/master/torch/utils/_pytree.py).

I added a test with a function that returns a tuple of two objects where one of the objects is a dictionary with a tensor value.

```py
def fn(a, b):
    d = {}
    d["c"] = torch.add(a, b)
    return (d, torch.add(a, d["c"]))
```
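A hedged sketch of running it (device and shapes illustrative):

```py
import torch
from torch._prims.executor import make_traced

a = torch.randn(3, 3, device="cuda")
b = torch.randn(3, 3, device="cuda")
d, second = make_traced(fn)(a, b, executor="nvfuser")
assert torch.allclose(d["c"], a + b)
```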
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78802
Approved by: https://github.com/mruberry
2022-06-04 08:41:18 +00:00
Edward Z. Yang
feaa64f7e0 Apply black lint CI to PrimTorch
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77226

Approved by: https://github.com/mruberry
2022-05-11 18:23:12 +00:00
PyTorch MergeBot
bbb1a55d9f Revert "Apply black lint CI to PrimTorch"
This reverts commit 188854eeaf.

Reverted https://github.com/pytorch/pytorch/pull/77226 on behalf of https://github.com/ezyang
2022-05-11 17:55:29 +00:00
Edward Z. Yang
188854eeaf Apply black lint CI to PrimTorch
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77226

Approved by: https://github.com/mruberry
2022-05-11 16:37:19 +00:00
Kevin Stephano
752d496c91 Fix broadcast_in_dim support in NVFuser Frontend (#76790)
This PR primarily addresses augmenting the frontend to properly support `broadcast_in_dim`. This required making a new version of `define_tensor()` that takes in the `sizes` and `strides` of input tensors in order to properly determine broadcasts.

This PR also has a fix for `python_example.py`, which broke when a new argument was added to reductions to allow the user to specify an output data type.

`define_tensor()` Interface Example:

```py
fusion2 = Fusion()

input1 = torch.ones(1, 1, 4, device='cuda')
input2 = torch.ones(2, 3, 4, device='cuda')

with FusionDefinition(fusion2) as fd :
    t0 = fd.define_tensor(sizes=input1.size(), strides=input1.stride())
    t1 = fd.define_tensor(sizes=input2.size(), strides=input2.stride())

    fd.add_input(t0)
    fd.add_input(t1)

    t0_b = fd.Ops.broadcast_in_dim(t0, [2, 3, 4], [0, 1, 2])
    print("Broadcast TensorView", t0_b)
    t2 = fd.Ops.add(t0_b, t1)

    fd.add_output(t2)
```
Print statement of defined broadcast tensor:

```
Broadcast TensorView T2_l[ sbS6{1}, sbS7{1}, iS8{i2} ] DataType: float Contiguity: ttt
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76790
Approved by: https://github.com/mruberry, https://github.com/jjsjann123
2022-05-10 18:13:22 +00:00
Edward Z. Yang
48eb8d6aad Use TorchFunctionMode to implement PrimTorch tracing context
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
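The title refers to the mode-based interposition pattern; a minimal sketch of it, with an illustrative class name (not the PR's actual code):

```py
import torch
from torch.overrides import TorchFunctionMode

class PrimTorchTracingContext(TorchFunctionMode):  # name is illustrative
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # A real tracing context would reroute func to its refs/prims
        # decomposition here; this sketch just forwards the call.
        return func(*args, **kwargs)

with PrimTorchTracingContext():
    torch.add(torch.ones(2), torch.ones(2))  # intercepted by the mode
```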

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76735

Approved by: https://github.com/mruberry
2022-05-04 23:49:46 +00:00
Mike Ruberry
f6bbecf8b5 Adds python ref consistency test, elementwise unary reference inputs, and formats test files
Per title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76626
Approved by: https://github.com/ngimel
2022-05-01 22:42:46 +00:00
Mike Ruberry
fe1968dea0 [primTorch] Prototype nvFuser integration and test_prims.py
This adds prototype nvFuser integration for the following prims:

- broadcast_in_dim
- convert_element_type
- add
- div
- ge
- gt
- le
- lt
- mul

Adding it for additional prims supported by nvFuser's prototype Python frontend should be easy.

This also adds new sugar for running operations using the ATen or nvFuser trace executors. For example:

```py
def foo(a, b):
  return torch.add(a, b)

traced_foo = make_traced(foo)

a = torch.randn((1, 2, 3, 4, 5), device='cuda')
b = torch.randn((1, 2, 3, 4, 5), device='cuda')
result = traced_foo(a, b, executor='nvfuser')
```

Currently only operations with tensor inputs and one tensor output are supported, and the operation must be composed exclusively of reference or prim operations.

Finally, this adds a new test, test_prims.py, that just tests the broadcast_in_dim prim for now. In the future we'll likely have OpInfos for each prim, but we'll need a reference implementation of broadcast_in_dim to make that interesting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76560
Approved by: https://github.com/ngimel
2022-04-29 02:02:25 +00:00
Mike Ruberry
4048d4cdd2 [primTorch] Prototype tracer and elementwise unary reference opinfo class
Adds a prototype tracer with no caching support and the `ElementwiseUnaryPythonRefInfo` class. A reference for `floor` is added to test the latter, and the elementwise binary reference inputs are extended to also return noncontiguous inputs. The SampleInput transform operation has been updated to return an actual SampleInput instead of a tuple to facilitate uniform handling of (transformed) SampleInputs.
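A hedged sketch of what such an OpInfo entry might look like; the import path and field names are assumptions, not confirmed against the PR:

```py
# Hedged sketch of a Python ref OpInfo database entry.
from torch.testing._internal.common_methods_invocations import (
    ElementwiseUnaryPythonRefInfo,
)

ref_floor_info = ElementwiseUnaryPythonRefInfo(
    "_refs.floor",
    torch_opinfo_name="floor",  # the eager floor OpInfo this ref is checked against
)
```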
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76388
Approved by: https://github.com/ngimel
2022-04-27 14:40:21 +00:00