pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Ivan Yashchuk	ff2569bc8c	Intercept aten._reshape_alias for nvFuser (#87072 ) This would help forming larger fusion groups. If this won't end up executed by nvFuser then eager mode implementation would call into `.reshape`: `37e9e89afb/torch/_prims/nvfuser_prims.py (L552-L553)` cc @kevinstephano @jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87072 Approved by: https://github.com/ngimel	2022-10-25 21:53:12 +00:00
PyTorch MergeBot	5308886ec3	Revert "Intercept aten._reshape_alias for nvFuser (#87072 )" This reverts commit `163a829caa`. Reverted https://github.com/pytorch/pytorch/pull/87072 on behalf of https://github.com/malfet due to Looks like it broke test_indexing in dynamo shard, see https://github.com/pytorch/pytorch/actions/runs/3318778609/jobs/5483248042	2022-10-25 14:45:14 +00:00
Ivan Yashchuk	163a829caa	Intercept aten._reshape_alias for nvFuser (#87072 ) This would help forming larger fusion groups. If this won't end up executed by nvFuser then eager mode implementation would call into `.reshape`: `37e9e89afb/torch/_prims/nvfuser_prims.py (L552-L553)` cc @kevinstephano @jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87072 Approved by: https://github.com/ngimel	2022-10-25 06:56:02 +00:00
Nikita Karetnikov	841995d53b	[primTorch] Add refs for data conversion ops (#86561 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86561 Approved by: https://github.com/lezcano, https://github.com/mruberry, https://github.com/zou3519	2022-10-18 08:38:51 +00:00
Ivan Yashchuk	31931515bc	Workarounds for cudnn_batch_norm with TorchRefsNvfuserCapabilityMode (#86796 ) This PR adds workarounds to support AOT Autograd's graphs containing `aten.cudnn_batch_norm` and `aten.cudnn_batch_norm_backward` with `TorchRefsNvfuserCapabilityMode`. The problem with the decomposition of `aten.cudnn_batch_norm` is that it uses a `new_empty` call that is not supported by nvFuser and we are conservative with lowering functions to nvprims by default. The problem with the decomposition of `aten.cudnn_batch_norm_backward` is described here https://github.com/pytorch/pytorch/pull/86115#issue-1394883782, but changing the decomposition directly in that PR makes many tests fail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86796 Approved by: https://github.com/mruberry	2022-10-17 18:46:28 +00:00
Ivan Yashchuk	fd80684784	Add nvFuser support for torch.Tensor.view (#84634 ) This is an alternative to https://github.com/pytorch/pytorch/pull/83739. While PrimTorch has `view` as a reference, we would like to use nvFuser's implementation for `view` for now. Later we might transition to PrimTorch's `torch._refs.view`. See `test_nvprims_view` for examples of things that are now sent to nvFuser. Note that nvFuser's `view` is a copy-like operation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84634 Approved by: https://github.com/kevinstephano, https://github.com/mruberry	2022-10-14 12:08:02 +00:00
Kevin Stephano	b14f1d7bb8	Add Skip List for Aten Ops that are fused in nvFuser. (#86101 ) This Skip List (tuple) is added under the nvprims context manager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86101 Approved by: https://github.com/jjsjann123, https://github.com/mruberry	2022-10-07 03:55:13 +00:00
Ivan Yashchuk	68a6113248	Add nvFuser support for torch.native_batch_norm (#85562 ) This PR adds nvFuser's implementation for batch_norm as there's no reference yet (https://github.com/pytorch/pytorch/pull/81191) and no in-place copy support (https://github.com/pytorch/pytorch/pull/84545). Pull Request resolved: https://github.com/pytorch/pytorch/pull/85562 Approved by: https://github.com/kevinstephano, https://github.com/ngimel	2022-10-03 15:03:08 +00:00
Ivan Yashchuk	b00a5359f7	Add a way to skip lowering to nvprims (#85811 ) This PR adds `skip_ops` argument to `TorchRefsNvfuserCapabilityMode` and `NvfuserPrimsMode` which is an iterable of function names to be skipped in the translation to nvprims process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85811 Approved by: https://github.com/mruberry, https://github.com/jjsjann123	2022-09-30 12:01:45 +00:00
jjsjann123	cab6ffa0f7	catches failure on nvprim speculative lowering (#85580 ) Fixes #85517 Added a try/catch exception during tracing `get_isolated_graphmodule` inside `_is_func_unsupported_nvfuser`. Stops speculative lowering to nvprim when query errors out. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85580 Approved by: https://github.com/mruberry, https://github.com/IvanYashchuk	2022-09-29 15:22:45 +00:00
samdow	18d8c548f4	[Modes] remove enable and rewrite mode stack (squashed) (#84774 ) Based on @ezyang's suggestion, mode stack now has "one true mode" which is the _only_ mode that can ever be active at the C++ level. That mode's torch dispatch is just to take the top mode in the stack, reenable itself (if we aren't at the end of the mode stack), and run the top mode's torch_{dispatch\|function} This maintains that in the middle of a mode's torch dispatch, the mode itself will not be active. It changes the function the user has to call to see what the current mode is (no longer queries the C++, it's python only) but allows the user to also see the entire mode stack easily Removes `enable_torch_dispatch_mode` and `.restore()` since neither makes sense in this new setup ### Background Why do we want this? Well, a pretty common pattern that was coming up was that users had to do something like ```python ## PRE-PR UX def f(mode): with mode.restore(): # user needs to understand this restore thing? ... with Mode() as m: pass f(m) ``` Many users were getting error from forgetting to call `.restore` or from forgetting to add the (tbh weird) "mode instantiation" step where they use the mode as a context manager with an empty body. Really, they wanted to treat modes like context managers and just write ```python ## FROM FEEDBACK, USER DESIRED CODE. POSSIBLE POST-PR def f(mode): with mode: ... f(Mode()) ``` Technical Details With the old mode stack, we basically had a linked list so the mode itself could only be used once and had a fixed parent. In this new design, the mode stack is just a python list that we're pushing to and popping from. There's only one mode that's ever active at the C++ level and it runs the next mode in the Python list. The modes don't have state on them anymore Pull Request resolved: https://github.com/pytorch/pytorch/pull/84774 Approved by: https://github.com/ezyang, https://github.com/zou3519	2022-09-27 01:04:35 +00:00
Kevin Stephano	c7b17d7eb1	Add nvprims `rand_like` support for Dropout (#85077 ) NM Pull Request resolved: https://github.com/pytorch/pytorch/pull/85077 Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry	2022-09-23 18:03:35 +00:00
Ivan Yashchuk	35943f30cb	Reference implementation for torch.Tensor.sum_to_size (#85338 ) New ref: `torch._refs.sum_to_size`. View consistency validation is disabled because the ref returns a view instead of returning the input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85338 Approved by: https://github.com/mruberry	2022-09-21 18:12:52 +00:00
Ivan Yashchuk	3aae6ff1e1	Add nvprims.var_mean (#83508 ) This PR adds nvfuser-specific primitive - `var_mean`. Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager. I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`). Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti): ```py import torch from torch._prims.context import TorchRefsNvfuserCapabilityMode from torch.fx.experimental.proxy_tensor import make_fx from torch._prims.executor import execute def func(a): return torch.native_layer_norm(a, (1024,), None, None, 1e-6) a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda") with TorchRefsNvfuserCapabilityMode(): gm = make_fx(func)(a) for _ in range(10): execute(gm, a, executor="strictly_nvfuser"); ``` run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py` ```py # WITH THIS PR # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.033792 ms, achieved: 621.818 GB/s # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.032608 ms, achieved: 644.396 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.03072 ms, achieved: 684 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # ON MASTER # kernel1 run in 0.05632 ms, achieved: 373.091 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.043808 ms, achieved: 479.649 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s ``` So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape. Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`). Ref. https://github.com/pytorch/pytorch/issues/80187 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508 Approved by: https://github.com/ngimel	2022-08-28 18:45:25 +00:00
PyTorch MergeBot	b159a5230f	Revert "Add nvprims.var_mean (#83508 )" This reverts commit `7e7694b661`. Reverted https://github.com/pytorch/pytorch/pull/83508 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-28 11:30:27 +00:00
jjsjann123	b078d242c4	Nvfuser to copy decomp to prim (#83782 ) Conditional decomposing aten::_to_copy to nvprim::convert_element_type to allow fusion with type casting, which is introduced during type promotion phase at torch decomposition. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83782 Approved by: https://github.com/ngimel	2022-08-28 04:26:36 +00:00
Ivan Yashchuk	7e7694b661	Add nvprims.var_mean (#83508 ) This PR adds nvfuser-specific primitive - `var_mean`. Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager. I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`). Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti): ```py import torch from torch._prims.context import TorchRefsNvfuserCapabilityMode from torch.fx.experimental.proxy_tensor import make_fx from torch._prims.executor import execute def func(a): return torch.native_layer_norm(a, (1024,), None, None, 1e-6) a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda") with TorchRefsNvfuserCapabilityMode(): gm = make_fx(func)(a) for _ in range(10): execute(gm, a, executor="strictly_nvfuser"); ``` run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py` ```py # WITH THIS PR # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.033792 ms, achieved: 621.818 GB/s # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.032608 ms, achieved: 644.396 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.03072 ms, achieved: 684 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # ON MASTER # kernel1 run in 0.05632 ms, achieved: 373.091 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.043808 ms, achieved: 479.649 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s ``` So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape. Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`). Ref. https://github.com/pytorch/pytorch/issues/80187 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508 Approved by: https://github.com/ngimel	2022-08-27 09:05:20 +00:00
joncrall	b136f3f310	More doctest refinements. (#83317 ) Follow up to #82797 Now that the doctests themselves are in a better state, we should be able to enable xdoctest on the CI so they stay that way. @ezyang @vadimkantorov Pull Request resolved: https://github.com/pytorch/pytorch/pull/83317 Approved by: https://github.com/ezyang	2022-08-22 20:07:26 +00:00
Ivan Yashchuk	9f03444f70	Add torch.ops.aten -> torch._refs mapping to TorchRefsMode using decomposition_table (#82657 ) ### Description This PR adds the possibility to convert `torch.ops.aten` calls to `torch._refs` and consequently prims under TorchRefsMode. ### Testing New test, `test_aten_overload_to_prims`, in `test/test_prims.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82657 Approved by: https://github.com/jjsjann123, https://github.com/ezyang	2022-08-17 14:46:06 +00:00
Fabio Rocha	2a096e940d	[primTorch] support for a few magic methods (#83524 ) Added support for mapping __rsub__, __rtruediv__, __rfloordiv__, __floordiv__, __pow__, and __rpow__ in TorchRefsMode. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83524 Approved by: https://github.com/ngimel	2022-08-17 09:48:15 +00:00
joncrall	4618371da5	Integrate xdoctest - Rebased (#82797 ) This is a new version of #15648 based on the latest master branch. Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR. In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.) Fixes https://github.com/pytorch/pytorch/issues/71105 @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797 Approved by: https://github.com/ezyang	2022-08-12 02:08:01 +00:00
Ivan Yashchuk	ec67c6abbe	Add torch.ops.nvprims namespace for nvFuser-specific prims (#82155 ) New namespace `torch.ops.nvprims` is meant for specific to the nvFuser set of primitives. All `impl_nvfuser` attributes are removed from `torch.ops.prims` functions. `NvfuserPrimsMode()` context manager can be used for automatic rewrite of `torch.ops.prims` calls to `torch.ops.nvprims` when possible. The previous way to test whether a prim would be executable with nvFuser was to test `impl_nvfuser is not None`, now all functions in the `torch.ops.nvprims` namespace are supposed to have the `impl_nvfuser` attribute and hence all are executable by nvFuser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82155 Approved by: https://github.com/jjsjann123, https://github.com/ngimel	2022-08-04 16:51:56 +00:00
Ivan Yashchuk	900e93d351	Add context manager for conditional rewrites of torch.* to torch._refs.* calls (#81764 ) Adds a new context manager `TorchRefsNvfuserCapabilityMode` for conditional rewrite of `torch.` calls to `torch._refs.` based on whether the decomposition consisting of prims supports nvFuser execution or not. A new optional argument for `TorchRefsMode` is added - `should_fallback_fn`, a callable that returns whether the original `torch.foo` or the replacement `torch._refs.foo` should be used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81764 Approved by: https://github.com/ezyang	2022-08-02 11:02:10 +00:00
Edward Z. Yang	98b9dfa129	Add decompositions for zero_, fill_, new_full, new_zeros, new_ones (#82332 ) This makes symbolic tracing tests for logsigmoid and xlogy start working again. While I'm at it, add pin_memory and layout kwargs to empty; but they don't actually do anything and raise an error if they are non standard. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82332 Approved by: https://github.com/eellison	2022-07-28 04:02:02 +00:00
lezcano	11fe277b62	[PrimTorch] Add reference for torch.norm (#81765 ) This ref does more things than `torch.norm`, and it fixes a few bugs that `torch.norm` has. This implementation and the `torch.norm` implementation come to terms in the next PR of this stack We put this PR before, as otherwise `test_decomp.py` was failing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81765 Approved by: https://github.com/ngimel	2022-07-25 19:57:21 +00:00
Huy Do	12cb26509a	Apply ufmt to torch internal (#81643 ) This is a big bang PR, merge conflicts are probably expected and will be addressed at merge. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81643 Approved by: https://github.com/ezyang	2022-07-22 02:19:50 +00:00
Horace He	a5fb41e3d3	Revert "Revert "Refactored prim utils into _prims_utils folder (#81746 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81746 Approved by: https://github.com/anijain2305, https://github.com/Krovatkin	2022-07-20 23:43:57 +00:00
PyTorch MergeBot	e43a02c314	Revert "Refactored prim utils into _prims_utils folder (#81088 )" This reverts commit `80231d0a72`. Reverted https://github.com/pytorch/pytorch/pull/81088 on behalf of https://github.com/jeanschmidt due to breaking internal tests	2022-07-19 19:56:41 +00:00
Horace He	80231d0a72	Refactored prim utils into _prims_utils folder (#81088 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81088 Approved by: https://github.com/ngimel	2022-07-19 03:55:51 +00:00
Peter Bell	bf36d8b987	[primTorch] Implement one-dimensional fft transforms (#80570 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80570 Approved by: https://github.com/mruberry	2022-07-15 15:13:43 +00:00
Ivan Yashchuk	0d5bc54114	Fix interpretation torch -> torch._refs in case of nested torch calls under TorchRefsMode (#80135 ) torch calls inside `TorchRefsMode.__torch_function__` dispatch should be interpreted as refs calls under `TorchRefsMode`. Fixes https://github.com/pytorch/pytorch/issues/80079. In addition, this PR enables two more tests for the nvFuser executor. For example here's the FX trace of `torch._refs.nn.functional.layer_norm` before the proposed change (note the mix of `aten` and `prims`): ```py opcode name target args kwargs ------------- ---------------------- -------------------------- -------------------------------- ----------------- placeholder a_1 a_1 () {} call_function convert_element_type prims.convert_element_type (a_1, torch.float32) {} call_function var prims.var (convert_element_type, [0, 1]) {'correction': 0} call_function broadcast_in_dim prims.broadcast_in_dim (var, [1, 1], []) {} call_function convert_element_type_1 prims.convert_element_type (a_1, torch.float32) {} call_function sum_1 prims.sum (convert_element_type_1, [0, 1]) {} call_function broadcast_in_dim_1 prims.broadcast_in_dim (sum_1, [1, 1], []) {} call_function div prims.div (broadcast_in_dim_1, 9.0) {} call_function add aten.add (broadcast_in_dim, 1e-05) {} call_function rsqrt aten.rsqrt (add,) {} call_function sub aten.sub (a_1, div) {} call_function mul aten.mul (sub, rsqrt) {} call_function convert_element_type_2 prims.convert_element_type (mul, torch.float32) {} output output output (convert_element_type_2,) {} ``` And with this PR: ```py opcode name target args kwargs ------------- ---------------------- -------------------------- -------------------------------- ----------------- placeholder a_1 a_1 () {} call_function convert_element_type prims.convert_element_type (a_1, torch.float32) {} call_function var prims.var (convert_element_type, [0, 1]) {'correction': 0} call_function broadcast_in_dim prims.broadcast_in_dim (var, [1, 1], []) {} call_function convert_element_type_1 prims.convert_element_type (a_1, torch.float32) {} call_function sum_1 prims.sum (convert_element_type_1, [0, 1]) {} call_function broadcast_in_dim_1 prims.broadcast_in_dim (sum_1, [1, 1], []) {} call_function div prims.div (broadcast_in_dim_1, 9.0) {} call_function add prims.add (broadcast_in_dim, 1e-05) {} call_function rsqrt prims.rsqrt (add,) {} call_function broadcast_in_dim_2 prims.broadcast_in_dim (div, [3, 3], [0, 1]) {} call_function sub prims.sub (a_1, broadcast_in_dim_2) {} call_function broadcast_in_dim_3 prims.broadcast_in_dim (rsqrt, [3, 3], [0, 1]) {} call_function mul prims.mul (sub, broadcast_in_dim_3) {} call_function convert_element_type_2 prims.convert_element_type (mul, torch.float32) {} output output output (convert_element_type_2,) {} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/80135 Approved by: https://github.com/ngimel	2022-06-25 03:55:04 +00:00
Horace He	e89676f76c	fix logical_not reland issues Pull Request resolved: https://github.com/pytorch/pytorch/pull/79900 Approved by: https://github.com/ngimel	2022-06-21 03:41:18 +00:00
Nikita Shulga	f5eb05f107	Revert "Reland #2 of "Added {logical_not, trace} refs, moved logical ops to use method overloads"" This reverts commit `f3665dd237`. Reverted https://github.com/pytorch/pytorch/pull/79819 on behalf of https://github.com/malfet due to land raced with softshrink refs	2022-06-20 14:22:15 -07:00
Horace He	f3665dd237	Reland #2 of "Added {logical_not, trace} refs, moved logical ops to use method overloads" Pull Request resolved: https://github.com/pytorch/pytorch/pull/79819 Approved by: https://github.com/mruberry	2022-06-20 19:50:43 +00:00
PyTorch MergeBot	fefff54cad	Revert "Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads""" This reverts commit `a2d2981e8e`. Reverted https://github.com/pytorch/pytorch/pull/79224 on behalf of https://github.com/suo due to broke lots of things `a2d2981e8e`	2022-06-10 04:40:43 +00:00
Horace He	a2d2981e8e	Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"" This reverts commit `d67309aefb`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79224 Approved by: https://github.com/mruberry	2022-06-10 03:07:14 +00:00
PyTorch MergeBot	d67309aefb	Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads" This reverts commit `64b6bd8c1e`. Reverted https://github.com/pytorch/pytorch/pull/79000 on behalf of https://github.com/malfet due to Introduces test failure, see https://hud.pytorch.org/pr/79000	2022-06-09 13:11:23 +00:00
Horace He	64b6bd8c1e	Added {logical_not, trace} refs, moved logical ops to use method overloads Pull Request resolved: https://github.com/pytorch/pytorch/pull/79000 Approved by: https://github.com/ezyang	2022-06-09 07:16:36 +00:00
Horace He	e675dbadc4	Ported gelu decomp to ref (#78697 ) Ugh... these are actually so painful to write without operator overloading lol. Decided to just utilize operator overloading, and xfail the ref tests for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78697 Approved by: https://github.com/mruberry	2022-06-06 22:30:20 +00:00
Edward Z. Yang	587efdb5fa	Replace TensorMeta with FakeTensor Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/78836 Approved by: https://github.com/albanD, https://github.com/mruberry	2022-06-05 11:51:27 +00:00
Ivan Yashchuk	df748b60f7	Allow pytrees as output for make_traced and nvfuser executor (#78802 ) This PR lifts the restriction that the output of a function traced with `make_traced` and executed with nvFuser must be a single tensor. Now it's possible to return a "pytree", a tensor's nested data structure (see https://github.com/pytorch/pytorch/blob/master/torch/utils/_pytree.py). I added a test with a function that returns a tuple of two objects where one of the objects is a dictionary with a tensor value. ```py def fn(a, b): d = {} d["c"] = torch.add(a, b) return (d, torch.add(a, d["c"])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/78802 Approved by: https://github.com/mruberry	2022-06-04 08:41:18 +00:00
Edward Z. Yang	6b273444c4	Add logit ref; allow non-refs to be called in refs. Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/77816 Approved by: https://github.com/mruberry	2022-05-21 02:35:14 +00:00
Edward Z. Yang	4a11678368	Change TensorMeta to use tensor subclass. This means it can be fed through traditional PyTorch C++ code (although currently it does not work, as the __torch_dispatch__ implementation is stubbed to always throw an error.) Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/76759 Approved by: https://github.com/mruberry	2022-05-04 23:49:47 +00:00
Edward Z. Yang	48eb8d6aad	Use TorchFunctionMode to implement PrimTorch tracing context Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/76735 Approved by: https://github.com/mruberry	2022-05-04 23:49:46 +00:00
Mike Ruberry	fe1968dea0	[primTorch] Prototype nvFuser integration and test_prims.py This adds prototype nvFuser integration for the following prims: - broadcast_in_dim - convert_element_type - add - div - ge - gt - le - lt - mul Adding it for additional prims supported by nvFuser's prototype Python frontend should be easy. This also adds a new sugar to run operations using the ATen or nvFuser trace executors. For example: ``` def foo(a, b): return torch.add(a, b) traced_foo = make_traced(foo) a = torch.randn((1, 2, 3, 4, 5), device='cuda') b = torch.randn((1, 2, 3, 4, 5), device='cuda') result = traced_foo(a, b, executor='nvfuser') ``` Currently only operations with tensor inputs and one tensor output are supported, and the operation must be composed exclusively of reference or prim operations. Finally, this adds a new test, test_prims.py, that just tests the broadcast_in_dim prim for now. In the future we'll likely have OpInfos for each prim, but we'll need a reference implementation of broadcast_in_dim to make that interesting. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76560 Approved by: https://github.com/ngimel	2022-04-29 02:02:25 +00:00
Mike Ruberry	4048d4cdd2	[primTorch] Prototype tracer and elementwise unary reference opinfo class Adds a prototype tracer with no caching support and the `ElementwiseUnaryPythonRefInfo` class. A reference for `floor` is added to test the latter, and the elementwise binary reference inputs are extended to also return noncontiguous inputs. The SampleInput transform operation has been updated to return an actual SampleInput instead of a tuple to facilitate uniform handling of (transformed) SampleInputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76388 Approved by: https://github.com/ngimel	2022-04-27 14:40:21 +00:00

46 Commits