Commit Graph

195 Commits

Author SHA1 Message Date
Michael Voznesensky
fe3ecfe0cf Add AotAutogradFallbackTests to dynamic suite (#100454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100454
Approved by: https://github.com/ezyang
2023-05-04 04:28:45 +00:00
PyTorch MergeBot
c3aa59c8f5 Revert "[WIP] enable cuda graphs support for flash attention with dropout (#100196)"
This reverts commit 32615618e4.

Reverted https://github.com/pytorch/pytorch/pull/100196 on behalf of https://github.com/clee2000 due to broke no ops build 32615618e4 https://github.com/pytorch/pytorch/actions/runs/4866578063/jobs/8678258318 ([comment](https://github.com/pytorch/pytorch/pull/100196#issuecomment-1532352810))
2023-05-03 01:41:56 +00:00
Natalia Gimelshein
32615618e4 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-02 23:05:31 +00:00
Justin Chu
e779a30d50 [BE] Fix SIM109 compare-with-tuple (#100337)
Use `{replacement}` instead of multiple equality comparisons
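For illustration (a generic example, not from this diff), the kind of rewrite SIM109 suggests:

```python
x = 2
if x == 1 or x == 2:  # flagged by SIM109: multiple equality comparisons
    pass
if x in (1, 2):       # the suggested compare-with-tuple form
    pass
```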

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100337
Approved by: https://github.com/Skylion007
2023-04-30 19:51:32 +00:00
Tugsbayasgalan Manlaibaatar
d4bf76c2a4 Persist torch.assert in aten graph (#100101)
This PR introduces a new operator, aten._assert_async.msg, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of torch._assert with this new operator so that make_fx also knows how to handle assertions. This is a subset of https://github.com/pytorch/pytorch/pull/98878; refer there for the historic reviews.
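A minimal usage sketch (not code from the diff), assuming the overload takes a tensor condition and a string message as described above:

```python
import torch

# a nonzero (true) condition means the assertion passes
cond = torch.tensor(True)
torch.ops.aten._assert_async.msg(cond, "cond must be truthy")
```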

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100101
Approved by: https://github.com/jansel
2023-04-28 07:31:43 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.jit allow simple generator expressions, which lets us enable rules that replace unnecessary list comprehensions with generators in any/all calls. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
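For illustration (a generic example, not from this diff), C419 swaps a materialized list inside any()/all() for a generator, which also short-circuits:

```python
xs = [1, 2, 3]
print(any([x > 2 for x in xs]))  # C419: builds the whole list first
print(any(x > 2 for x in xs))    # generator expression: lazy, short-circuits
```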

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Xiaodong Wang
cc01568efd [pt2] Register meta func to randperm.default (#99593)
Summary:
Looks like we're missing the meta func for randperm.default. I get complaints like the following when I compile randperm with dynamic shapes, which I think is because dispatch reaches the real implementation rather than a meta func.

```
RuntimeError: expected int but got s0
Exception raised from expect_int at fbcode/caffe2/c10/core/SymInt.h:128 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 5  at::Tensor c10::Dispatcher::redispatch<at::Tensor, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> >(c10::TypedOperatorHandle<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)> const&, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const
# 6  at::_ops::randperm::redispatch(c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 7  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 8  c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*)

```
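A hedged sketch of the kind of meta registration this adds (decorator and import path assumed from torch/_meta_registrations.py conventions, not the exact diff; re-registering on a build that already has it would raise at import time):

```python
import torch
from torch._meta_registrations import register_meta  # assumed import path

aten = torch.ops.aten

@register_meta(aten.randperm.default)
def meta_randperm(n, *, dtype=None, layout=None, device=None, pin_memory=None):
    # on meta, only shape/dtype matter, so a SymInt size like s0 never
    # reaches expect_int inside a real kernel
    return torch.empty(n, dtype=dtype or torch.int64, device="meta")
```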

Differential Revision: D45137851

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99593
Approved by: https://github.com/ezyang
2023-04-25 08:55:43 +00:00
Wanchao Liang
ca24a96216 minor fix to fused adam meta registration (#99436)
This PR fixes the registration by adding `max_exp_avg_sqs` to the output shape list too, and fixes some type-check issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99436
Approved by: https://github.com/mrshenli
2023-04-24 22:50:02 +00:00
Edward Z. Yang
10c938abef Handle meta['val'] for tuple of lists. (#99724)
Fixes https://github.com/pytorch/pytorch/issues/99356

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99724
Approved by: https://github.com/wanchaol
2023-04-21 22:33:21 +00:00
Rodrigo Kumpera
38e964056b Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-18 15:15:46 +00:00
PyTorch MergeBot
1c042a2137 Revert "Reland python ops (#99170)"
This reverts commit d4de64ae8d.

Reverted https://github.com/pytorch/pytorch/pull/99170 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-18 11:37:43 +00:00
Rodrigo Kumpera
d4de64ae8d Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-17 21:53:41 +00:00
Nikita Karetnikov
106ccf4a2a [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 21:21:45 +00:00
PyTorch MergeBot
f957334c2b Revert "[pt2] add meta function for linalg.cross (#99279)"
This reverts commit efc3887ea5.

Reverted https://github.com/pytorch/pytorch/pull/99279 on behalf of https://github.com/ezyang due to Apparently this is breaking inductor on master? So weird
2023-04-17 19:33:16 +00:00
Nikita Karetnikov
efc3887ea5 [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 03:05:20 +00:00
Rodrigo Kumpera
a910045add [PATCH] Back out "Move functional collectives implementation to python. (#98595) (#99168)
Summary:
Original commit changeset: ba36f8751adc

Original Phabricator Diff: D44788697

Test Plan: model loading is fine after reverting the diff

Reviewed By: zyan0, sayitmemory

Differential Revision: D44921259
---

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99168
Approved by: https://github.com/izaitsevfb
2023-04-14 23:48:19 +00:00
XiaobingSuper
9c98f2ceb7 inductor: rewrite mkldnn fx fusion using pattern_matcher(binary) (#97141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97141
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-04-12 06:23:03 +00:00
XiaobingSuper
c214c50355 inductor: rewrite mkldnn fx fusion using pattern_matcher(conv_unary) (#97007)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97007
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-04-12 05:52:54 +00:00
Guang Yang
c377a8590b Add nonzero_static() op to pytorch to unblock export (#97417)
Summary: Add a new experimental Python op (`torch.nonzero_static`) for export. There is NO CUDA impl included in this PR.

Example:

Say the input tensor is `x = torch.tensor([[1, 0], [3, 2]])`:

Calling regular `nonzero()` on `x` gives `tensor([[0, 0], [1, 0], [1, 1]])`.
Calling `nonzero_static(x, size=4)` gives `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value]])` (padded).
Calling `nonzero_static(x, size=2)` gives `tensor([[0, 0], [1, 0]])` (truncated).
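A runnable sketch of the above (the keyword `size`/`fill_value` arguments and the -1 default fill are assumptions here; CPU only, per the summary):

```python
import torch

x = torch.tensor([[1, 0], [3, 2]])
print(torch.nonzero(x))                                # tensor([[0, 0], [1, 0], [1, 1]])
print(torch.nonzero_static(x, size=4, fill_value=-1))  # padded with fill_value rows
print(torch.nonzero_static(x, size=2))                 # truncated to the first 2 rows
```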

Test Plan:
**Unit Tests**
```
buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static'
```

**PT2 Export with `nonzero_static()`**
Example of `GraphModule` in the exported graph
```
def forward(self, x):
    arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec)
    nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4);  arg0 = None
    return pytree.tree_unflatten([nonzero_static_default], self._out_spec)
```

Differential Revision: D44324808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417
Approved by: https://github.com/ezyang
2023-04-11 05:13:36 +00:00
Nikita Karetnikov
b411238d76 [pt2] add meta function for logcumsumexp (#98683)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98683
Approved by: https://github.com/ezyang
2023-04-09 01:26:37 +00:00
Rodrigo Kumpera
24d9001527 Move functional collectives implementation to python. (#98595)
This greatly simplifies the work needed to add new ops.

This relands the previous PR; not sure why it was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98595
Approved by: https://github.com/wconstab
2023-04-07 21:48:05 +00:00
Nikita Karetnikov
1c226f5aad [pt2] add meta functions for cummax and cummin (#98552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98552
Approved by: https://github.com/Chillee
2023-04-07 17:58:28 +00:00
albanD
0210481dcb Fix _like meta registrations (#98160)
The meta implementations for these _like functions are wrong whenever device != "meta" (they don't fill the memory!).
zeros_like is special due to sparse and is fixed directly by always filling the output with zeros.
Every other one has a CompositeExplicit implementation, so I went with removing their meta registrations and tweaking code to avoid infinite recursions.
I could do the same as zeros_like (and add the proper fill for each), but that would duplicate the C++ logic and make the meta registrations non-trivial. I can do that instead of the removal if you prefer.

test_meta works fine with these fixes, relying on CI to see if other tests are breaking as well.
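To make the failure mode concrete, a hypothetical sketch (not code from this PR) of why a pure-allocation kernel is wrong on real devices:

```python
import torch

def bad_ones_like(t):
    # fine for device="meta" (only shape/dtype matter there), but wrong
    # on real devices: the storage is allocated and never filled
    return torch.empty_like(t)

x = torch.randn(3)
print(bad_ones_like(x))    # arbitrary uninitialized values, not ones
print(torch.ones_like(x))  # the real op actually fills the output
```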
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160
Approved by: https://github.com/ezyang
2023-04-06 18:44:34 +00:00
PyTorch MergeBot
67d1a77086 Revert "Move functional collectives implementation to python. (#98315)"
This reverts commit 8b0374f83c.

Reverted https://github.com/pytorch/pytorch/pull/98315 on behalf of https://github.com/huydhn due to Sorry for reverting for PR. This is failing in trunk probably due to a landrace
2023-04-06 16:49:40 +00:00
Nikita Karetnikov
7b25976323 [pt2] add meta function for take (#98451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98451
Approved by: https://github.com/ezyang
2023-04-06 14:48:35 +00:00
Rodrigo Kumpera
8b0374f83c Move functional collectives implementation to python. (#98315)
This greatly simplifies the work needed to add new ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98315
Approved by: https://github.com/albanD, https://github.com/wconstab, https://github.com/Neilblaze
2023-04-06 14:06:16 +00:00
PyTorch MergeBot
fa08e546f3 Revert "Add all_reduce_coalesced functional collective (#97157)"
This reverts commit a3fc3531f5.

Reverted https://github.com/pytorch/pytorch/pull/97157 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to have a land race with https://github.com/pytorch/pytorch/pull/96226 and fails lint on trunk
2023-04-04 01:50:49 +00:00
Rodrigo Kumpera
a3fc3531f5 Add all_reduce_coalesced functional collective (#97157)
Inductor codegen is suboptimal when calling all_reduce_coalesced with input args; we need to fix inductor's calling convention for that, or find another approach.

This might not work if any of the outputs is unused.

Test code:

```python
import torch
import torch._dynamo  # explicit import (defensive; some builds lazy-load _dynamo)
import torch.distributed as dist
import torch.nn.functional as F
from functorch import make_fx
import os

import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)
from torch._inductor.compile_fx import compile_fx_inner

def my_fun(a, b):
    c = a * 3
    tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0])
    return ((tensors[1] + tensors[0] + tensors[2]).sum(), )

@spawn_threads_and_init_comms(world_size=1)
def inductor_main(self):

    x = torch.arange(4).cuda() * (dist.get_rank() + 1)
    y = torch.arange(4).cuda() * (dist.get_rank() + 1)
    x = x.to(torch.float)
    y = y.to(torch.float) * 0.5
    res = make_fx(my_fun)(x, y)
    print(f"fx graph:\n{res.graph}")
    ind = compile_fx_inner(res, [x, y])
    print(f"inductor done:\n{ind}")

os.environ["PROXY_TENSOR_TRACING"] = "1"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
torch._dynamo.config.output_code = True

if __name__ == "__main__":
    inductor_main(None)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157
Approved by: https://github.com/fegin
2023-04-04 01:13:18 +00:00
Shen Li
e8d39606eb [SPMD] Enable fused Adam in full train step tracing (#98113)
Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98113
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
bccf2ef0ce Format DTensor dispatch.py and _meta_registrations.py (#98114)
Format-only changes with black and lintrunner to prepare for the commit on top.

Differential Revision: [D44603809](https://our.internmc.facebook.com/intern/diff/D44603809)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98114
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
9ec6fdb29b Enable adam foreach in full train step tracing (#97897)
Main changes:

1. Registered several foreach ops with both meta and DTensor
2. Skipped redundant getitem nodes when expanding foreach ops with DTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97897
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-03-30 16:47:10 +00:00
Shen Li
379fb47654 [SPMD] Support foreach optimizers with functionalization (#97853)
My first attempt was to apply the same solution that proxy_tensor.py
uses for other in-place ops. However, foreach is different in that its
schema in `native_functions.yaml` does not return anything, whereas ops
like `addcmul_` and `addcdiv_` do return Tensors (thanks bdhirsh for
teaching me this!). As a result, the proxy output during tracing does
not wrap anything, and hence we cannot correctly connect it with
subsequent operators. Modifying `native_functions.yaml` is not a
preferred solution. After discussing with bdhirsh, the temporary
solution is to do foreach functionalization as a graph pass for now.
Later, when https://github.com/pytorch/pytorch/issues/97852
is addressed, we will switch to default functionalization.

Edit: the latest version follows @bdhirsh's suggestion of using the
`make_fx` `decomposition_table` instead of implementing manual
fx.Graph transforms to functionalize `_foreach_add_`.
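A minimal repro sketch of the tracing difficulty described above (my own illustration, not the PR's test): `_foreach_add_` returns nothing, so tracing has no proxy output to thread into later ops:

```python
import torch
from functorch import make_fx

def step(params, grads):
    torch._foreach_add_(params, grads)  # in-place; the schema returns nothing
    return [p.sum() for p in params]

gm = make_fx(step)([torch.randn(2)], [torch.randn(2)])
print(gm.graph)  # inspect whether the mutation is connected to the sums
```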
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97853
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-03-30 11:27:10 +00:00
Christian Puhrsch
9d37cefcb0 Resubmit _int_mm (#96685)
Avoids any changes to gemm_and_bias

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96685
Approved by: https://github.com/drisspg, https://github.com/ngimel
2023-03-27 16:14:07 +00:00
Shen Li
a8f7e0b213 [Easy] Improve error message for meta_mm (#97533)
Differential Revision: [D44376381](https://our.internmc.facebook.com/intern/diff/D44376381)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97533
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2023-03-25 01:09:41 +00:00
Driss Guessous
98a5cf090d [SDPA] Remove the chunk_grad from mem-eff attention (#96880)
# Summary

There exists an optimization within the scaled_dot_product_efficient backward attention path to, under the right conditions, output grad_q, grad_k, grad_v all as aliases of the same storage. This was done to optimize for the hot path where mha does packed linear_projection -> chunk -> (view stuff) -> sdpa. The thought was that chunk would be able to "trivially" cat inputs in chunk.backward(). However, upon closer inspection, chunk.backward calls `cat` regardless of its inputs, so this optimization is not being utilized.

I validated this by profiling both main and this branch; the traces were the same, with `split.backward()` calling into cat in both cases.
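A rough profiling sketch along the lines of that validation (my own minimal repro, not the author's script); look for `aten::cat` in the backward portion of the trace:

```python
import torch
from torch.profiler import profile

x = torch.randn(12, requires_grad=True)
with profile() as prof:
    a, b, c = x.chunk(3)
    (a.sum() + 2 * b.sum() + 3 * c.sum()).backward()
# chunk lowers to split, whose backward concatenates the gradient pieces
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```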

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96880
Approved by: https://github.com/cpuhrsch
2023-03-17 21:28:25 +00:00
Nikita Karetnikov
bf08d1387c [primTorch] handle out in sort meta function (#96719)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96719
Approved by: https://github.com/ezyang
2023-03-16 07:38:53 +00:00
Christian Puhrsch
0a53c9624a Back out "Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)" (#96885)
Summary:
Backing out  _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96885
Approved by: https://github.com/drisspg
2023-03-16 05:32:55 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Nikita Karetnikov
ec536232a3 [primTorch] add meta implementation for upsample_nearest2d_backward (#96612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612
Approved by: https://github.com/ezyang
2023-03-14 06:51:42 +00:00
PyTorch MergeBot
be220690d9 Revert "[primTorch] add meta implementation for upsample_nearest2d_backward (#96612)"
This reverts commit fe180596b8.

Reverted https://github.com/pytorch/pytorch/pull/96612 on behalf of https://github.com/malfet due to broke lint
2023-03-13 03:07:23 +00:00
Nikita Karetnikov
fe180596b8 [primTorch] add meta implementation for upsample_nearest2d_backward (#96612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612
Approved by: https://github.com/ezyang
2023-03-13 00:25:23 +00:00
Nikita Karetnikov
cb7c796b4b Enable min.unary_out (#96441)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96441
Approved by: https://github.com/ngimel
2023-03-11 19:23:33 +00:00
Nikita Karetnikov
0d7c44096a Add baddbmm meta function (#96548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96548
Approved by: https://github.com/ezyang
2023-03-11 19:09:24 +00:00
Nikita Karetnikov
8e0d5bf538 [primTorch] add meta implementation for aten.min.dim (#96442)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96442
Approved by: https://github.com/ngimel
2023-03-11 18:51:51 +00:00
Driss Guessous
11aab72dc9 [SDPA] Add an optional scale kwarg (#95259)
# Summary
This PR adds an optional kwarg to torch.nn.functional.scaled_dot_product_attention().
The new kwarg is a scaling factor that is applied after the q@k.T step of the computation. The efficient kernel was updated to support it, and the flash and math backends were minimally updated to support it as well.

This will reduce the complexity of #94729 and has been asked for by a couple of users.
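A hedged usage sketch on a build that includes this PR (when `scale` is omitted, the default 1/sqrt(head_dim) scaling still applies):

```python
import torch
import torch.nn.functional as F

q, k, v = (torch.randn(2, 8, 128, 64) for _ in range(3))

out_default = F.scaled_dot_product_attention(q, k, v)            # scale = 1/sqrt(64)
out_scaled = F.scaled_dot_product_attention(q, k, v, scale=0.5)  # Q@K.T * 0.5
```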

# Review Highlights
- As far as I know I did this the correct way, and it is both BC and FC compliant. However, I always seem to break internal workloads, so I would love it if someone could advise whether I did this right.
- I named the optional arg 'scale'. This is probably dumb and I should name it 'scale_factor'. I will make this change, but it is annoying and will require someone deciding we should rename it.
- 'scale' is interpreted as `Q@K.T * (scale)`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95259
Approved by: https://github.com/cpuhrsch
2023-03-08 18:07:40 +00:00
Wonjoo Lee
3095c95828 Fixes for PyTorch/XLA functionalization integration (#94537)
Fixes for PyTorch/XLA functionalization integration

---
Some notable changes include:
- More asserts in `FunctionalTensorWrapper`, so bugs show up more cleanly in cases where we e.g. forget to wrap an output
- Make the *_scatter ops `CompositeExplicitAutogradNonFunctional`, so we get a better error message and XLA doesn't accidentally try to use them
- Fix LTC/XLA codegen in core to handle multi-tensor out= ops with no returns
- Better erroring: allow XLA to use the CPU fallback from core in a way that always errors on view ops, which XLA should no longer see.
- Update MetaConverter to exclude XLA tensors in raising NotImplemented…
- Add `_propagate_xla_data` op
- Add meta tensor support for some ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94537
Approved by: https://github.com/bdhirsh
2023-03-02 23:02:34 +00:00
Wanchao Liang
f397d1700f Inductor reduce_scatter_tensor (#95764)
This adds reduce_scatter to the functional collectives and adds the
inductor lowering support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95764
Approved by: https://github.com/kumpera
2023-03-02 22:05:30 +00:00
Will Constable
cc6da7b901 Inductor allgather_into_tensor (#95530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95530
Approved by: https://github.com/kumpera
2023-02-27 21:38:36 +00:00
Christian Puhrsch
1fe2a9d122 Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)
Add the _int_mm primitive that binds cuBLAS int8@int8 -> int32 matmul and translates to Triton-based mm templates under max autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive.
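A hedged usage sketch (requires a CUDA device; the shapes are chosen to satisfy alignment constraints I believe cuBLAS imposes):

```python
import torch

a = torch.randint(-128, 128, (32, 64), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 128, (64, 32), dtype=torch.int8, device="cuda")
c = torch._int_mm(a, b)  # accumulates in int32
assert c.dtype == torch.int32
```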

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
2023-02-27 20:27:25 +00:00
HELSON
f43ce9553b [meta_tensor] polish error strings in meta registrations (#95052)
I found that some error messages should be formatted to include more detailed information, so I polished those error messages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95052
Approved by: https://github.com/bdhirsh
2023-02-27 20:12:09 +00:00