pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Michael Voznesensky	fe3ecfe0cf	Add AotAutogradFallbackTests to dynamic suite (#100454 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100454 Approved by: https://github.com/ezyang	2023-05-04 04:28:45 +00:00
PyTorch MergeBot	c3aa59c8f5	Revert "[WIP] enable cuda graphs support for flash attention with dropout (#100196 )" This reverts commit `32615618e4`. Reverted https://github.com/pytorch/pytorch/pull/100196 on behalf of https://github.com/clee2000 due to broke no ops build `32615618e4` https://github.com/pytorch/pytorch/actions/runs/4866578063/jobs/8678258318 ([comment](https://github.com/pytorch/pytorch/pull/100196#issuecomment-1532352810))	2023-05-03 01:41:56 +00:00
Natalia Gimelshein	32615618e4	[WIP] enable cuda graphs support for flash attention with dropout (#100196 ) Fixes #99905 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196 Approved by: https://github.com/drisspg	2023-05-02 23:05:31 +00:00
Justin Chu	e779a30d50	[BE] Fix SIM109 `compare-with-tuple` (#100337 ) Use {replacement} instead of multiple equality comparisons Pull Request resolved: https://github.com/pytorch/pytorch/pull/100337 Approved by: https://github.com/Skylion007	2023-04-30 19:51:32 +00:00
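The rewrite that the SIM109 `compare-with-tuple` rule asks for can be sketched as follows. This is an illustrative example only; the function names `is_weekend_verbose` and `is_weekend` are hypothetical and do not come from the PR.

```python
def is_weekend_verbose(day):
    # Pattern flagged by SIM109: one name compared against several values with `or`.
    return day == "sat" or day == "sun"


def is_weekend(day):
    # Suggested form: a single membership test against a tuple.
    return day in ("sat", "sun")


for day in ("mon", "sat", "sun"):
    print(day, is_weekend_verbose(day), is_weekend(day))
```

For ordinary values such as strings and integers the two forms agree. Note that `in` checks identity before equality, so objects that are not equal to themselves (a NaN, for example) can behave differently.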
Tugsbayasgalan Manlaibaatar	d4bf76c2a4	Persist torch.assert in aten graph (#100101 ) This PR introduces a new operator called aten._assert_async.msg, which allows passing a tensor value and assertion message as inputs. As part of TorchDynamo, we're replacing the use of torch._assert with this new operator so that make_fx also knows how to handle assertions. This is subset of https://github.com/pytorch/pytorch/pull/98878, refer there for historic reviews. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100101 Approved by: https://github.com/jansel	2023-04-28 07:31:43 +00:00
Aaron Gokaslan	e2a3817dfd	[BE] Enable C419 rule for any all shortcircuiting (#99890 ) Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890 Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet	2023-04-25 15:02:13 +00:00
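As a rough illustration of why C419 prefers a generator inside `any`/`all`, the sketch below records which elements actually get evaluated. The helper `is_positive` and the sample data are hypothetical and not taken from the PR.

```python
def is_positive(value, seen):
    # Record every evaluation so the two call styles can be compared.
    seen.append(value)
    return value > 0


data = [1, -2, 3, 4]

# List comprehension: the whole list is built before any() looks at it.
seen_list = []
result_list = any([is_positive(v, seen_list) for v in data])

# Generator expression: any() stops consuming at the first truthy item.
seen_gen = []
result_gen = any(is_positive(v, seen_gen) for v in data)

print(result_list, seen_list)
print(result_gen, seen_gen)
```

Both calls return the same result; only the generator form short-circuits.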
Xiaodong Wang	cc01568efd	[pt2] Register meta func to randperm.default (#99593 ) Summary: Looks we're missing the meta func for randperm.default. I get complaints like this when I compile randperm with dynamic shape which I think is because it gets into the real implementation but not the meta func. ``` RuntimeError: expected int but got s0 Exception raised from expect_int at fbcode/caffe2/c10/core/SymInt.h:128 (most recent call first): # 0 c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool) # 1 std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&) # 2 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) # 3 c10::detail::torchCheckFail(char const, char const, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) # 4 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) # 5 at::Tensor c10::Dispatcher::redispatch<at::Tensor, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> 
>(c10::TypedOperatorHandle<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)> const&, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const # 6 at::_ops::randperm::redispatch(c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) # 7 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) # 8 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, false>::call(c10::OperatorKernel, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >) ``` Differential Revision: D45137851 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99593 Approved by: 
https://github.com/ezyang	2023-04-25 08:55:43 +00:00
Wanchao Liang	ca24a96216	minor fix to fused adam meta registration (#99436 ) This PR fixes the registration by adding `max_exp_avg_sqs` to the output shape list too, and fix some type check issue Pull Request resolved: https://github.com/pytorch/pytorch/pull/99436 Approved by: https://github.com/mrshenli	2023-04-24 22:50:02 +00:00
Edward Z. Yang	10c938abef	Handle meta['val'] for tuple of lists. (#99724 ) Fixes https://github.com/pytorch/pytorch/issues/99356 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99724 Approved by: https://github.com/wanchaol	2023-04-21 22:33:21 +00:00
Rodrigo Kumpera	38e964056b	Reland python ops (#99170 ) Waiting for the revert to land. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170 Approved by: https://github.com/albanD	2023-04-18 15:15:46 +00:00
PyTorch MergeBot	1c042a2137	Revert "Reland python ops (#99170 )" This reverts commit `d4de64ae8d`. Reverted https://github.com/pytorch/pytorch/pull/99170 on behalf of https://github.com/DanilBaibak due to Break internal build	2023-04-18 11:37:43 +00:00
Rodrigo Kumpera	d4de64ae8d	Reland python ops (#99170 ) Waiting for the revert to land. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170 Approved by: https://github.com/albanD	2023-04-17 21:53:41 +00:00
Nikita Karetnikov	106ccf4a2a	[pt2] add meta function for `linalg.cross` (#99279 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279 Approved by: https://github.com/ezyang	2023-04-17 21:21:45 +00:00
PyTorch MergeBot	f957334c2b	Revert "[pt2] add meta function for `linalg.cross` (#99279 )" This reverts commit `efc3887ea5`. Reverted https://github.com/pytorch/pytorch/pull/99279 on behalf of https://github.com/ezyang due to Apparently this is breaking inductor on master? So weird	2023-04-17 19:33:16 +00:00
Nikita Karetnikov	efc3887ea5	[pt2] add meta function for `linalg.cross` (#99279 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279 Approved by: https://github.com/ezyang	2023-04-17 03:05:20 +00:00
Rodrigo Kumpera	a910045add	[PATCH] Back out "Move functional collectives implementation to python. (#98595 ) (#99168 ) Summary: Original commit changeset: ba36f8751adc Original Phabricator Diff: D44788697 Test Plan: model loading is fine after reverting the diff Reviewed By: zyan0, sayitmemory Differential Revision: D44921259 --- Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/99168 Approved by: https://github.com/izaitsevfb	2023-04-14 23:48:19 +00:00
XiaobingSuper	9c98f2ceb7	inductor: rewrite mkldnn fx fusion using pattern_matcher(binary) (#97141 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97141 Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel	2023-04-12 06:23:03 +00:00
XiaobingSuper	c214c50355	inductor: rewrite mkldnn fx fusion using pattern_matcher(conv_unary) (#97007 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97007 Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel	2023-04-12 05:52:54 +00:00
Guang Yang	c377a8590b	Add `nonzero_static()` op to pytorch to unblock export (#97417 ) Summary: Add new experimental python op (`torch.nonzero_static`) for export. There is NO cuda impl included in this PR Example: Say input tensor is `x = torch.tensor([[1, 0], [3, 2]])` call regular `nonzero()` on x will give you a tensor `tensor([[0, 0], [1, 0], [1, 1])` call `nonzero_static(x, size=4)` on x will give you a tensor `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value])` (padded) call `nonzero_static(x, size=2)` on x will give you a tensor `tensor([[0, 0], [1, 0])` (truncated) Test Plan: Unit Tests ``` buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static' ``` PT2 Export with `nonzero_static()` Example of `GraphModule` in the exported graph ``` def forward(self, x): arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec) nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4); arg0 = None return pytree.tree_unflatten([nonzero_static_default], self._out_spec) ``` Differential Revision: D44324808 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417 Approved by: https://github.com/ezyang	2023-04-11 05:13:36 +00:00
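The padding and truncation behaviour described in the commit message above can be sketched in pure Python for the 2-D case. This is a reference model of the semantics only, not the ATen implementation; the helper names `nonzero_2d` and `nonzero_static_2d` are hypothetical, and the default `fill_value=-1` is an assumption of this sketch.

```python
def nonzero_2d(rows):
    # Indices [i, j] of every nonzero element, in row-major order.
    return [[i, j] for i, row in enumerate(rows) for j, v in enumerate(row) if v != 0]


def nonzero_static_2d(rows, size, fill_value=-1):
    # Truncate to at most `size` index pairs, then pad up to exactly `size`.
    found = nonzero_2d(rows)[:size]
    padding = [[fill_value, fill_value] for _ in range(size - len(found))]
    return found + padding


x = [[1, 0], [3, 2]]
print(nonzero_2d(x))                 # dynamic length: depends on the data
print(nonzero_static_2d(x, size=4))  # padded to 4 rows
print(nonzero_static_2d(x, size=2))  # truncated to 2 rows
```

The output length depends only on `size`, never on the data, which is what makes the shape statically known for export.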
Nikita Karetnikov	b411238d76	[pt2] add meta function for `logcumsumexp` (#98683 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98683 Approved by: https://github.com/ezyang	2023-04-09 01:26:37 +00:00
Rodrigo Kumpera	24d9001527	Move functional collectives implementation to python. (#98595 ) This simplifies a lot the work we need to add new ops. This relands the previous PR, not sure why it was reverted. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98595 Approved by: https://github.com/wconstab	2023-04-07 21:48:05 +00:00
Nikita Karetnikov	1c226f5aad	[pt2] add meta functions for `cummax` and `cummin` (#98552 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98552 Approved by: https://github.com/Chillee	2023-04-07 17:58:28 +00:00
albanD	0210481dcb	Fix _like meta registrations (#98160 ) The meta implementation for these _like function is wrong whenever device != "meta" (it doesn't fill the memory!). zeros_like is special due to sparse and is fixed directly by always filling it with zeros. Every other one is CompositeExplicit implementation, I went with removing their meta registration and tweaking code to avoid infinite recursions. I can do the same as zeros_like (and add the proper filling for each) but that would duplicate the c++ logic and make the meta registrations non trivial. I can do it if you prefer to removal. test_meta works fine with these fixes, relying on CI to see if other tests are breaking as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160 Approved by: https://github.com/ezyang	2023-04-06 18:44:34 +00:00
PyTorch MergeBot	67d1a77086	Revert "Move functional collectives implementation to python. (#98315 )" This reverts commit `8b0374f83c`. Reverted https://github.com/pytorch/pytorch/pull/98315 on behalf of https://github.com/huydhn due to Sorry for reverting for PR. This is failing in trunk probably due to a landrace	2023-04-06 16:49:40 +00:00
Nikita Karetnikov	7b25976323	[pt2] add meta function for `take` (#98451 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98451 Approved by: https://github.com/ezyang	2023-04-06 14:48:35 +00:00
Rodrigo Kumpera	8b0374f83c	Move functional collectives implementation to python. (#98315 ) This simplifies a lot the work we need to add new ops. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98315 Approved by: https://github.com/albanD, https://github.com/wconstab, https://github.com/Neilblaze	2023-04-06 14:06:16 +00:00
PyTorch MergeBot	fa08e546f3	Revert "Add all_reduce_coalesced functional collective (#97157 )" This reverts commit `a3fc3531f5`. Reverted https://github.com/pytorch/pytorch/pull/97157 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to have a land race with https://github.com/pytorch/pytorch/pull/96226 and fails lint on trunk	2023-04-04 01:50:49 +00:00
Rodrigo Kumpera	a3fc3531f5	Add all_reduce_coalesced functional collective (#97157 ) Inductor codegen is suboptimal when calling all_reduce_coalesced with input args. We need to fix inductor's calling convention for that, or something else. Might not work if any outputs is unused. Test code: ```python import torch import torch.distributed as dist import torch.nn.functional as F from functorch import make_fx import os import torch.distributed._functional_collectives as ft_c from torch.testing._internal.common_distributed import ( spawn_threads_and_init_comms, ) from torch._inductor.compile_fx import compile_fx_inner def my_fun(a, b): c = a * 3 tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0]) return ((tensors[1] + tensors[0] + tensors[2]).sum(), ) @spawn_threads_and_init_comms(world_size=1) def inductor_main(self): x = torch.arange(4).cuda() * (dist.get_rank() + 1) y = torch.arange(4).cuda() * (dist.get_rank() + 1) x = x.to(torch.float) y = y.to(torch.float) * 0.5 res = make_fx(my_fun)(x, y) print(f"fx graph:\n{res.graph}") ind = compile_fx_inner(res, [x, y]) print(f"inductor done:\n{ind}") os.environ["PROXY_TENSOR_TRACING"] = "1" os.environ["TORCH_COMPILE_DEBUG"] = "1" torch._dynamo.config.output_code = True if __name__ == "__main__": inductor_main(None) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157 Approved by: https://github.com/fegin	2023-04-04 01:13:18 +00:00
Shen Li	e8d39606eb	[SPMD] Enable fused Adam in full train step tracing (#98113 ) Differential Revision: [](https://our.internmc.facebook.com/intern/diff/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98113 Approved by: https://github.com/yifuwang, https://github.com/fegin	2023-04-01 15:54:13 +00:00
Shen Li	bccf2ef0ce	Format DTensor dispatch.py and _meta_registrations.py (#98114 ) Format-only changes with black and lintrunner to prepare for the commit on top. Differential Revision: [D44603809](https://our.internmc.facebook.com/intern/diff/D44603809) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98114 Approved by: https://github.com/yifuwang, https://github.com/fegin	2023-04-01 15:54:13 +00:00
Shen Li	9ec6fdb29b	Enable adam foreach in full train step tracing (#97897 ) Main changes: 1. Registered several foreach ops to both meta and DTensor 2. Skip redundant getitem node when expanding foreach ops with DTensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/97897 Approved by: https://github.com/wanchaol, https://github.com/fegin	2023-03-30 16:47:10 +00:00
Shen Li	379fb47654	[SPMD] Support foreach optimizers with functionalization (#97853 ) My first attempt was to apply the same solution as how proxy_tensor.py handles other inplace ops. However, foreach is different in that its schema in `native_functions.yaml` does not return anything, whereas ops like `addcmul_` and `addcdiv_` do return Tensors (Thanks bdhirsh for teaching me this!). As a result, the proxy output during tracing does not wrap anything, and hence we cannot correctly connect it with subsequent operators. Modifying `native_functions.yaml` is not a preferred solution. After discussing with bdhirsh, the temporary solution is to do foreach functionalization as a graph pass for now. Later, when https://github.com/pytorch/pytorch/issues/97852 is addressed, we will switch to default functionalization. Edit: the latest version follows @bdhirsh 's suggestion on using `make_fx` `decomposition_table` instead of implementing manual fx.Graph transforms to functionalize `_foreach_add_`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97853 Approved by: https://github.com/fegin, https://github.com/wanchaol	2023-03-30 11:27:10 +00:00
Christian Puhrsch	9d37cefcb0	Resubmit _int_mm (#96685 ) Avoids any changes to gemm_and_bias Pull Request resolved: https://github.com/pytorch/pytorch/pull/96685 Approved by: https://github.com/drisspg, https://github.com/ngimel	2023-03-27 16:14:07 +00:00
Shen Li	a8f7e0b213	[Easy] Improve error message for meta_mm (#97533 ) Differential Revision: [D44376381](https://our.internmc.facebook.com/intern/diff/D44376381) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97533 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2023-03-25 01:09:41 +00:00
Driss Guessous	98a5cf090d	[SDPA] Remove the chunk_grad from mem-eff attention (#96880 ) # Summary There exists an optimization within the scaled_dot_product_efficient backward attention path to, under the right conditions, output grad_q, grad_k, grad_v all as aliases of the same storage. This was done to optimize for the hot path where mha does packed linear_projection -> chunk -> (view stuff) -> sdpa. The thought was that chunk-> would be able to "trivially" cat inputs to chunk.backward(). However, upon closer inspection, chunk.backward will call `cat` regardless of the inputs, so this is not being utilized. I validated this by profiling on main and then on this branch; the traces were the same, both with `split.backward()` calling into cat. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96880 Approved by: https://github.com/cpuhrsch	2023-03-17 21:28:25 +00:00
Nikita Karetnikov	bf08d1387c	[primTorch] handle out in `sort` meta function (#96719 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96719 Approved by: https://github.com/ezyang	2023-03-16 07:38:53 +00:00
Christian Puhrsch	0a53c9624a	Back out "Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339 )" (#96885 ) Summary: Backing out _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339) Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/96885 Approved by: https://github.com/drisspg	2023-03-16 05:32:55 +00:00
BowenBao	60a68477a6	Bump black version to 23.1.0 (#96578 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578 Approved by: https://github.com/ezyang	2023-03-15 06:27:59 +00:00
Nikita Karetnikov	ec536232a3	[primTorch] add meta implementation for `upsample_nearest2d_backward` (#96612 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612 Approved by: https://github.com/ezyang	2023-03-14 06:51:42 +00:00
PyTorch MergeBot	be220690d9	Revert "[primTorch] add meta implementation for `upsample_nearest2d_backward` (#96612 )" This reverts commit `fe180596b8`. Reverted https://github.com/pytorch/pytorch/pull/96612 on behalf of https://github.com/malfet due to broke lint	2023-03-13 03:07:23 +00:00
Nikita Karetnikov	fe180596b8	[primTorch] add meta implementation for `upsample_nearest2d_backward` (#96612 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612 Approved by: https://github.com/ezyang	2023-03-13 00:25:23 +00:00
Nikita Karetnikov	cb7c796b4b	Enable `min.unary_out` (#96441 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96441 Approved by: https://github.com/ngimel	2023-03-11 19:23:33 +00:00
Nikita Karetnikov	0d7c44096a	Add `baddbmm` meta function (#96548 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96548 Approved by: https://github.com/ezyang	2023-03-11 19:09:24 +00:00
Nikita Karetnikov	8e0d5bf538	[primTorch] add meta implementation for `aten.min.dim` (#96442 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96442 Approved by: https://github.com/ngimel	2023-03-11 18:51:51 +00:00
Driss Guessous	11aab72dc9	[SDPA] Add an optional scale kwarg (#95259 ) # Summary This PR adds an optional kwarg to torch.nn.functional.scaled_dot_product_attention(). The new kwarg is a scaling factor that is applied after the q@k.T step of the computation. Made updates to the efficient kernel to support it, but flash and math were minimally updated to support it as well. Will reduce the complexity of: #94729 and has been asked for by a couple of users. # Review Highlights - As far as I know I did this the correct way and this is both BC and FC compliant. However, I always seem to break internal workloads, so I would love it if someone could advise whether I did this right. - I named the optional arg 'scale'. This is probably dumb and I should name it 'scale_factor'. I will make this change but this is annoying and it will require someone thinking we should rename. - 'scale' is interpreted as `Q@K.T * (scale)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/95259 Approved by: https://github.com/cpuhrsch	2023-03-08 18:07:40 +00:00
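To make the `Q@K.T * (scale)` interpretation above concrete, here is a minimal pure-Python sketch of the math path of scaled dot-product attention on nested lists. It is not the PyTorch kernel; the function names `softmax` and `sdpa` are hypothetical, masking and dropout are omitted, and the fallback of `1 / sqrt(d)` when `scale` is `None` is an assumption based on the function's documented default.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q, k, v, scale=None):
    # q, k, v are lists of rows; d is the query embedding size.
    d = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(d)  # assumed default when no scale is given
    # scores = (Q @ K.T) * scale
    scores = [[scale * sum(a * b for a, b in zip(qr, kr)) for kr in k] for qr in q]
    weights = [softmax(r) for r in scores]
    # output = softmax(scores) @ V
    return [[sum(w * vr[c] for w, vr in zip(wr, v)) for c in range(len(v[0]))]
            for wr in weights]


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(sdpa(q, k, v))             # default scaling
print(sdpa(q, k, v, scale=0.0))  # zero scale: uniform attention weights
```

With `scale=0.0` every score is zero, so each output row is the plain average of the rows of `v`, which gives a quick sanity check on where the factor is applied.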
Wonjoo Lee	3095c95828	Fixes for PyTorch/XLA functionalization integration (#94537 ) Fixes for PyTorch/XLA functionalization integration --- Some notable changes include: - More asserts in `FunctionalTensorWrapper`, so bugs show up more cleanly in cases where we e.g. forget to wrap an output - Make the *_scatter ops `CompositeExplicitAutogradNonFunctional`, so we get a better error message and XLA doesn't accidentally try to use them - Fix LTC/XLA codegen in core to handle multi-tensor out= ops with no returns - Better erroring: Allow XLA to use the CPU fallback from core in a way so that it always errors on view ops, which XLA should no longer see. - Update MetaConverter to exclude XLA tensors in raising NotImplemented… - Add `_propagate_xla_data` op - Add meta tensor support for some ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/94537 Approved by: https://github.com/bdhirsh	2023-03-02 23:02:34 +00:00
Wanchao Liang	f397d1700f	Inductor reduce_scatter_tensor (#95764 ) This adds reduce_scatter to the functional collective and adds the inductor lowering support Pull Request resolved: https://github.com/pytorch/pytorch/pull/95764 Approved by: https://github.com/kumpera	2023-03-02 22:05:30 +00:00
Will Constable	cc6da7b901	Inductor allgather_into_tensor (#95530 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95530 Approved by: https://github.com/kumpera	2023-02-27 21:38:36 +00:00
Christian Puhrsch	1fe2a9d122	Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339 ) Add _int_mm primitive that binds cuBLAS int8@int8 -> int32 matmul and that translates to Triton based mm templates under max autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339 Approved by: https://github.com/ngimel, https://github.com/jansel	2023-02-27 20:27:25 +00:00
HELSON	f43ce9553b	[meta_tensor] polish error strings in meta registrations (#95052 ) I found that some error messages should be formatted to include more detailed information, so I polished those error messages. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95052 Approved by: https://github.com/bdhirsh	2023-02-27 20:12:09 +00:00


195 Commits