Currently there is `test_vertical_fusion1`, which fuses entirely during
the lowering stage, so no buffers are realized. This adds
`test_scheduler_vertical_fusion1`, which is the same test but with
several intermediate calculations realized, leaving the scheduler
to do the fusion.
To support the test, this PR also adds:
- `metrics.ir_nodes_pre_fusion`, which, compared with
`generated_kernel_count`, tells us how many nodes were fused.
- `torch._test_inductor_realize`, an identity operator in eager mode
which, under inductor, also forces its input to be realized (usage
sketched below).
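A hedged sketch of how these fit together (the function and the check are illustrative, not the actual test; `torch.compile` is shown for brevity, though these PRs predate the 2.0 API):
```
import torch
from torch._inductor import metrics

def f(x):
    a = torch._test_inductor_realize(x + 1)  # force `a` to be realized
    b = torch._test_inductor_realize(a * 2)  # force `b` to be realized
    return b - 3

torch.compile(f)(torch.randn(8))
# Comparing ir_nodes_pre_fusion against generated_kernel_count tells us
# how many IR nodes the scheduler fused.
print(metrics.ir_nodes_pre_fusion, metrics.generated_kernel_count)
```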
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90014
Approved by: https://github.com/jansel
Relands #89031
Per title. We now set strides from the fx graph only for convolutions and mm. This is a hack, but bmm caused an extra copy in some cases and there is no obvious way to fix that; we should rethink the strides anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530
Approved by: https://github.com/Chillee
1. `aten.div.Tensor_mode` should allow broadcasting
2. `div` can use `ELEMENTWISE_TYPE_PROMOTION_KIND.INT_TO_FLOAT`
3. `prims.div` on integers should be truncating division (see the sketch below)
4. Add a lowering for `true_divide`, which is aliased to `div`
5. Register a lowering for the in-place version of `div_mode`
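A minimal eager-mode sketch of the semantics these lowerings must match:
```
import torch

a = torch.tensor([7, -7])
b = torch.tensor([2])                          # broadcasts against `a` (item 1)
print(torch.div(a, b))                         # int inputs promote to float (item 2)
print(torch.div(a, b, rounding_mode="trunc"))  # truncating division (item 3)
print(torch.true_divide(a, b))                 # alias of `div` (item 4)
a.div_(b, rounding_mode="trunc")               # in-place `div_mode` (item 5)
```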
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88603
Approved by: https://github.com/ngimel
By itself, the libdevice version of erf has the same perf as our decomposition, but in real workloads it leads to better fusion groups (due to fewer ops in the fused kernel).
Bonus: a few fp64 test skips removed, because our decomposition wasn't accurate enough for fp64, but the libdevice version is.
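A quick fp64 accuracy check of the kind those skips guarded (`torch.compile` shown for brevity; these PRs predate the 2.0 API):
```
import torch

@torch.compile
def erf(x):
    return torch.erf(x)

x = torch.randn(1024, dtype=torch.float64, device="cuda")
torch.testing.assert_close(erf(x), torch.erf(x))
```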
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89388
Approved by: https://github.com/jansel
The core problem that we often have with contiguous/channels-last layouts and convolutions is that Inductor often doesn't do a great job of "preserving" the eager-mode layouts.
So, for example, we'll often have something like
```
# a: channels-last
b = foo(a)
c = convolution(a)
```
In eager-mode, `a` would stay channels-last, and we would avoid two transpose copies (one into NHWC and one back into NCHW) within the convolution kernel.
However, Inductor currently sometimes loses the "correct" layout of `b` (not in this simple example, but in others). Then, not only will we do a transpose within `foo`, but we'll immediately transpose it back to do the convolution (and then again once the convolution is done).
This is particularly egregious in `convnext_base`, where there's a lot of mixing of non-channels last tensors and channels-last tensors.
The solution in this PR is to constrain the inputs to `aten.convolution`/`aten.convolution_backward` to match the layouts from eager-mode. This ensures that we'll never do extra transposes *within* `aten.convolution`, which are particularly bad (since Inductor can't fuse them).
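A minimal repro sketch of the pattern (shapes and the pointwise op are illustrative; `torch.compile` shown for brevity):
```
import torch
import torch.nn.functional as F

def f(a, w):
    b = torch.relu(a)   # pointwise op whose output layout could be lost
    c = F.conv2d(a, w)  # convolution wants the eager-mode layout
    return b, c

a = torch.randn(8, 64, 32, 32, device="cuda").to(memory_format=torch.channels_last)
w = torch.randn(64, 64, 3, 3, device="cuda").to(memory_format=torch.channels_last)
# With the constraint, the compiled graph hands aten.convolution a
# channels-last input, matching eager and avoiding NCHW<->NHWC transposes.
out = torch.compile(f)(a, w)
```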
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89031
Approved by: https://github.com/ngimel, https://github.com/jansel
Add stride/contiguity constraints to fallbacks so that inputs will be in the right stride permutation for the fallback kernel.
Improves perf of coat_lite_mini from 1.484 -> 2.011.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88534
Approved by: https://github.com/ngimel
Summary: Currently the fallback kernel in inductor assumes its output is
either a tensor or a tuple/list of tensors. This PR makes it handle more
generic output data structures.
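A minimal sketch of the idea (not the actual FallbackKernel code): walk the output structure generically and wrap only the tensor leaves, instead of special-casing Tensor vs. tuple/list:
```
import torch
import torch.utils._pytree as pytree

def wrap_fallback_output(output, wrap_tensor):
    # `output` may be any nesting of dicts/lists/tuples; only the tensor
    # leaves need to be wrapped into inductor IR.
    return pytree.tree_map(
        lambda x: wrap_tensor(x) if isinstance(x, torch.Tensor) else x,
        output,
    )
```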
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88495
Approved by: https://github.com/jansel
Ref pytorch/torchdynamo#327
The use of as_strided does require in-memory manipulation; however, this
lowering allows those memory ops to be fused with any preceding calculations.
e.g.
```
def f(a, b):
    return torch.as_strided_scatter(
        a * 8 + 10,
        b * 2 - 4,
        size=(a.numel() // 2,),
        stride=(2,))
```
Before this PR, the example compiles to two kernels plus a call to `aten.as_strided_scatter`;
with this PR, it compiles to just two kernels and no additional operator calls.
In theory I think this could be a decomposition, but in practice I saw the
`output_view.copy_(src)` being optimized out in some cases when this was
implemented as a decomposition.
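For reference, a hedged sketch of what that decomposition would look like (hypothetical; the PR implements a lowering instead):
```
import torch

def as_strided_scatter_decomp(inp, src, size, stride, storage_offset=0):
    out = inp.clone()
    output_view = torch.as_strided(out, size, stride, storage_offset)
    output_view.copy_(src)  # the copy_ that was sometimes optimized out
    return out
```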
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88379
Approved by: https://github.com/jansel
In cases where a tensor kwarg is actually `out=`, raise a nicer error message than this:
```
Traceback (most recent call last):
  File "/fsx/users/binbao/pytorch/torch/_inductor/graph.py", line 241, in call_function
    out = lowerings[target](*args, **kwargs)
  File "/fsx/users/binbao/pytorch/torch/_inductor/lowering.py", line 168, in wrapped
    assert not any(isinstance(x, TensorBox) for x in kwargs.values())
AssertionError
```
https://github.com/pytorch/torchdynamo/issues/1798
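A hypothetical repro of the kind of call that hits this path (before this PR it tripped the bare assert above):
```
import torch

def f(a, b):
    out = torch.empty_like(a)
    torch.add(a, b, out=out)  # tensor passed via an out= kwarg
    return out
```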
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88367
Approved by: https://github.com/desertfire
Workaround for https://github.com/pytorch/torchdynamo/issues/1775; calling sqrt is better in any case. `libdevice.pow` still, for some reason, doesn't work if both arguments are scalars.
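A minimal sketch of the rewrite, using a hypothetical helper (the real change lives in the pow lowering):
```
import torch

def pow_like(x, exponent):
    if exponent == 0.5:
        return torch.sqrt(x)  # avoids libdevice.pow entirely
    if exponent == 2:
        return x * x
    return torch.pow(x, exponent)
```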
@mreso, can you please check if that takes you further with diffusers?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87912
Approved by: https://github.com/desertfire
Porting over [torchdynamo/#1633](https://github.com/pytorch/torchdynamo/pull/1633)
`torch/_inductor/codegen/triton.py` now defines `libdevice_<function>` variants
of some functions. You can request dispatch to those for
float64 dtypes when using `register_pointwise` by setting
`use_libdevice_for_f64=True`.
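A sketch of how a pointwise op opts in (illustrative op choice; the real call sites are in `torch/_inductor/lowering.py`):
```
import torch
from torch._inductor.lowering import register_pointwise

aten = torch.ops.aten
register_pointwise(
    aten.sqrt,
    use_libdevice_for_f64=True,  # fp64 dispatches to the libdevice_ variant
)
```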
Other minor changes:
- In triton, sigmoid now codegens `tl.sigmoid`
- silu now comes from a decomp, not a lowering
- Some test skips are no longer necessary; removed or made xfails
Switching to `tl.sigmoid` has exactly the same performance.
Moving `silu` to a decomp does not change anything; the same triton code is generated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87189
Approved by: https://github.com/ngimel
Fixes https://github.com/pytorch/pytorch/pull/87048 by saving the needed properties before fork.
Actually getting CUDA to load in the workers is probably not desired: CUDA initialization takes O(seconds), and having multiple processes using the same device will slow things down.
This just moves the needed properties from the main trainer process to the workers.
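A minimal sketch of the approach, with a hypothetical helper name: snapshot the properties in the main trainer process, before forking, and hand the snapshot to the workers:
```
import torch

def snapshot_device_properties():
    # Runs in the parent, where CUDA is already initialized; workers
    # receive this snapshot instead of initializing CUDA themselves.
    return [
        torch.cuda.get_device_properties(i)
        for i in range(torch.cuda.device_count())
    ]
```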
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87101
Approved by: https://github.com/soumith