Commit Graph

54743 Commits

Author SHA1 Message Date
Michael Voznesensky
3b9a386d48 Add TORCH_FAKE_TENSOR_DEBUG and use it to enable storage of traces on fake tensors at init time (#90215)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90215
Approved by: https://github.com/ezyang
2022-12-06 22:28:52 +00:00
William Wen
d224ac7f77 Remove logging.CODE (#90234)
Fixes https://github.com/pytorch/torchdynamo/issues/1932

Discussed with @mlazos: if we still want separate streams for code logging and the rest of the info logging, we can use a separate logger object with a unique name.
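A minimal sketch of that approach using the standard `logging` module; the logger names below are illustrative, not the ones dynamo actually uses:

```
import logging

# Illustrative names only: a dedicated logger for generated-code output,
# kept separate from the general info stream by giving it a unique name.
code_log = logging.getLogger("dynamo.code")
info_log = logging.getLogger("dynamo")

code_log.setLevel(logging.DEBUG)
code_log.propagate = False                       # keep the two streams separate
code_log.addHandler(logging.FileHandler("generated_code.log"))

info_log.setLevel(logging.INFO)
info_log.addHandler(logging.StreamHandler())

code_log.debug("generated code goes to its own file")
info_log.info("everything else goes to the default stream")
```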

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90234
Approved by: https://github.com/ezyang
2022-12-06 22:24:43 +00:00
Sergii Dymchenko
14894a7311 Remove non-existent parameter from docstring (#90163)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90163
Approved by: https://github.com/clee2000
2022-12-06 22:22:17 +00:00
Yanbo Liang
7e9a8a1361 Disable dynamo tracing torchrec.distributed (#90087)
Summary: Context at T138318923

Test Plan: manual test

Reviewed By: yf225

Differential Revision: D41631076

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90087
Approved by: https://github.com/yf225
2022-12-06 22:17:16 +00:00
Eli Uriegas
27ad2605c8 Hotfix to unblock TRT unit tests internally (#90313)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Export of [D41778303](https://www.internalfb.com/diff/D41778303)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90313
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-06 22:14:37 +00:00
eqy
62e450d55f [CUDA Graphs] Add option to dump a captured graph for debugging (#85519)
CC @xwang233 @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85519
Approved by: https://github.com/ngimel
2022-12-06 22:03:05 +00:00
fduwjj
1abe264ef0 [Upstream _NamedOptimizer] Reland PR (89480) (#90293)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

Reland https://github.com/pytorch/pytorch/pull/89480/
* #90294
* __->__ #90293

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90293
Approved by: https://github.com/awgu
2022-12-06 21:47:12 +00:00
Andrew Gu
7436b19eb2 [FSDP] Clarify loss dtype check in _test_fsdp_parity (#90251)
A recent PR deprecated `torch.testing.assert_allclose` in favor of `torch.testing.assert_close` and left a `TODO`. This PR follows up to confirm that we do intend to have `check_dtype=False`.
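For reference, a minimal sketch of what `check_dtype=False` allows here, comparing a low-precision result against a full-precision one:

```
import torch

ref = torch.tensor([0.5, 1.5], dtype=torch.float32)
out = torch.tensor([0.5, 1.5], dtype=torch.float16)

# With check_dtype=False the values are compared after dtype promotion,
# so the fp16/fp32 mismatch is allowed.
torch.testing.assert_close(out, ref, check_dtype=False)

# The default (check_dtype=True) would raise because the dtypes differ.
```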
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90251
Approved by: https://github.com/rohan-varma
2022-12-06 21:28:40 +00:00
Andrew Gu
919e09f26a [FSDP][BE] Clean up dead code from clip_grad_norm_() testing (#90250)
`FSDP.clip_grad_norm_()` is tested separately in `test_fsdp_clip_grad_norm.py`. This PR removes the dead non-run code from `common_fsdp.py` and `test_fsdp_core.py` related to `clip_grad_norm_()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90250
Approved by: https://github.com/rohan-varma
2022-12-06 21:28:40 +00:00
Andrew Gu
3b578edd04 [FSDP] Test use_orig_params=True in test_fsdp_ignored_modules.py (#90290)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90290
Approved by: https://github.com/zhaojuanmao
2022-12-06 21:28:40 +00:00
Yanbo Liang
25f39c1bce Fix uniform ref implementation (#90094)
Fixes https://github.com/pytorch/torchdynamo/issues/1954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90094
Approved by: https://github.com/ngimel
2022-12-06 21:28:17 +00:00
Edward Z. Yang
a1ab06ab65 ShapeEnv.create_symbolic_sizes_strides_storage_offset (#89962)
Instead of having the storage offset hang out on its own, allocate
all of these symbols in one go.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89962
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2022-12-06 21:27:02 +00:00
Charlie Yan
e818c36647 reland #89222: [Composable API] replicate: change to per module call, remove mark_root_module() (#90254)
reland https://github.com/pytorch/pytorch/pull/89222
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90254
Approved by: https://github.com/zhaojuanmao
2022-12-06 21:17:53 +00:00
Andrew Gu
bd9ad89a6d [FSDP] Fix accidental change in _test_fsdp_parity (#90252)
I accidentally changed the semantics of this line when refactoring a while ago. The [previous version](https://github.com/pytorch/pytorch/pull/80873/files#diff-7b5c66f99161fa6a3d9042e80f8c8cc140a64e43445feede46f55e53154f6c3dL635) used to say:
```
if not mixed_precision:
```
which is actually the opposite of
```
if mixed_precision is not None:
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90252
Approved by: https://github.com/zhaojuanmao
2022-12-06 20:13:21 +00:00
mfkasim1
ce21262808 Log1p for complex in CPU (#89691)
Another PR for https://github.com/pytorch/pytorch/issues/89205: making torch.log1p accept complex numbers on CPU.
I haven't done the GPU version because I'm not sure which file(s) to change.
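An illustrative check of the new CPU behavior (values chosen arbitrarily):

```
import torch

z = torch.tensor([0.5 + 0.5j, -0.25 + 1.0j])   # complex64 CPU tensor
out = torch.log1p(z)                            # elementwise log(1 + z)
ref = torch.log(1 + z)                          # naive reference, less accurate near z = 0
print(torch.allclose(out, ref))                 # should print True for these inputs
```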

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89691
Approved by: https://github.com/jgong5, https://github.com/lezcano
2022-12-06 19:12:24 +00:00
Wanchao Liang
9e314bd822 [dtensor] handle the case where output of op is Optional[Tensor] (#90241)
As observed by @aazzolini, some ops may have Optional[Tensor] returns
where they return None (e.g. native_layer_norm_backward). This is a mismatch
between the C++ aten op signature and Python's None, so we need to handle it
on the Python side.
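A hypothetical sketch of that Python-side handling; the names below are illustrative, not the actual DTensor internals:

```
from typing import Callable, Optional, Sequence

import torch

# When the C++ schema declares Optional[Tensor] outputs, a None entry must be
# passed through as-is instead of being re-wrapped into a DTensor.
def wrap_outputs(
    outputs: Sequence[Optional[torch.Tensor]],
    wrap: Callable[[torch.Tensor], object],
) -> tuple:
    return tuple(None if out is None else wrap(out) for out in outputs)
```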
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90241
Approved by: https://github.com/aazzolini
2022-12-06 18:17:20 +00:00
Edward Z. Yang
eace084815 Use Sized not Iterable to test for len (#90182)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
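A small illustration of why `Sized` is the right check before calling `len`:

```
from collections.abc import Iterable, Sized

gen = (i for i in range(3))   # Iterable, but has no __len__
lst = [1, 2, 3]               # both Iterable and Sized

print(isinstance(gen, Iterable), isinstance(gen, Sized))  # True False
print(isinstance(lst, Sized))                             # True
# len(gen) would raise TypeError, so testing for Sized avoids that failure.
```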

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90182
Approved by: https://github.com/albanD
2022-12-06 16:13:14 +00:00
mingfeima
c6942dbbfb add shape check for random_samples in fractional_max_pool{2d|3d} (#89992)
This PR adds shape checks for `random_samples` in fractional_max_pool2d and fractional_max_pool3d
to provide more meaningful errors instead of a segfault when the input is invalid.

For more details, please check https://github.com/pytorch/pytorch/issues/89648
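A sketch of the kind of input being validated, using the documented 2d convention where `_random_samples` has shape `(N, C, 2)`; the exact error text is defined by the PR, not shown here:

```
import torch
import torch.nn.functional as F

x = torch.randn(2, 4, 16, 16)           # (N, C, H, W)

# Well-formed _random_samples for the 2d case: one (h, w) sample pair per
# (batch, channel), i.e. shape (N, C, 2) with values in [0, 1).
samples = torch.rand(2, 4, 2)
out = F.fractional_max_pool2d(x, kernel_size=3, output_size=(8, 8),
                              _random_samples=samples)
print(out.shape)                         # torch.Size([2, 4, 8, 8])

# A mismatched shape, e.g. torch.rand(1, 1, 2), should now raise a clear
# error instead of segfaulting.
```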
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89992
Approved by: https://github.com/jgong5, https://github.com/ezyang
2022-12-06 14:14:41 +00:00
mikey dagitses
be5108d5f9 replace memset with value-initialization (#90048)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/90048).
* #89865
* #89852
* #89851
* __->__ #90048

replace memset with value-initialization

Summary:
This is equivalent to zero initialization for any members that are
scalar or have implicit default constructors.

Note that aside from the reset at the beginning, blockmask and
philox_args are not touched by this function.

Test Plan: Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90048
Approved by: https://github.com/drisspg, https://github.com/malfet
2022-12-06 13:48:05 +00:00
Xia, Weiwen
97e47a52b8 [Quant] Add fused linear-leaky_relu op for onednn backend (#88478)
**Summary**
Post op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused `linear-leaky_relu` op for the `onednn` backend, which will be used for int8 inference with the `onednn` backend. This op cannot be called with other quantization backends; otherwise an error is thrown.
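For context, a minimal sketch of the float pattern that the fused int8 op targets; the quantization flow and the actual fused kernel are provided by the `onednn` backend and are not shown here:

```
import torch
import torch.nn as nn

# The eager-mode pattern that linear-leaky_relu fusion targets.
class LinearLeakyReLU(nn.Module):
    def __init__(self, in_features, out_features, negative_slope=0.01):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.act = nn.LeakyReLU(negative_slope)

    def forward(self, x):
        return self.act(self.linear(x))

m = LinearLeakyReLU(8, 4)
print(m(torch.randn(2, 8)).shape)   # torch.Size([2, 4])
```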

**Test Plan**
python test_quantization.py TestQuantizedLinear

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88478
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2022-12-06 08:32:59 +00:00
AllenTiTaiWang
41bfa49db9 [ONNX] Add src/index dynamic axes support for aten::scatter_add (#90090)
Extending #89787 and following the answer in https://github.com/onnx/onnx/issues/4672, dynamically capturing the shape of index lets the converter further support this op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90090
Approved by: https://github.com/BowenBao
2022-12-06 07:56:20 +00:00
PyTorch MergeBot
176b962f4b Revert "[PT-D][Composability][1/N] Upstream NamedOptimizer from TorchRec (KeyedOptimizer in TR) (#89480)"
This reverts commit 31ec1a1ef7.

Reverted https://github.com/pytorch/pytorch/pull/89480 on behalf of https://github.com/kit1980 due to Broke test_correct_module_names
2022-12-06 07:22:37 +00:00
Ryan Spring
3c9431f505 Add factory functions to python frontend (#89230)
- Add a `full` nvprim to support factory functions, because the `full` reference uses `empty` and `fill` while we have a `full` factory function.
- Change the `full_like` reference to call `full` to avoid defining another nvprim.
- Enable support for `new_zeros` so that the `cudnn_batch_norm` decomposition works.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89230
Approved by: https://github.com/kevinstephano, https://github.com/mruberry
2022-12-06 07:16:21 +00:00
PyTorch MergeBot
e645771e95 Revert "as_strided: Fix default storage_offset for reference implementation (#89513)"
This reverts commit ba70a8be03.

Reverted https://github.com/pytorch/pytorch/pull/89513 on behalf of https://github.com/kit1980 due to Broke multiple workflows, 2 unexpected successes for autograd tests
2022-12-06 07:14:16 +00:00
Arek Sredzki
44dac51c36 Improve Autograd Documentation Clarity (#89401)
This makes minor adjustments to the autograd docs, improving clarity and resolving grammatical errors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89401
Approved by: https://github.com/kit1980
2022-12-06 06:45:04 +00:00
Manuel Candales
49ccc41d57 [Vulkan] Enable QInt8 and QInt32 quantization (#89788)
Summary: Enabled Vulkan quantization for dtypes QInt8 and QInt32

Test Plan:
On Mac
```
cd ~/fbsource
buck1 run -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64
```

On Android
```
cd ~/fbsource
buck1 build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_quantized_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_quantized_api_test
adb shell "/data/local/tmp/vulkan_quantized_api_test"
```

Differential Revision: D41561661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89788
Approved by: https://github.com/digantdesai
2022-12-06 06:27:40 +00:00
Andrew Gu
45b40be078 [FSDP()] Fix fully_shard fwd hook registration (#90201)
I need to rebase later after Shen's PRs land.

The idea is to only register the pre/post-forward hook on the _root modules_ among the modules that consume a `FlatParameter`. (Yes, the term _root module_ is heavily overloaded. We may want to clarify that at some point. Here, _root_ is being used in the graph sense, meaning parent-less, and the scope is only among the modules consuming a `FlatParameter`.)

This avoids unnecessary pre/post-forward hooks running, which would lead to errors because the unshard is not truly idempotent.
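A hypothetical sketch of the "root among the `FlatParameter` consumers" selection; `roots_among` and `parent_of` are made-up names for illustration:

```
# Keep only the consumer modules that have no ancestor also in the consumer
# set; hooks are registered on just these, avoiding redundant pre/post-forward
# hook runs.
def roots_among(consumers, parent_of):
    roots = []
    for module in consumers:
        parent = parent_of.get(module)
        while parent is not None and parent not in consumers:
            parent = parent_of.get(parent)
        if parent is None:           # no ancestor is also a consumer
            roots.append(module)
    return roots
```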
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90201
Approved by: https://github.com/mrshenli, https://github.com/rohan-varma
2022-12-06 06:09:03 +00:00
Sean Ross-Ross
2b7fcfa399 fix: Moving operators to FuncTorchBatchedDecomposition (#89762)
I've moved over some of the easy-to-move operators and removed an xfail.

I found this from the test that I implemented in https://github.com/pytorch/pytorch/pull/89465

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89762
Approved by: https://github.com/zou3519
2022-12-06 05:59:47 +00:00
Sean Ross-Ross
bb673fb1d9 fix: update error when tensor escapes vmap (#89077)
Fixes https://github.com/pytorch/functorch/issues/1054

@zou3519, I played around with it, but I am unsure how to repro the cases for gen_vmap_inplace_plumbing and below in gen_vmap_plumbing_no_returns.

I've also seen that there are 24 other instances of the `TORCH_INTERNAL_ASSERT(maybe_layer.has_value());` assert; should I change all of these and add tests?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89077
Approved by: https://github.com/zou3519
2022-12-06 05:52:09 +00:00
Wanchao Liang
2c2cce73d4 [dtensor] remove torchgen function schema and parse manually (#90106)
This PR gets rid of the torchgen FunctionSchema parsing and parses the
schema manually. It should resolve the torchgen packaging issue and also
provide some perf wins when running DTensor eagerly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90106
Approved by: https://github.com/awgu
2022-12-06 05:45:00 +00:00
Yanli Zhao
a0c7b88861 remove backward hook in memory_tracker (#90143)
Remove the backward hook in memory_tracker, as it does not work well with jagged tensors in some cases. It is OK to remove this hook for now since it does not really track any stats.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90143
Approved by: https://github.com/rohan-varma
2022-12-06 05:39:59 +00:00
Sergii Dymchenko
6bbcd025bd Fix issue 38095 TODO in onnx/test_utility_funs.py (#90085)
Fix TODO related to https://github.com/pytorch/pytorch/issues/38095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90085
Approved by: https://github.com/BowenBao
2022-12-06 05:29:50 +00:00
Masaki Kozuki
508916128d [ReduceOp] ameliorate custom __eq__ (#90088)
Improve the completeness of `ReduceOp.__eq__`.

A follow-up should support the equality operator with a `RedOpType` as the first argument and a `ReduceOp` as the second.
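A generic sketch of that kind of `__eq__` in plain Python, not the actual c10d implementation:

```
from enum import Enum


class RedOpType(Enum):
    SUM = 0
    MAX = 1


class ReduceOp:
    """Toy wrapper used only to illustrate the comparison behavior."""

    def __init__(self, op):
        self.op = op

    def __eq__(self, other):
        if isinstance(other, ReduceOp):
            return self.op == other.op
        if isinstance(other, RedOpType):
            return self.op == other
        return NotImplemented


assert ReduceOp(RedOpType.SUM) == ReduceOp(RedOpType.SUM)
assert ReduceOp(RedOpType.SUM) == RedOpType.SUM
# RedOpType-on-the-left comparisons in the real bindings are left to the follow-up.
```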

Fixes #90072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90088
Approved by: https://github.com/kwen2501
2022-12-06 05:13:50 +00:00
Michael Lazos
2d9267ba30 [dynamo] Rewrite addcdiv in dynamo to its constituent ops (#90227)
This avoids a graph break when `value` is used, which fixes graph breaks in the variants of the Adam and Adagrad optimizers.
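A sketch of the decomposition being relied on, not the exact rewrite dynamo performs: `addcdiv(input, t1, t2, value=v)` is `input + v * t1 / t2`.

```
import torch

def addcdiv_decomposed(input, tensor1, tensor2, *, value=1):
    # The constituent-op form that can be traced without a graph break.
    return input + value * (tensor1 / tensor2)

x, a = torch.randn(4), torch.randn(4)
b = torch.rand(4) + 1.0          # keep the divisor away from zero
torch.testing.assert_close(
    addcdiv_decomposed(x, a, b, value=0.5),
    torch.addcdiv(x, a, b, value=0.5),
)
```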

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90227
Approved by: https://github.com/jansel
2022-12-06 05:08:44 +00:00
Ram Rachum
77f9b2e8bf Fix exception causes in fx, nn and onnx packages (#90134)
This is a continuation of #90118

@kit1980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90134
Approved by: https://github.com/kit1980
2022-12-06 04:34:58 +00:00
fduwjj
31ec1a1ef7 [PT-D][Composability][1/N] Upstream NamedOptimizer from TorchRec (KeyedOptimizer in TR) (#89480)
In PyTorch, the optimizer state_dict always uses numbers to index the per-parameter optimizer state.

The composability workstream now needs an FQN-based way to index the optimizer state_dict for parameters.

For example, SGD optimizer might have something in its `state_dict` like:

```
{'state':
  {0:
    {'momentum_buffer': tensor(...)},
  {1:
    {'momentum_buffer': tensor(...)},
  ...
}
'param_groups':
    [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'params': [0, 1, 2, 3, 4, 5, 6, 7]}]
}
```

And in NamedOptimizer we want the `state_dict` can be:

```
{'state':
  {'net1.0.weight':
    {'momentum_buffer': tensor(...)},
  {'net1.0.bias':
    {'momentum_buffer': tensor(...)},
  ...
}
'param_groups':
    [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'params': ['net1.0.weight', 'net1.0.bias', 'net2.0.weight', 'net2.0.bias', 'net3.weight', 'net3.bias', 'net4.1.weight', 'net4.1.bias']}]
}
```

We also want to support load_state_dict to enable optim `state_dict` overrides for NamedOptimizer.

For the next couple of PRs/diffs, we also need to:
1. Make `NamedOptimizer` work with FSDP (e.g. registering a hook for a model wrapped with FSDP) and other PTD/PT components.
2. Make `NamedOptimizer` work well with apply_optim_in_backward.
3. Also upstream `CombinedOptimizer`.

Differential Revision: [D41432088](https://our.internmc.facebook.com/intern/diff/D41432088/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41432088/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89480
Approved by: https://github.com/rohan-varma
2022-12-06 04:34:19 +00:00
HDCharles
cee396fa07 [ao][ns] PNP demo for exposing arbitrary model transforms (#90153)
Adds a way to use arbitrary prepare and convert functions with PNP.

Note: this is a recreation of https://github.com/pytorch/pytorch/pull/89892, which was reverted because the landing did not sync between GitHub and fbcode.

python test/test_quantization.py TestFxNumericSuiteNShadows.test_custom_functions_and_tracer

Differential Revision: [D41723892](https://our.internmc.facebook.com/intern/diff/D41723892/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90153
Approved by: https://github.com/vkuzo
2022-12-06 04:24:54 +00:00
Sherlock Huang
42705bd7b3 Disallow registering meta function for CompositeImplicitAutograd ops (#90222)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90222
Approved by: https://github.com/ezyang
2022-12-06 04:22:31 +00:00
Natalia Gimelshein
a88400e0cc pad low precision matmuls when requested (#90235)
Matmul padding is beneficial not only for fp32; fp16/bf16 with amp can benefit as well.
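A conceptual sketch of what matmul padding does, not inductor's implementation: pad the inner dimension so the GEMM hits friendlier tile sizes; zero padding along K does not change the result.

```
import torch
import torch.nn.functional as F

def padded_mm(a, b, multiple=8):
    k = a.shape[-1]
    pad = (-k) % multiple
    if pad:
        a = F.pad(a, (0, pad))          # pad K of a: (M, K) -> (M, K + pad)
        b = F.pad(b, (0, 0, 0, pad))    # pad K of b: (K, N) -> (K + pad, N)
    return a @ b

a = torch.randn(64, 125, dtype=torch.bfloat16)
b = torch.randn(125, 64, dtype=torch.bfloat16)
torch.testing.assert_close(padded_mm(a, b), a @ b)
```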

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90235
Approved by: https://github.com/jiawenliu64
2022-12-06 04:13:24 +00:00
Peter Bell
ba70a8be03 as_strided: Fix default storage_offset for reference implementation (#89513)
This fixes the default storage_offset so that it is taken from the input. This was
previously untested, so I've also added a new OpInfo which includes samples with
non-zero storage_offsets on the input tensor.
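An illustrative eager-mode behavior that the reference now matches:

```
import torch

base = torch.arange(10.0)
x = base[2:]                          # x.storage_offset() == 2
y = x.as_strided((2, 2), (2, 1))      # storage_offset not given

# The default offset comes from the input, not 0.
print(x.storage_offset(), y.storage_offset())   # 2 2
```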
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89513
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-12-06 04:07:16 +00:00
Danni Li
05ccbd6d94 Functionalization: skip meta block computation if compute_reference_meta is false (#90219)
Skip computing meta block when `compute_reference_meta` is `False`.

Issue: #89914

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90219
Approved by: https://github.com/ezyang
2022-12-06 04:03:01 +00:00
Edward Z. Yang
962ebe88a2 Assert there are no outstanding side effects before calling cond (#90208)
The current cond implementation is silently incorrect when
there are outstanding side effects, since the locally tracked
side effects are lost when the recursive export call is made.
At least we raise an assert now.

I'm working on a refactor of cond which should be able to sidestep
this problem. Maybe.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D41746973](https://our.internmc.facebook.com/intern/diff/D41746973)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90208
Approved by: https://github.com/voznesenskym
2022-12-06 03:53:48 +00:00
PyTorch MergeBot
0d8e53dfe7 Revert "[Composable API] replicate: change to per module call, remove mark_root_module() (#89222)"
This reverts commit 65a0dcffd8.

Reverted https://github.com/pytorch/pytorch/pull/89222 on behalf of https://github.com/malfet due to Included unintended submodule updates
2022-12-06 03:26:28 +00:00
PyTorch MergeBot
73565ce320 [vision hash update] update the pinned vision hash (#90239)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90239
Approved by: https://github.com/pytorchbot
2022-12-06 03:25:17 +00:00
PyTorch MergeBot
3749b9dc73 Revert "[Composable API] replicate: add support for DDP args (#89243)"
This reverts commit 0f274ed385.

Reverted https://github.com/pytorch/pytorch/pull/89243 on behalf of https://github.com/malfet due to Depends on https://github.com/pytorch/pytorch/pull/89222 that introduced spurious module updates
2022-12-06 03:22:18 +00:00
XiaobingSuper
2597d5d722 TorchDynamo: always convert flexiblelayout to be FixedLayout when given a stride_order (#89904)
For convolution, we always call **require_stride_order** to convert the input to the target stride order. If the original input's layout is FlexibleLayout, there is always a memory copy, because **is_stride_order_storage_and_layout** only checks the initial stride order. Since a FlexibleLayout means the layout can still be changed, when the user gives a stride order we should always convert the FlexibleLayout to a FixedLayout using the given stride order.

In a CV use case where the max-pooling output is used by two convolutions, there are two memory copies:

```
kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma GCC ivdep
    for(long i0=0; i0<128; i0+=1)
    {
        #pragma GCC ivdep
        for(long i1=0; i1<3; i1+=1)
        {
            #pragma GCC ivdep
            for(long i2=0; i2<3; i2+=1)
            {
                #pragma GCC ivdep
                for(long i3=0; i3<3; i3+=1)
                {
                    {
                        {
                            auto tmp0 = in_ptr0[i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp1 = in_ptr0[3 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp3 = in_ptr0[6 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp5 = in_ptr0[21 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp7 = in_ptr0[24 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp9 = in_ptr0[27 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp11 = in_ptr0[42 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp13 = in_ptr0[45 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp15 = in_ptr0[48 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0);
                            auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2);
                            auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4);
                            auto tmp8 = (tmp6 != tmp6) ? tmp6 : std::max(tmp7, tmp6);
                            auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8);
                            auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10);
                            auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12);
                            auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14);
                            out_ptr0[i3 + (3*i2) + (9*i1) + (27*i0)] = tmp16;
                        }
                    }
                }
            }
        }
    }
    #pragma GCC ivdep
    for(long i0=0; i0<128; i0+=1)
    {
        #pragma GCC ivdep
        for(long i1=0; i1<3; i1+=1)
        {
            #pragma GCC ivdep
            for(long i2=0; i2<9; i2+=1)
            {
                {
                    {
                        auto tmp0 = out_ptr0[i1 + (3*i2) + (27*i0)];
                        out_ptr1[i1 + (3*i2) + (27*i0)] = tmp0;
                        out_ptr2[i1 + (3*i2) + (27*i0)] = tmp0;
                    }
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1, arg3_1, arg4_1 = args
    args.clear()
    buf0 = empty_strided((128, 3, 3, 3), (27, 1, 9, 3), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((128, 3, 3, 3), (27, 1, 9, 3), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((128, 3, 3, 3), (27, 1, 9, 3), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg4_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg4_1
    del buf0
    buf3 = torch.ops.mkldnn._convolution_pointwise(buf2, arg0_1, arg1_1, (0, 0), (1, 1), (1, 1), 1, 'none', [], '')
    assert_size_stride(buf3, (128, 3, 3, 3), (27, 1, 9, 3))
    del arg0_1
    del arg1_1
    del buf2
    buf5 = torch.ops.mkldnn._convolution_pointwise(buf4, arg2_1, arg3_1, (0, 0), (1, 1), (1, 1), 1, 'none', [], '')
    assert_size_stride(buf5, (128, 3, 3, 3), (27, 1, 9, 3))
    del arg2_1
    del arg3_1
    return (buf3, buf5, )
```

After this PR, the generated code no longer contains the redundant memory copy:

```
kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       float* __restrict__ out_ptr0)
{
    #pragma GCC ivdep
    for(long i0=0; i0<128; i0+=1)
    {
        #pragma GCC ivdep
        for(long i1=0; i1<3; i1+=1)
        {
            #pragma GCC ivdep
            for(long i2=0; i2<3; i2+=1)
            {
                #pragma GCC ivdep
                for(long i3=0; i3<3; i3+=1)
                {
                    {
                        {
                            auto tmp0 = in_ptr0[i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp1 = in_ptr0[3 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp3 = in_ptr0[6 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp5 = in_ptr0[21 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp7 = in_ptr0[24 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp9 = in_ptr0[27 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp11 = in_ptr0[42 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp13 = in_ptr0[45 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp15 = in_ptr0[48 + i3 + (6*i2) + (42*i1) + (147*i0)];
                            auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0);
                            auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2);
                            auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4);
                            auto tmp8 = (tmp6 != tmp6) ? tmp6 : std::max(tmp7, tmp6);
                            auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8);
                            auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10);
                            auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12);
                            auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14);
                            out_ptr0[i3 + (3*i2) + (9*i1) + (27*i0)] = tmp16;
                        }
                    }
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1, arg3_1, arg4_1 = args
    args.clear()
    buf0 = empty_strided((128, 3, 3, 3), (27, 1, 9, 3), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg4_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg4_1
    buf2 = torch.ops.mkldnn._convolution_pointwise(buf0, arg0_1, arg1_1, (0, 0), (1, 1), (1, 1), 1, 'none', [], '')
    assert_size_stride(buf2, (128, 3, 3, 3), (27, 1, 9, 3))
    del arg0_1
    del arg1_1
    buf3 = torch.ops.mkldnn._convolution_pointwise(buf0, arg2_1, arg3_1, (0, 0), (1, 1), (1, 1), 1, 'none', [], '')
    assert_size_stride(buf3, (128, 3, 3, 3), (27, 1, 9, 3))
    del arg2_1
    del arg3_1
    return (buf2, buf3, )

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89904
Approved by: https://github.com/jansel
2022-12-06 03:07:53 +00:00
Bin Bao
29233a18c7 [inductor] Add test_ops_gradients running with inductor (#89792)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89792
Approved by: https://github.com/janeyx99, https://github.com/clee2000, https://github.com/huydhn
2022-12-06 02:26:29 +00:00
William Wen
ebeecbf833 Dynamo FX graph stack traceback fix (#87136)
Migration from https://github.com/pytorch/torchdynamo/pull/1655.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87136
Approved by: https://github.com/voznesenskym
2022-12-06 02:22:16 +00:00
Nikita Shulga
a268b9e53c Fix yet another C++17 Windows build issue (#90228)
Not sure why, but a top-level `using namespace` directive causes VC++ to fail with the following error (when the C++17 standard is used; everything is fine with C++14):
```
C:\actions-runner\_work\pytorch\pytorch\third_party\pybind11\include\pybind11\detail\../pytypes.h(1520): error C2872: 'attr': ambiguous symbol
C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/core/interned_strings.h(349): note: could be 'c10::attr'
C:\actions-runner\_work\pytorch\pytorch\torch/csrc/jit/ir/ir.h(75): note: or       'torch::jit::attr'
C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\pybind11\include\pybind11/pybind11.h(1094): note: see reference to function template instantiation 'pybind11::str pybind11::str::format<_Ty1&>(_Ty1 &) const' being compiled
        with
        [
            _Ty1=pybind11::handle
        ]
```

Solve this by replacing the global `using namespace torch::jit;` with
explicit, fully qualified usages of objects/methods from those namespaces.

Another prep change for https://github.com/pytorch/pytorch/70188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90228
Approved by: https://github.com/kit1980, https://github.com/albanD
2022-12-06 01:35:19 +00:00
Kimish Patel
55b10e6b1d [Pytorch][Vulkan] Use specalized shader for 3x3 depthwise conv (#89953)
This diff uses specialized implementation for 3x3 and 5x5 dw conv.

Differential Revision: [D41006638](https://our.internmc.facebook.com/intern/diff/D41006638/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89953
Approved by: https://github.com/salilsdesai, https://github.com/kirklandsign
2022-12-06 00:56:57 +00:00