Commit Graph

114 Commits

Author SHA1 Message Date
Peter Bell
ad76a4e1e7 [inductor] Allow sympy expressions to participate in type promotion (#115676)
In the test example we have `add(i64[10], sympy.Expr)` where
`sympy.Expr` is not considered a promoting arg so isn't factored into
the type promotion. However, in eager it would promote to float32.
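
A quick eager-mode illustration of the promotion behavior the lowering is now expected to match; the sympy expression is stood in for by a plain Python float here, since that is the role it plays in promotion (this is not the PR's test case):

```python
import torch

# int64 tensor + float scalar promotes to the default float dtype (float32) in eager
t = torch.arange(10, dtype=torch.int64)
out = t + 0.5
print(out.dtype)  # torch.float32
```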

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115676
Approved by: https://github.com/lezcano
ghstack dependencies: #115677, #115699, #115700
2023-12-13 22:22:37 +00:00
Joel Schlosser
22704426c3 Expand dynamic dims support for traceable subclasses (#114311)
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
    * Signatures now:
    ```python
    # attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
    # ctx is anything useful for rebuilding the class we want to guard on
    attrs, ctx = x.__tensor_flatten__()
    ...
    # inner_tensors is a dict of {attr -> tensor}
    # ctx is taken unmodified from flattening and (eventually) guarded on
    # outer_size is the expected size of the output; possibly symbolic
    # outer_stride is the expected strides of the output; possibly symbolic
    y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

    # at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
    # the assert simplifies symbols when there are relationships between outer and inner symbols
    ```
    * Size info is needed for `NestedTensor` at least, and stride info is needed for `DTensor` at least (a minimal subclass sketch follows this list)
    * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors
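
Below is a minimal, hypothetical subclass sketching where `outer_size` / `outer_stride` flow in the updated protocol. It is not the `NestedTensor` / `DTensor` implementation, and it omits `__torch_dispatch__` and everything else a real traceable subclass needs:

```python
import torch

class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, inner, outer_size=None, outer_stride=None):
        outer_size = inner.size() if outer_size is None else outer_size
        outer_stride = inner.stride() if outer_stride is None else outer_stride
        return torch.Tensor._make_wrapper_subclass(
            cls, outer_size, strides=outer_stride, dtype=inner.dtype, device=inner.device
        )

    def __init__(self, inner, outer_size=None, outer_stride=None):
        self.inner = inner

    def __tensor_flatten__(self):
        # attrs name the inner tensors; ctx carries whatever is needed to rebuild / guard on
        return ["inner"], None

    @staticmethod
    def __tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride):
        # outer_size / outer_stride may be symbolic; the rebuilt tensor must compare equal to them
        return WrapperTensor(inner_tensors["inner"], outer_size, outer_stride)
```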

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-12-05 21:09:25 +00:00
Edward Z. Yang
473b17c4c1 Run sympy expressions with Python values / FX tracing (#113978)
To codegen deferred runtime asserts, I need to be able to convert sympy expressions back into regular Python expressions that I can put in FX graphs. This PR adds some of the machinery to do this: it adds a new sympy analysis whose operations are all FX-traceable and can also be run with plain Python int/float/bool/etc. It's tested by symbolically tracing through the analysis and then checking that the traced graph gives the same result as running the Python analysis directly.
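
The core idea, sketched very roughly (this is not the PR's actual analysis, just an illustration): interpret a sympy expression using plain Python operations, so the same interpreter also works under FX symbolic tracing and produces a graph computing the expression.

```python
import sympy

def interpret(expr, env):
    # Evaluate a sympy expression with plain Python values (or FX proxies).
    if isinstance(expr, sympy.Symbol):
        return env[expr.name]
    if isinstance(expr, sympy.Integer):
        return int(expr)
    args = [interpret(a, env) for a in expr.args]
    if isinstance(expr, sympy.Add):
        out = args[0]
        for a in args[1:]:
            out = out + a
        return out
    if isinstance(expr, sympy.Mul):
        out = args[0]
        for a in args[1:]:
            out = out * a
        return out
    raise NotImplementedError(type(expr))

s0 = sympy.Symbol("s0")
print(interpret(s0 * 2 + 3, {"s0": 5}))  # 13
```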

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113978
Approved by: https://github.com/aakhundov, https://github.com/lezcano
2023-11-20 21:25:11 +00:00
Brian Vaughan
dbb96ef30d improve annotation device parameters where a device ordinal is allowed (#113647)
Using mypy in code that depends on pytorch, I noticed that the type annotation doesn't allow a device ordinal.

`error: Argument "device" to "to_empty" of "Module" has incompatible type "int"; expected "str | device"  [arg-type]`
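
A minimal repro of the gap (hedged: the module and call site are arbitrary, and actually running it needs an available accelerator; the point is the annotation):

```python
import torch.nn as nn

m = nn.Linear(2, 2)
# Accepted at runtime as a device ordinal, but previously rejected by mypy:
# expected "str | device", got "int"
m.to_empty(device=0)
```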

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113647
Approved by: https://github.com/albanD
2023-11-17 14:41:22 +00:00
Edward Z. Yang
dfa9e7b511 Allow inferring divisibility on unbacked SymInts and do replacement trick (#113165)
We want something like torch.empty(i0, 12).view(4, -1, 12) to work.  Right now, it chokes on guards on data-dependent accesses. It turns out we are very close to having it work, based on experiments in https://github.com/pytorch/pytorch/issues/112347, if we do the replacement trick: setting i0 = i1 * 4 to explicitly encode the divisibility. This is good enough for Sympy to be able to handle the rest.

There are two parts to this PR.

* First, we must discover that there is this divisibility constraint. The place where this happens on view is in `infer_size`; however, we are unable to discover the modulus test with `expect_true` because the condition is currently written with a Python boolean operator that forces guarding too early: `numel == newsize or (dim is not None and newsize > 0 and numel % newsize == 0)`. We rewrite this into an equivalent version which tests on dim being None or not first, before performing the individual tests (see the sketch after this list). The main nontrivial reasoning here is that I must show that my set of tests in the `dim is not None` branch is sufficient when `numel == newsize`. However, if `numel == newsize`, then the modulus test must pass. Thus this is equivalent.
* Given the modifications to `infer_size`, this suffices to produce a runtime assert `Eq(Mod(192*i0, 2304), 0)`. Now we must simply turn this into the replacement automatically. I wasn't really sure how to use Sympy to do this for me, so I just manually pattern matched for this particular expression form, and if it matches, do the replacement.
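
A hedged sketch of the restructuring (not the literal `infer_size` code; names follow the prose above). Branching on `dim` first lets the divisibility check go through `torch._check` / `expect_true` rather than a short-circuiting Python `or` that guards on `numel == newsize` too early:

```python
import torch

def check_view_size(numel, newsize, dim):
    # Before:
    #   assert numel == newsize or (dim is not None and newsize > 0 and numel % newsize == 0)
    # After (equivalent):
    if dim is None:
        torch._check(numel == newsize)
    else:
        torch._check(newsize > 0)
        torch._check(numel % newsize == 0)

check_view_size(numel=48, newsize=48, dim=None)  # no -1 dim: sizes must match exactly
check_view_size(numel=48, newsize=12, dim=1)     # -1 at dim 1: only divisibility is required
```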

Note that this is kind of only useful for export, because inductor chokes on views involving unbacked SymInts. That will be a follow-up.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113165
Approved by: https://github.com/lezcano, https://github.com/aakhundov
2023-11-10 21:28:02 +00:00
Maxime Arthaud
62cbe86ac0 [torch] Skip the assertion on the return type when the annotation is a forward reference (#112870)
Summary:
The assertion is causing build failures when running Pysa, our security-focused static analyzer.
This is because we run `pyre infer` on the source code before analyzing it, which introduces annotations such as `def foo() -> 'torch._tensor.Tensor'`.
This does not work with the `out_wrapper` decorator which relies on inspecting the signature of the decorated function.
Let's skip the check on the return type if we detect that it was introduced by `pyre infer`.

Test Plan: eyes

Differential Revision: D50976601

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112870
Approved by: https://github.com/ZainRizvi
2023-11-04 00:22:13 +00:00
Jon Chuang
53acdb66f7 [primtorch] aten.normal decomp has wrong return type due to elementwise_type_promotion_wrapper (#112467)
Fixes https://github.com/pytorch/pytorch/issues/112449

`elementwise_type_promotion_wrapper` promotes `aten.normal` to the promoted dtype of its `mean` and `std` args.

But this is incorrect if an explicit `dtype` param is provided. Hence, we allow overriding the result_dtype when a specified `dtype` arg is available.

This problem is unique to `aten.normal`; none of the other decorated ops have a `dtype` param.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112467
Approved by: https://github.com/lezcano
2023-10-31 20:57:09 +00:00
Peter Bell
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
Peter Bell
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten(...)` to use `tree_leaves`.
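
A before/after illustration of the pattern being replaced (assuming `tree_leaves` from earlier in this stack):

```python
import torch.utils._pytree as pytree

tree = {"a": [1, 2], "b": (3, 4)}

leaves, _spec = pytree.tree_flatten(tree)  # old pattern, spec discarded
leaves2 = pytree.tree_leaves(tree)         # new pattern
assert leaves == leaves2 == [1, 2, 3, 4]
```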

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
lezcano
c8a5bb451e Do not import sympy within torch._prims_common (#112034)
This is the first of a few PRs that avoid importing SymPy at import time.
The pitch here is that we (almost!) do not have SymPy on our API, so
this should be feasible.

This should speed up torch imports by a good 15% as per
https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589

In this PR we just move a few global imports into local imports.
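
Schematically, the change looks like this (function and module names are illustrative, not from the diff):

```python
# Before: paid at `import torch` time
# import sympy
# def is_symbolic(x):
#     return isinstance(x, sympy.Basic)

# After: the import is deferred until the helper is actually called
def is_symbolic(x):
    import sympy
    return isinstance(x, sympy.Basic)
```
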
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034
Approved by: https://github.com/ezyang
2023-10-26 12:53:25 +00:00
Jon Chuang
ac768333be [dynamo] fix prim lowering validation logic for dynamic shape args (#111208)
Fixes https://github.com/pytorch/pytorch/issues/111199

Fixes https://github.com/pytorch/pytorch/issues/111203

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111208
Approved by: https://github.com/ezyang
2023-10-13 18:36:13 +00:00
Joel Schlosser
8f90be4478 Expand NT subclass to support SAM (#109123)
This PR contains the changes needed to support using the NT jagged subclass within SAM. Note that a NT with multiple ragged dims is still required at the extremes for inputs / outputs, but the internal computation generally involves a single ragged dim, making the jagged layout usable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109123
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2023-10-12 20:33:22 +00:00
chilli
6b1007b2a7 Fix error in div lowering with integers (#102809)
Fixes https://github.com/pytorch/pytorch/issues/101016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102809
Approved by: https://github.com/ngimel
ghstack dependencies: #110501, #110504, #110591, #110668, #110687
2023-10-06 23:21:40 +00:00
Edward Z. Yang
3262c5358f Use _check_is_size for validate_dim_length (#109849)
_check_is_size has some extra juice for unbacked SymInts, use it.
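
A small usage sketch of the pattern (not the actual validate_dim_length implementation): besides asserting non-negativity, `torch._check_is_size` records size-like information that later reasoning on unbacked SymInts can use, per this commit message.

```python
import torch

def validate_dim_length_sketch(length):
    torch._check_is_size(length)  # instead of a plain `assert length >= 0`
    return length

validate_dim_length_sketch(4)
```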

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109849
Approved by: https://github.com/yanboliang
2023-09-26 23:33:31 +00:00
Mwiza Kunda
5c4b5baf21 Fix python decomps for OpOverloadPackets and add tests (#107707)
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverload`s still function for the `Meta` key (e.g. if a Python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the Python function needs to support receiving out arguments)

- Add out parameter wrappers to python decomps for aten ops that have out overloads

CC. @ezyang @albanD @lezcano

Fixes #107713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
Ying Zhang
bbdce93571 Basic fp8 support in Inductor (#109168)
Add basic fp8 support in Inductor, including:
* Fix fp8 Triton codegen issues;
* Add min_elements_per_thread requirement for fp8 related dtype conversions. More details on the Triton implementation can be found at 10f59d8ce0/lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp (L10).

Note that the current implementation only works for Pointwise. Will create follow-up PRs for Reduction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109168
Approved by: https://github.com/drisspg
2023-09-23 04:41:41 +00:00
Mwiza Kunda
6b7b9c796e Fix registering jit decompositions for jvp for out wrapped decomps (#109367)
Python decompositions wrapped by `out_wrapper` need to be unwrapped before compiling with TorchScript since:
- `out_wrapper` extends the decompositions signature with an out parameter, however this `out` parameter is not present in the source code of the original decomposition so the resulting `ScriptFunction` will not have an `out` parameter
- `out_wrapper` is in the `torch._prims_common.wrappers` module so its `globals()` are different from the globals of the decomposition to be wrapped. This may cause symbol resolution to fail with the TorchScript compiler, since it is compiling the unwrapped decomp's source code rather than the wrapper

The python decomposition for `aten.trace` is wrapped as an example, other decompositions are to be fixed in https://github.com/pytorch/pytorch/pull/107707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109367
Approved by: https://github.com/lezcano
2023-09-21 16:36:51 +00:00
Edward Z. Yang
55f956f1d2 optests improvements based on torchvision usage on nms (#108929)
- Update cross-ref FakeMode test to use ShapeEnv.  Dynamic ops can now
  return an unbacked SymInt.  We always accept this as equal to whatever
  the real value was.
- Relax test so it works on all classes, not just unittest.TestCase
- Properly wrap the original method, so things like
  pytest.mark.parametrize are carried over
- Support dynamic shapes by default for make_fx `tracing_mode="fake"` without symbolifying everything else

Fixes https://github.com/pytorch/pytorch/issues/108927

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108929
Approved by: https://github.com/zou3519
2023-09-13 13:26:15 +00:00
Guilherme Leobas
7e878c9d10 Add decomposition for aten.take_along_dim (#108185)
xref #107875
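
For context, a hedged illustration of the relationship such a decomposition can build on: when the index tensor already has the full shape, `take_along_dim` matches `gather` (the real decomposition also has to handle broadcasting and the `dim=None` case):

```python
import torch

t = torch.randn(3, 4)
idx = torch.argsort(t, dim=1)
assert torch.equal(torch.take_along_dim(t, idx, dim=1), torch.gather(t, 1, idx))
```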

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108185
Approved by: https://github.com/lezcano
2023-09-04 13:49:53 +00:00
Ivan Yashchuk
c913f3857f Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 22:29:32 +00:00
PyTorch MergeBot
891bb259f8 Revert "Remove dynamo+nvfuser (#105789)"
This reverts commit 6030151d37.

Reverted https://github.com/pytorch/pytorch/pull/105789 on behalf of https://github.com/DanilBaibak due to Break a lot of tests on main. ([comment](https://github.com/pytorch/pytorch/pull/105789#issuecomment-1669710571))
2023-08-08 14:20:32 +00:00
Ivan Yashchuk
6030151d37 Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 13:29:31 +00:00
dilililiwhy
5a9e82fa02 let torch.device be overrideable by TorchFunctionMode (#106514)
Fixes #103828
let torch.device be overrideable by TorchFunctionMode
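
A hedged sketch of what this enables (the mode and redirection are illustrative): `torch.device(...)` construction now dispatches through `__torch_function__`, so a `TorchFunctionMode` can intercept it.

```python
import torch
from torch.overrides import TorchFunctionMode

class ForceCPU(TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.device:
            return func("cpu")  # ignore the requested device
        return func(*args, **kwargs)

with ForceCPU():
    print(torch.device("cuda:1"))  # device(type='cpu')
```
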
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106514
Approved by: https://github.com/ezyang
2023-08-04 10:47:43 +00:00
Edward Z. Yang
3bf922a6ce Apply UFMT to low traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow-up in the pyupgrade series, converting more strings to f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`
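
The kind of mechanical rewrite `flynt` applies (illustrative snippet, not taken from the diff):

```python
name, count = "conv", 3

# before
msg = "layer {} has {} inputs".format(name, count)
# after
msg = f"layer {name} has {count} inputs"
```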

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Justin Chu
8a688277a2 [BE] Enable ruff's UP rules and autoformat dynamo / functorch and refs (#105432)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105432
Approved by: https://github.com/ezyang
2023-07-19 13:48:44 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
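
A typical instance of the pattern being fixed (illustrative, not a signature from the diff):

```python
from typing import Optional

# before: implicit Optional -- the default is None but the annotation says plain int
# def resize(size: int = None): ...

# after: the Optional is spelled out, which newer mypy requires
def resize(size: Optional[int] = None): ...
```
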
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from conda environment in favor one from OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
3c5a494d7a Revert "Update mypy to 1.4.1 (#91983)"
This reverts commit 634659e262.

Reverted https://github.com/pytorch/pytorch/pull/91983 on behalf of https://github.com/malfet due to It's dependent change was reverted, so reverting this one as well, to keep CI clean ([comment](https://github.com/pytorch/pytorch/pull/91983#issuecomment-1636059709))
2023-07-14 15:59:16 +00:00
Nikita Shulga
634659e262 Update mypy to 1.4.1 (#91983)
Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  -
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91983
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/thiagocrepaldi, https://github.com/aaronenyeshi
2023-07-13 16:30:36 +00:00
Nikita Karetnikov
c4a6f86062 [pt2] add metas for max_unpool2d and max_unpool3d (#103821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103821
Approved by: https://github.com/Skylion007, https://github.com/Chillee
2023-07-01 01:33:35 +00:00
Peter Bell
8b418f197c [decomp] Add decomposition for torch.renorm (#103858)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103858
Approved by: https://github.com/ezyang, https://github.com/nkaretnikov
2023-06-21 20:57:43 +00:00
Kurt Mohler
ee83c646bb Replace _prims_common.check with torch._check* (#103240)
This relands most of the changes from #102219 which were backed out by #103128. However, instead of removing `_prims_common.check`, it adds a warning and a comment mentioning that it will be removed in the future and `torch._check*` should be used instead. As mentioned in https://github.com/pytorch/pytorch/pull/103128#pullrequestreview-1466414415, `_prims_common.check` cannot yet be removed because of some internal usage.
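
The replacement pattern looks like this (illustrative condition and message):

```python
import torch

x = torch.randn(3)

# old style (now emits a warning per this commit):
# torch._prims_common.check(x.dim() == 1, lambda: f"expected 1D, got {x.dim()}D")

# new style:
torch._check(x.dim() == 1, lambda: f"expected 1D, got {x.dim()}D")
```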

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103240
Approved by: https://github.com/albanD
2023-06-21 00:46:17 +00:00
Ivan Zaitsev
821493715c Back out "Remove check from _prims_common, replace with torch._check* (#102219)", Back out "Forwatd fix for D46427687" (#103128)
Test Plan: revertitparrot

Reviewed By: malfet

Differential Revision: D46506433

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103128
Approved by: https://github.com/malfet
2023-06-07 01:41:41 +00:00
Kurt Mohler
a84bb2709a Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-03 02:23:21 +00:00
PyTorch MergeBot
a7efa0ce35 Revert "Remove check from _prims_common, replace with torch._check* (#102219)"
This reverts commit fb79d43649.

Reverted https://github.com/pytorch/pytorch/pull/102219 on behalf of https://github.com/malfet due to Broke lint, see https://github.com/pytorch/pytorch/actions/runs/5158949959/jobs/9293466925 ([comment](https://github.com/pytorch/pytorch/pull/102219#issuecomment-1574245414))
2023-06-02 20:00:48 +00:00
Kurt Mohler
fb79d43649 Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-02 19:13:45 +00:00
Xia, Weiwen
ce9923a1cb [Quant][PT2E][Inductor] Lower quantized conv to Inductor (#101164)
**Summary**
Enable the lowering path for reference quantized conv after PT2E to Inductor.

The pattern `decomposed dequantize -> aten.convolution -> decomposed quantize` is fused to `quantized.functional.conv1d/2d/3d` and Inductor makes external calls to these ops.

This PR focuses on functionality only. The implementation is expected to have low performance.

Code example:
```Python
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 2, stride=2, padding=0, dilation=1)

    def forward(self, x):
        return nn.functional.gelu(self.conv(x))

m = M().eval()
example_inputs = (torch.randn(2, 3, 6, 6),)
exported_model, guards = torchdynamo.export(
    m,
    *copy.deepcopy(example_inputs),
    aten_graph=True,
    tracing_mode="real",
)

qconfig = get_default_qconfig("x86")
qconfig_mapping = QConfigMapping().set_global(qconfig)
backend_config_inductor = get_x86_inductor_pt2e_backend_config()
prepared_model = prepare_pt2e(
    exported_model,
    qconfig_mapping,
    example_inputs,
    backend_config_inductor
)
prepared_model(*example_inputs)
converted_model = convert_pt2e(prepared_model)
run = compile_fx(converted_model, example_inputs)
```
Output code by Inductor
```python
from ctypes import c_void_p, c_long
import torch
import math
import random
import os
import tempfile
from torch._inductor.hooks import run_intermediate_hooks
from torch._inductor.utils import maybe_profile

from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile
from torch._inductor.select_algorithm import extern_kernels

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_weiwen/5d/c5dsrjrcd4jlzryilhxl5hdvcrzsoek52xzzqqy57hcoezvxxxwm.h"
extern "C" void kernel(const float* in_ptr0,
                       const float* in_ptr1,
                       const long* in_ptr2,
                       unsigned char* out_ptr0)
{
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(2L); i0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long i1=static_cast<long>(0L); i1<static_cast<long>(3L); i1+=static_cast<long>(1L))
            {
                #pragma GCC ivdep
                for(long i2=static_cast<long>(0L); i2<static_cast<long>(36L); i2+=static_cast<long>(1L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(i2 + (36L*i1) + (108L*i0))];
                    auto tmp1 = in_ptr1[static_cast<long>(0L)];
                    auto tmp7 = in_ptr2[static_cast<long>(0L)];
                    auto tmp2 = 1 / tmp1;
                    auto tmp3 = static_cast<float>(1.0);
                    auto tmp4 = decltype(tmp2)(tmp2 * tmp3);
                    auto tmp5 = decltype(tmp0)(tmp0 * tmp4);
                    auto tmp6 = std::nearbyint(tmp5);
                    auto tmp8 = static_cast<float>(tmp7);
                    auto tmp9 = tmp6 + tmp8;
                    auto tmp10 = static_cast<float>(0);
                    auto tmp11 = max_propagate_nan(tmp9, tmp10);
                    auto tmp12 = static_cast<float>(127);
                    auto tmp13 = min_propagate_nan(tmp11, tmp12);
                    auto tmp14 = static_cast<unsigned char>(tmp13);
                    out_ptr0[static_cast<long>(i1 + (3L*i2) + (108L*i0))] = tmp14;
                }
            }
        }
    }
}
''')

kernel_cpp_1 = async_compile.cpp('''
#include "/tmp/torchinductor_weiwen/5d/c5dsrjrcd4jlzryilhxl5hdvcrzsoek52xzzqqy57hcoezvxxxwm.h"
extern "C" void kernel(const unsigned char* in_ptr0,
                       const long* in_ptr1,
                       const float* in_ptr2,
                       float* out_ptr0)
{
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(2L); i0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long i1=static_cast<long>(0L); i1<static_cast<long>(6L); i1+=static_cast<long>(1L))
            {
                #pragma GCC ivdep
                for(long i2=static_cast<long>(0L); i2<static_cast<long>(9L); i2+=static_cast<long>(1L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(i1 + (6L*i2) + (54L*i0))];
                    auto tmp2 = in_ptr1[static_cast<long>(0L)];
                    auto tmp5 = in_ptr2[static_cast<long>(0L)];
                    auto tmp1 = static_cast<float>(tmp0);
                    auto tmp3 = static_cast<float>(tmp2);
                    auto tmp4 = tmp1 - tmp3;
                    auto tmp6 = decltype(tmp4)(tmp4 * tmp5);
                    auto tmp7 = static_cast<float>(0.5);
                    auto tmp8 = decltype(tmp6)(tmp6 * tmp7);
                    auto tmp9 = static_cast<float>(0.7071067811865476);
                    auto tmp10 = decltype(tmp6)(tmp6 * tmp9);
                    auto tmp11 = std::erf(tmp10);
                    auto tmp12 = static_cast<float>(1);
                    auto tmp13 = tmp11 + tmp12;
                    auto tmp14 = decltype(tmp8)(tmp8 * tmp13);
                    out_ptr0[static_cast<long>(i2 + (9L*i1) + (54L*i0))] = tmp14;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1 = args
    args.clear()
    buf0 = torch.ops.quantized_decomposed.quantize_per_channel.default(arg0_1, arg4_1, arg5_1, 0, -128, 127, torch.int8)
    del arg0_1
    buf1 = buf0
    assert_size_stride(buf1, (6, 3, 2, 2), (12, 4, 2, 1))
    del buf0
    buf2 = empty_strided((2, 3, 6, 6), (108, 1, 18, 3), device='cpu', dtype=torch.uint8)
    kernel_cpp_0(c_void_p(arg8_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg3_1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg8_1
    buf2 = torch._make_per_tensor_quantized_tensor(buf2, arg2_1, arg3_1)
    buf1 = torch._make_per_channel_quantized_tensor(buf1, arg4_1, arg5_1, 0)
    buf3 = torch.ao.nn.quantized.functional.conv2d(buf2, buf1, arg1_1, (2, 2), (0, 0), (1, 1), 1, 'zeros', arg6_1, arg7_1, torch.uint8)
    assert_size_stride(buf3, (2, 6, 3, 3), (54, 1, 18, 6))
    del arg1_1
    del arg2_1
    del arg3_1
    del arg4_1
    del arg5_1
    del buf1
    del buf2
    buf4 = empty_strided((2, 6, 3, 3), (54, 9, 3, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_1(c_void_p(buf3.data_ptr()), c_void_p(arg7_1.data_ptr()), c_void_p(arg6_1.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg6_1
    del arg7_1
    return (buf4, )

def benchmark_compiled_module(times=10, repeat=10):
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((6, 3, 2, 2), (12, 4, 2, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((6, ), (1, ), device='cpu', dtype=torch.float32)
    arg2_1 = rand_strided((), (), device='cpu', dtype=torch.float32)
    arg3_1 = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg4_1 = rand_strided((6, ), (1, ), device='cpu', dtype=torch.float32)
    arg5_1 = rand_strided((6, ), (1, ), device='cpu', dtype=torch.int64)
    arg6_1 = rand_strided((), (), device='cpu', dtype=torch.float32)
    arg7_1 = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg8_1 = rand_strided((2, 3, 6, 6), (108, 36, 6, 1), device='cpu', dtype=torch.float32)
    return print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1]), times=times, repeat=repeat)

if __name__ == "__main__":
    from torch._inductor.utils import compiled_module_main
    compiled_module_main('None', benchmark_compiled_module)
```

**Test plan**
python test/test_quantization.py TestQuantizePT2EFXX86Inductor.test_inductor_qconv_lowering

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101164
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-06-01 10:22:02 +00:00
Richard Zou
fc31b3a106 Allow existing "Python RAII guards" to be used as context managers (#102579)
This PR adds a `py_context_manager_DEPRECATED` that converts a C++ RAII
guard to an object that may be used either as a Python context manager or
as a "Python RAII guard".

We don't convert all of them to Python context manager only due to BC
reasons; people in OSS and internally actually rely on these APIs and I
don't want to break them. We are justified in breaking BC if we wanted
to, but it seemed like too much work for not a lot of gain.

The API is postfixed with "DEPRECATED" to indicate that people should
really use `py_context_manager` (converts C++ RAII guard to Python
context manager) instead.

Test Plan:
- this PR converts all PyTorch usages of _AutoDispatchBelowAutograd to
context manager. I can do the rest in follow-ups.
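
A hedged sketch of the resulting usage, built around the guard named in the test plan (not code from the PR): the same object can now be used as a context manager instead of being held as a "Python RAII guard".

```python
import torch

def run_below_autograd(fn, *args):
    # Dispatch below the Autograd key for the duration of the block.
    with torch._C._AutoDispatchBelowAutograd():
        return fn(*args)

print(run_below_autograd(torch.add, torch.ones(2), torch.ones(2)))
```
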
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102579
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2023-05-31 19:55:38 +00:00
Nikita Karetnikov
1e591a8b64 [pt2] add meta function for solve_triangular (#100829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100829
Approved by: https://github.com/ezyang
2023-05-08 13:48:15 +00:00
Edward Z. Yang
54c0edf6da Track exact origin_node on best effort basis (#100110)
Currently, we track 'origins' on IR nodes so that we have some idea about what FX IR nodes contributed to any given fused kernel. However, the origins are dumped into an undifferentiated set, so if you have, e.g., multiple outputs, you cannot easily tell which output corresponds to which FX node.

This PR introduce a more precise notion of tracking "origin_node" which says that the contents of this Buffer/Loop node corresponds EXACTLY to the output of a particular FX node; e.g., if you serialized each intermediate when running the generated inductor code, you could compare them with the corresponding intermediates from the original FX graph.

Tracking origin_node in all cases requires quite a bit of effort, so this PR introduces the tracking on a strictly best effort basis. The logic in torch/_inductor/graph.py sets up the associations, but only when it is "obvious" which IR node should get the assignment, and there is work in torch/_inductor/ir.py for propagating this information around as necessary. Like origins, origin_node is not a true dataclass field (as this would break all existing positional arg call sites), instead, it is added post facto via `__post_init__`. At the moment, it is only valid for Buffer/Loop to have an origin_node, but we could imagine relaxing this in the future.

The payoff is in torch/_inductor/codegen/wrapper.py and torch/_inductor/codegen/triton.py where we currently just print the FX node name and the tensor (but a more useful integration will be coming later.)

I also introduce a debugging tool `debug_ir_traceback` which tracks tracebacks of where IRNodes were allocated, to help you understand why a node doesn't have an `origin_node`.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100110
Approved by: https://github.com/voznesenskym
2023-04-28 04:15:27 +00:00
Animesh Jain
6bc4651193 [philox_rand] Dynamic shape support (#99290)
Extends the functionalization-of-RNG work to dynamic shapes. An example of the generated graph looks like this:

~~~

[2023-04-24 21:41:37,446] torch._functorch.aot_autograd.__aot_graphs: [INFO] TRACED GRAPH
 ===== Forward graph 1 =====
 <eval_with_key>.7 class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: i64[], arg1_1: i64[], arg2_1: Sym(s0), arg3_1: Sym(s1), arg4_1: f32[s0, s1]):
        # File: /scratch/anijain/work/pytorch/test/test_functionalization_of_rng_ops.py:46, code: a = torch.rand_like(x) * x
        add: i64[] = torch.ops.aten.add.Tensor(arg1_1, 0)
        philox_rand = torch.ops.rngprims.philox_rand.default([arg2_1, arg3_1], arg0_1, add, None, device(type='cuda', index=0), torch.float32);  add = None
        getitem: f32[s0, s1] = philox_rand[0]
        getitem_1: i64[] = philox_rand[1];  philox_rand = None
        add_1: i64[] = torch.ops.aten.add.Tensor(getitem_1, 0);  getitem_1 = None
        mul: f32[s0, s1] = torch.ops.aten.mul.Tensor(getitem, arg4_1);  getitem = arg4_1 = None

        # File: /scratch/anijain/work/pytorch/test/test_functionalization_of_rng_ops.py:47, code: a = torch.rand_like(x) * a
        add_2: i64[] = torch.ops.aten.add.Tensor(arg1_1, add_1)
        philox_rand_1 = torch.ops.rngprims.philox_rand.default([arg2_1, arg3_1], arg0_1, add_2, None, device(type='cuda', index=0), torch.float32);  arg2_1 = arg3_1 = arg0_1 = add_2 = None
        getitem_2: f32[s0, s1] = philox_rand_1[0]
        getitem_3: i64[] = philox_rand_1[1];  philox_rand_1 = None
        add_3: i64[] = torch.ops.aten.add.Tensor(add_1, getitem_3);  add_1 = getitem_3 = None
        mul_1: f32[s0, s1] = torch.ops.aten.mul.Tensor(getitem_2, mul);  getitem_2 = mul = None

        # No stacktrace found for following nodes
        add_4: i64[] = torch.ops.aten.add.Tensor(arg1_1, add_3);  arg1_1 = add_3 = None
        return (mul_1, add_4)

 ~~~

Each rand op is accompanied by its offset calculation op.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99290
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-04-25 22:40:28 +00:00
Elias Ellison
638feec4e3 Turn on meta converter for complex (#98869)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98869
Approved by: https://github.com/ngimel
2023-04-20 16:42:38 +00:00
Animesh Jain
fdbc8625a1 Functionalization of torch.rand/rand_like ops (#97377)
This PR introduces the functionalization of RNG ops. Key points are

* Introduces a new `philox_rand` prim operator that accepts seed, offset.
* Adds decompositions for random operators that use these philox_rand prims
* Adds a PhiloxStateTracker to track the offset for each occurrence of rand ops
* Changes calling convention of AOT Autograd and adds <fwd_seed, fwd_base_offset> and <bwd_seed, bwd_base_offset>
* Monkeypatches set_rng_state and get_rng_state while AOT Autograd tracing to record the rng state behavior
* Raises assertion for CPU because CPU does not use Philox RNG.

Not dealt in this PR
* dropout op - offset calculation is different
* other distributions like normal, poisson etc
* Inductor support
* Cudagraph support
* Dynamic shape support

An example
~~~

class Custom(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        a = torch.rand_like(x) * x
        a = torch.rand_like(x) * a
        return a

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * torch.rand_like(grad_out) * torch.cos(x)

====== Forward graph 0 ======
def forward(self, fwd_seed_1: i64[], fwd_base_offset_1: i64[], primals_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 0)
    philox_rand: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add, [16, 1], device(type='cuda', index=0), torch.float32);  add = None
    mul: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand, primals_1);  philox_rand = None
    add_1: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 4);  fwd_base_offset_1 = None
    philox_rand_1: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add_1, [16, 1], device(type='cuda', index=0), torch.float32);  fwd_seed_1 = add_1 = None
    mul_1: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand_1, mul);  philox_rand_1 = mul = None
    return [mul_1, primals_1]

====== Backward graph 0 ======
def forward(self, bwd_seed_1: i64[], bwd_base_offset_1: i64[], primals_1: f32[16, 16], tangents_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add_2: i64[] = torch.ops.aten.add.Tensor(bwd_base_offset_1, 0);  bwd_base_offset_1 = None
    philox_rand_2: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], bwd_seed_1, add_2, [16, 1], device(type='cuda', index=0), torch.float32);  bwd_seed_1 = add_2 = None
    mul_2: f32[16, 16] = torch.ops.aten.mul.Tensor(tangents_1, philox_rand_2);  tangents_1 = philox_rand_2 = None
    cos: f32[16, 16] = torch.ops.aten.cos.default(primals_1);  primals_1 = None
    mul_3: f32[16, 16] = torch.ops.aten.mul.Tensor(mul_2, cos);  mul_2 = cos = None
    return [mul_3]

~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97377
Approved by: https://github.com/ezyang
2023-04-16 09:55:56 +00:00
Nikita Karetnikov
9585a7ffd3 [inductor] support non-tensor ops with dynamic shapes (#97519)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97519
Approved by: https://github.com/jansel
2023-03-26 00:38:50 +00:00
Edward Z. Yang
4833e47feb Add support for nonzero, some improvements to reduce guards (#95387)
This takes the strategy described in https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#

It is essentially https://github.com/pytorch/pytorch/pull/95222 but squashed and with changes that are unnecessary given that we assume nonzero returns > 1.

What's in the PR:

* nonzero now supports meta propagation. When `capture_dynamic_output_shape_ops` is set, it will return a tensor with an unbacked SymInt representing the size in question (see the sketch after this list).
* The unbacked SymInt is UNSOUNDLY assumed to be not equal to 0/1. We will still error if you guard otherwise.
* PrimTorch pointwise operators are updated to use empty_permuted, to avoid guarding on unbacked SymInt from empty_strided (tested in `test_dynamic_pointwise_scalar`)
* Convolution is updated to skip backend selection if batch is unbacked, to avoid guarding on unbacked SymInt (tested in `test_unbacked_batch_resnet`)
* I kept the helper utilities like `definitely_true` for working with possibly unbacked SymInts. They're not used right now but maybe someone will find them useful.
* Added `constrain_unify` to let you specify two unbacked SymInts must have the same value
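
A hedged sketch of what the first bullet enables (not a test from the PR; whether a given backend compiles this end to end depends on follow-up work, so the eager backend is used here):

```python
import torch
import torch._dynamo.config as dynamo_config

dynamo_config.capture_dynamic_output_shape_ops = True

@torch.compile(backend="eager", fullgraph=True)
def f(x):
    nz = torch.nonzero(x)  # data-dependent output shape -> unbacked SymInt size
    return (nz * 2).sum()

print(f(torch.tensor([0.0, 2.0, 0.0, 3.0])))
```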

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95387
Approved by: https://github.com/voznesenskym
2023-02-24 00:27:45 +00:00
Peter Bell
bc438af6fe std/var: support floating point correction value (#94073)
Ref https://github.com/pytorch/pytorch/issues/61492#issuecomment-1413003480

The array API specifies correction to be `Union[int, float]` while we currently only support integers.
https://data-apis.org/array-api/latest/API_specification/generated/array_api.std.html

As std/var is currently calculated, the final element count is already computed in
floating point, so we can make the correction floating point without any loss
of precision or generality.
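
After this change, a fractional correction is accepted, matching the array API (illustrative values):

```python
import torch

x = torch.randn(8)
print(torch.var(x, correction=1))    # Bessel's correction, as before
print(torch.var(x, correction=0.5))  # fractional correction now accepted
```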

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94073
Approved by: https://github.com/ezyang
2023-02-23 05:50:45 +00:00
jjsjann123
21eb7f70f1 Nvfuser python API import fix (#94036)
1. Make the nvfuser Python API import work with both devel and upstream;
2. Add an environment variable to allow a custom nvfuser code base to be built with upstream PyTorch core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94036
Approved by: https://github.com/malfet, https://github.com/davidberard98
2023-02-16 20:10:40 +00:00
Edward Z. Yang
9d5fcd37a2 sym_max/sym_min introduce guard if hinted (#94400)
This patch started with only the change in `torch/_prims_common/__init__.py`. Unfortunately, this change by itself fails tests. The reason it fails tests is that sym_max produces a sympy.Max expression, which impedes our ability to actually reason symbolically about the resulting expressions. We much prefer to insert a guard on `l > 1`  and get a Sympy expression without Max in it, if we can. In the upcoming unbacked SymInts PR, we can't necessarily do this, but without unbacked SymInts, we always can.

To do this, we introduce `alternate_impl_if_hinted_methods`. The idea is that if all of the arguments into max/min have hints, we will just go ahead and introduce a guard and then return one argument or the other, depending on the result. This is done by rewrapping the SymNode into SymInt/SymFloat and then running builtins.min/max, but we also could have just manually done the guarding (see also https://github.com/pytorch/pytorch/pull/94365 )

However, a very subtle problem emerges when you do this. When we do builtins min/max, we return the argument SymNode directly, without actually allocating a fresh SymNode. Suppose we do a min-max with a constant (as is the case in `sym_max(l, 1)`). This means that we can return a constant SymNode as the result of the computation. Constant SymNodes get transformed into regular integers, which then subsequently trigger the assert at https://github.com/pytorch/pytorch/pull/94400/files#diff-03557db7303b8540f095b4f0d9cd2280e1f42f534f67d8695f756ec6c02d3ec7L620

After thinking about this a bit, I think the assert is wrong. It should be OK for SymNode methods to return constants. The reason the assert was originally added was that ProxyTensorMode cannot trace a constant return. But this is fine: if you return a constant, no tracing is necessary; you know you have enough guards that it is guaranteed to be a constant no matter what the input arguments are, so you can burn it in. You might also be wondering why a change to SymNode method affects the assert from the dispatch mode dispatch: the call stack typically looks like SymNode.binary_magic_impl -> SymProxyTensorMode -> SymNode.binary_magic_impl again; so you hit the binary_magic_impl twice!

No new tests, the use of sym_max breaks preexisting tests and then the rest of the PR makes the tests pass again.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94400
Approved by: https://github.com/Chillee
2023-02-13 23:36:21 +00:00