Commit Graph

889 Commits

Author SHA1 Message Date
albanD
067d203b22 Upgrade pybind11 API calls for 3.13t (#136370)
This is a modified version of https://github.com/pytorch/pytorch/pull/130341 that preserves support for older pybind11 versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136370
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-09-20 23:09:55 +00:00
albanD
cf31724db7 Fixes and improvements toward 3.13t (#136319)
Small part of https://github.com/pytorch/pytorch/pull/130689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136319
Approved by: https://github.com/malfet, https://github.com/Skylion007
2024-09-20 04:22:18 +00:00
cyy
31e42a45dd Fix redundant move warnings by g++ (#134987)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134987
Approved by: https://github.com/ezyang
2024-09-15 05:28:19 +00:00
Jennifer (Jiyue) Wang
a32255481b [caffe2][hipify] remove un-used flag from pybind_utils.h (#134404)
Summary:
Encountered issues related to AMD build when working on https://www.internalfb.com/diff/D60739324?dst_version_fbid=2203158110057105 (see stack trace P1545717562)

Looking at the file history, it seems that the flag is no longer used, so I propose removing it. Alternatively, I could change the `#ifdef` to check both `USE_C10D_NCCL` and `USE_ROCM` and include the corresponding AMD header files.

Let me know which way is preferred.

Test Plan: Sandcastle

Differential Revision: D61762129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134404
Approved by: https://github.com/malfet
2024-08-29 04:09:44 +00:00
Zhengxu Chen
59b3f5911d [sigmoid] Support custom obj deserialization. (#133463)
Summary:
It seems we have multiple places deserializing torchbind objects. Moving the code around so that every load essentially shares the same implementation.

Also added a test case, "package_reader_testing", which loads the archive file back in Python and eagerly validates the numerical results.

Test Plan: buck test mode/opt sigmoid/inference/test:e2e_test_cpu

Reviewed By: SherlockNoMad

Differential Revision: D61235770

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133463
Approved by: https://github.com/ydwu4
2024-08-15 17:58:44 +00:00
cyy
8967d55b01 [18/N] Fix clang-tidy warnings in jit (#132963)
Follows #132753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132963
Approved by: https://github.com/Skylion007
2024-08-09 01:27:32 +00:00
cyy
5b3b2b9cc7 [7/N] Fix clang-tidy warnings in jit (#131996)
Follows #131986

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131996
Approved by: https://github.com/ezyang
2024-07-29 01:21:18 +00:00
cyy
ddd539ba6c [6/N] Fix clang-tidy warnings in jit (#131986)
Follows  #131969
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131986
Approved by: https://github.com/ezyang
2024-07-29 00:49:08 +00:00
cyy
99e13e68e9 [4/N] Fix clang-tidy warnings in jit (#131903)
Follows #131830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131903
Approved by: https://github.com/Skylion007
2024-07-27 08:08:14 +00:00
PyTorch MergeBot
161bb67116 Revert "Fix static py::object dangling pointer with py::gil_safe_call_once_and_store (#130341)"
This reverts commit ace6decc99.

Reverted https://github.com/pytorch/pytorch/pull/130341 on behalf of https://github.com/clee2000 due to unfortunately the internal pybind update got reverted cc @malfet ([comment](https://github.com/pytorch/pytorch/pull/130341#issuecomment-2253147079))
2024-07-26 17:02:56 +00:00
cyy
2988d33c80 [3/N] Fix clang-tidy warnings in jit (#131830)
Follows #131735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131830
Approved by: https://github.com/ezyang
2024-07-26 15:46:28 +00:00
Brian Hirsh
5612408735 _get_operation_overload: dont raise exception when overload does not exist (#131554)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131554
Approved by: https://github.com/ezyang, https://github.com/zou3519
ghstack dependencies: #131403, #131482, #131665
2024-07-26 15:38:11 +00:00
Xuehai Pan
ace6decc99 Fix static py::object dangling pointer with py::gil_safe_call_once_and_store (#130341)
Fix static `py::object`s with `py::gil_safe_call_once_and_store`.

The following code leaks a `py::object` whose destructor runs at program shutdown. The destructor calls `Py_DECREF(obj.m_ptr)`, which may cause a segmentation fault.

```c++
void func() {
    static py::object obj = py::module_::import("foo").attr("bar");

    ...
}
```

The workaround is to hold the object through a raw pointer rather than a static instance, so the destructor never runs at shutdown:

```c++
void func() {
    static py::object* obj_ptr = new py::object{py::module_::import("foo").attr("bar")};
    py::object obj = *obj_ptr;

    ...
}
```

This PR uses the `py::gil_safe_call_once_and_store` helper from `pybind11`, which runs arbitrary initialization code exactly once, thread-safely, under the Python GIL.

```c++
void func() {
    PYBIND11_CONSTINIT static py::gil_safe_call_once_and_store<py::object> storage;
    py::object obj = storage
                         .call_once_and_store_result(
                             []() -> py::object {
                                 return py::module_::import("foo").attr("bar");
                             }
                         )
                         .get_stored();

    ...
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130341
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-07-25 05:53:09 +00:00
PyTorch MergeBot
ea78b0c177 Revert "Fix static py::object dangling pointer with py::gil_safe_call_once_and_store (#130341)"
This reverts commit a17d1e5322.

Reverted https://github.com/pytorch/pytorch/pull/130341 on behalf of https://github.com/izaitsevfb due to internal needs pybind update ([comment](https://github.com/pytorch/pytorch/pull/130341#issuecomment-2226499397))
2024-07-12 23:07:37 +00:00
Bertrand Thia
43b98fa521 Add debug repr to SymNode (#129925)
Fixes #129403

Create a separate printing function for debugging SymNode, since we can't easily change `__repr__`, which is used by GraphModule.recompile() to create a Pythonic version of a graph.
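
As a rough illustration only (not the code added by this PR), a separate debug helper along these lines can expose the underlying sympy expression without touching `__repr__`; the name `sym_debug_repr` is hypothetical:

```python
import torch

def sym_debug_repr(s: torch.SymInt) -> str:
    # Hypothetical helper: surface the backing SymNode's sympy expression for
    # debugging, while repr(s) stays whatever GraphModule.recompile() expects.
    return f"SymNode({s.node.expr})"
```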

This is my first contribution; please let me know if there is anything I should look into in further detail.

Thank you for your guidance! 🙏 I hope to contribute more in the future!

@aorenste
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129925
Approved by: https://github.com/aorenste
2024-07-12 18:31:23 +00:00
Xuehai Pan
a17d1e5322 Fix static py::object dangling pointer with py::gil_safe_call_once_and_store (#130341)
Fix static `py::object`s with `py::gil_safe_call_once_and_store`.

The following code leaks a `py::object` whose destructor runs at program shutdown. The destructor calls `Py_DECREF(obj.m_ptr)`, which may cause a segmentation fault.

```c++
void func() {
    static py::object obj = py::module_::import("foo").attr("bar");

    ...
}
```

The workaround is to hold the object through a raw pointer rather than a static instance, so the destructor never runs at shutdown:

```c++
void func() {
    static py::object* obj_ptr = new py::object{py::module_::import("foo").attr("bar")};
    py::object obj = *obj_ptr;

    ...
}
```

This PR uses the `py::gil_safe_call_once_and_store` helper from `pybind11`, which runs arbitrary initialization code exactly once, thread-safely, under the Python GIL.

```c++
void func() {
    PYBIND11_CONSTINIT static py::gil_safe_call_once_and_store<py::object> storage;
    py::object obj = storage
                         .call_once_and_store_result(
                             []() -> py::object {
                                 return py::module_::import("foo").attr("bar");
                             }
                         )
                         .get_stored();

    ...
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130341
Approved by: https://github.com/ezyang
2024-07-10 04:23:37 +00:00
cyy
29861779ce [2/N] Change #include <c10/util/Optional.h> to #include <optional> (#130236)
Follows  #128301. The changes were made by grep and sed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130236
Approved by: https://github.com/ezyang
2024-07-09 03:17:24 +00:00
cyy
f4dcf2ae93 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang, https://github.com/r-barnes
2024-07-08 07:03:53 +00:00
Shiyan Deng
1e27af335e [easy] enhance local model loading (#129897)
Summary:
1. add one more model lib dep.
2. add error message when torchscript failed to find a class in python compilation unit.

Test Plan: CI

Reviewed By: jingsh

Differential Revision: D59243250

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129897
Approved by: https://github.com/jingsh
2024-07-03 00:29:02 +00:00
PyTorch MergeBot
846bb30e13 Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)"
This reverts commit bd72e28314.

Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build bd72e28314. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))
2024-06-15 01:58:20 +00:00
cyy
bd72e28314 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang
2024-06-14 23:21:01 +00:00
angelayi
e9c6e8369c Torchbind call method + effects support (#128397)
Adds effect token support to torchbind method calls by allowing `with_effects` to take in `torch.ops._higher_order_ops.call_torchbind` as an input.

Here is the print from `TORCH_LOGS="aot" python test/export/test_torchbind.py -k test_compile_obj_torchbind_op`:
```python
def forward(self, arg0_1: "f32[0]", arg1_1: "f32[2]", arg2_1):
    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1266 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.cos())
    cos: "f32[2]" = torch.ops.aten.cos.default(arg1_1)
    with_effects = torch._higher_order_ops.effects.with_effects(arg0_1, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, cos);  arg0_1 = cos = None
    getitem: "f32[0]" = with_effects[0];  with_effects = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1267 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.cos() + 1)
    cos_1: "f32[2]" = torch.ops.aten.cos.default(arg1_1)
    add: "f32[2]" = torch.ops.aten.add.Tensor(cos_1, 1);  cos_1 = None
    with_effects_1 = torch._higher_order_ops.effects.with_effects(getitem, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, add);  getitem = add = None
    getitem_2: "f32[0]" = with_effects_1[0];  with_effects_1 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1268 in f, code: torch.ops._TorchScriptTesting.queue_pop(tq)
    with_effects_2 = torch._higher_order_ops.effects.with_effects(getitem_2, torch.ops._TorchScriptTesting.queue_pop.default, arg2_1);  getitem_2 = None
    getitem_4: "f32[0]" = with_effects_2[0];  with_effects_2 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1269 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.sin())
    sin: "f32[2]" = torch.ops.aten.sin.default(arg1_1);  arg1_1 = None
    with_effects_3 = torch._higher_order_ops.effects.with_effects(getitem_4, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, sin);  getitem_4 = sin = None
    getitem_6: "f32[0]" = with_effects_3[0];  with_effects_3 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1270 in f, code: return tq.pop(), tq.pop() + tq.size(), tq
    with_effects_4 = torch._higher_order_ops.effects.with_effects(getitem_6, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'pop');  getitem_6 = None
    getitem_8: "f32[0]" = with_effects_4[0]
    getitem_9: "f32[2]" = with_effects_4[1];  with_effects_4 = None
    with_effects_5 = torch._higher_order_ops.effects.with_effects(getitem_8, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'pop');  getitem_8 = None
    getitem_10: "f32[0]" = with_effects_5[0]
    getitem_11: "f32[2]" = with_effects_5[1];  with_effects_5 = None
    with_effects_6 = torch._higher_order_ops.effects.with_effects(getitem_10, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'size');  getitem_10 = arg2_1 = None
    getitem_12: "f32[0]" = with_effects_6[0];  with_effects_6 = None
    add_1: "f32[2]" = torch.ops.aten.add.Tensor(getitem_11, 0);  getitem_11 = None
    return (getitem_12, getitem_9, add_1)
```

In order to support this, this PR makes the following changes:
* Adds `FakeScriptObject` to `CustomObjArgument`, which will be put on the `meta["val"]` of nodes representing torchbind objects.
* Adds pickle/deepcopy support to FunctionSchema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128397
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2024-06-14 21:28:17 +00:00
cyy
99f5a85a09 [Clang Tidy] Fix misc-header-include-cycle errors in clang-tidy and ignore some files (#127233)
There are such cycles in libfmt and PyTorch, which are detected by clang-tidy:
```
/home/cyy/pytorch/third_party/fmt/include/fmt/format-inl.h:25:10: error: circular header file dependency detected while including 'format.h', please check the include path [misc-header-include-cycle,-warnings-as-errors]
   25 | #include "format.h"
      |          ^
/home/cyy/pytorch/third_party/fmt/include/fmt/format.h:4530:12: note: 'format-inl.h' included from here
 4530 | #  include "format-inl.h"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127233
Approved by: https://github.com/ezyang
2024-06-10 23:49:58 +00:00
Peter Bell
d3817d8a60 Don't create python tuple when _maybe_handle_torch_function is called from C++ (#128187)
Marginal overhead reduction when calling through the `torch.ops` API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128187
Approved by: https://github.com/lezcano
ghstack dependencies: #128183, #128184, #128185
2024-06-10 00:16:59 +00:00
Edward Z. Yang
3964a3ec73 Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows us to eventually generate accurate code for Python-semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53, beyond what you get by first coercing the integers to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.
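
To make the stated `sym_max` promotion rule concrete, here is a small illustrative sketch (not the actual torch implementation), contrasted with `builtins.max`:

```python
import builtins

def promoting_max(a, b):
    # Sketch of the rule described above: if either argument is a float,
    # the result is always a float, regardless of which operand is larger.
    result = builtins.max(a, b)
    return float(result) if isinstance(a, float) or isinstance(b, float) else result

print(builtins.max(3, 2.0))   # 3   -- builtins.max returns the winning operand with its type
print(promoting_max(3, 2.0))  # 3.0 -- promoted to float, per the described sym_max semantics
```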

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency to fix the tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

**Reland notes.** This requires this internal fbcode diff https://www.internalfb.com/phabricator/paste/view/P1403322587 but I cannot prepare the diff codev due to https://fb.workplace.com/groups/osssupport/posts/26343544518600814/

It also requires this Executorch PR https://github.com/pytorch/executorch/pull/3911 but the ET PR can be landed prior to this landing.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-09 06:20:25 +00:00
PyTorch MergeBot
ac51f782fe Revert "Complete revamp of float/promotion sympy handling (#126905)"
This reverts commit 2f7cfecd86.

Reverted https://github.com/pytorch/pytorch/pull/126905 on behalf of https://github.com/atalman due to Sorry need to revert - failing internally ([comment](https://github.com/pytorch/pytorch/pull/126905#issuecomment-2155118778))
2024-06-07 16:01:46 +00:00
Edward Z. Yang
2f7cfecd86 Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows us to eventually generate accurate code for Python-semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53, beyond what you get by first coercing the integers to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency to fix the tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-06 02:29:45 +00:00
PyTorch MergeBot
d5cb5d623a Revert "Complete revamp of float/promotion sympy handling (#126905)"
This reverts commit fb696ef3aa.

Reverted https://github.com/pytorch/pytorch/pull/126905 on behalf of https://github.com/ezyang due to internal user reported ceiling equality simplification problem, I have a plan ([comment](https://github.com/pytorch/pytorch/pull/126905#issuecomment-2148805840))
2024-06-05 03:57:58 +00:00
Edward Z. Yang
fb696ef3aa Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows us to eventually generate accurate code for Python-semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53, beyond what you get by first coercing the integers to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency to fix the tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-04 11:47:32 +00:00
Richard Barnes
3f5b59eef4 [codemod] c10::optional -> std::optional in caffe2/aten/src/ATen/DeviceGuard.h +117 (#126901)
Summary:
Generated with
```
fbgs -f '.*\.(cpp|cxx|cc|h|hpp|cu|cuh)$' c10::optional -l | perl -pe 's/^fbsource.fbcode.//' | grep -v executorch | xargs -n 50 perl -pi -e 's/c10::optional/std::optional/g'
```

 - If you approve of this diff, please use the "Accept & Ship" button :-)

(117 files modified.)

Test Plan: Sandcastle

Reviewed By: palmje

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126901
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-05-24 00:26:15 +00:00
Richard Zou
f8857cef45 [Reland] Verify types in custom op schemas (#126861)
Summary:
co-dev reland of https://github.com/pytorch/pytorch/pull/124520, which requires
the removal of some executorch tests.

Before this PR, we didn't check that types in a schema were valid. This
is because TorchScript treats unknown types as type variables.

This PR checks types in a schema for the TORCH_LIBRARY APIs. To do this,
we add an `allow_typevars` flag to parseSchema so that TorchScript can
use allow_typevars=True. We also add some error messages for common
mistakes (e.g., using int64_t or double in a schema).
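
To illustrate, the dispatcher's schema language uses Python-style type names (`Tensor`, `int`, `float`, `bool`, ...). The sketch below uses the Python `torch.library` API on the assumption that it exercises the same schema parser as `TORCH_LIBRARY`:

```python
import torch

# Valid: "float" is the schema spelling for a C++ double argument.
torch.library.define("mylib::scale", "(Tensor x, float alpha) -> Tensor")

# After this change, C++-style spellings such as "int64_t" or "double" in a
# schema should be rejected with an error rather than silently treated as
# type variables, e.g.:
# torch.library.define("mylib::scale_bad", "(Tensor x, double alpha) -> Tensor")
```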

Test Plan: Wait for tests

Differential Revision: D57666659

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126861
Approved by: https://github.com/albanD
2024-05-23 19:53:52 +00:00
David Berard
cb3b8cd0d3 Use object identity for deepcopy memo (#126126)
Copy of #126089, with some additional fixes & tests

Partial fix for #125635: previously, the deepcopy implementation would group together any tensors with any aliasing relationship and map them to the same copied tensor. This was sort of good if you have two tensors `b = a.detach()`, because then if you deepcopy `list = [a, b]` to `list2 = deepcopy(list)`, writes to `list2[0]` will also modify `list2[1]`. But for the most part, it's bad: (1) if you have `b = a.as_strided((4, 4), (16, 1), 16)`, the old implementation makes `b == a` in the copy, which is completely wrong; and (2) even if you have `b = a.detach()`, these start out as two different tensors but become the same tensor under the old deepcopy implementation.

The new implementation only groups together tensors that have the same identity. This is a partial fix, but it's more reasonable. What changes:
* (becomes more correct): different views of the same base tensor will no longer all become equal after deepcopying
* (still kind of wrong): views won't actually alias each other after deepcopying.
* (arguably a minor regression): equivalent views of the same tensor will no longer be copied to the same tensor - so they won't alias.

BC breaking: C++ deepcopy interface changes from accepting `IValue::HashAliasedIValueMap memo` to accepting `IValue::HashIdentityIValueMap memo`. If there are objections, we can keep the old API. However, it seems likely that users generally won't try to deepcopy from C++.

Differential Revision: [D57406306](https://our.internmc.facebook.com/intern/diff/D57406306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126126
Approved by: https://github.com/ezyang
2024-05-17 00:06:26 +00:00
Mikayla Gawarecki
bbdbfe3661 Reland add write_record_metadata to PyTorchFileWriter (#126087)
Reland of https://github.com/pytorch/pytorch/pull/125184 with compiler warning fixed by extending `m_pWrite` rather than adding `m_pSeek` to miniz API

Differential Revision: [D57287327](https://our.internmc.facebook.com/intern/diff/D57287327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126087
Approved by: https://github.com/albanD
2024-05-14 21:48:44 +00:00
Richard Barnes
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
PyTorch MergeBot
ccbac091d2 Revert "Add write_record_metadata to PyTorchFileWriter (#125184)"
This reverts commit dd92637f44.

Reverted https://github.com/pytorch/pytorch/pull/125184 on behalf of https://github.com/izaitsevfb due to breaks internal builds, see D56962076 ([comment](https://github.com/pytorch/pytorch/pull/125184#issuecomment-2094976897))
2024-05-05 22:40:00 +00:00
Sergii Dymchenko
59abd1dccb Fix lint after PR 122611 (#125512)
Fix lint after https://github.com/pytorch/pytorch/pull/122611
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125512
Approved by: https://github.com/clee2000
2024-05-03 22:58:20 +00:00
Iosif Spulber
4abcf36dde Make c10::Error empty backtrace as an optional argument (#122611)
Summary: Split from the main diff in the stack.

Test Plan: Build validation should be enough.

Reviewed By: ezyang

Differential Revision: D55313410

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122611
Approved by: https://github.com/ezyang
2024-05-03 22:50:00 +00:00
Mikayla Gawarecki
dd92637f44 Add write_record_metadata to PyTorchFileWriter (#125184)
Add `PyTorchFileWriter.write_record_metadata(record_name, num_bytes)` that
- writes the zipfile header/end of central directory metadata for an entry*
- reserves `num_bytes` in the zipfile for the payload.

*Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header
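
A minimal usage sketch of the API described above; the writer class is assumed to be exposed as `torch._C.PyTorchFileWriter`:

```python
import torch

writer = torch._C.PyTorchFileWriter("archive.zip")
# Write the zipfile metadata for "data/0" and reserve 1 MiB for its payload;
# per the note above, the CRC32 field is left as zeros since no bytes are given.
writer.write_record_metadata("data/0", 1024 * 1024)
writer.write_end_of_file()
```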

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125184
Approved by: https://github.com/albanD
2024-05-03 07:29:52 +00:00
PyTorch MergeBot
a46c27d961 Revert "Verify types in custom op schemas (#124520)"
This reverts commit 141888765b.

Reverted https://github.com/pytorch/pytorch/pull/124520 on behalf of https://github.com/jeanschmidt due to Breaking internal tests check D56588015 for more details ([comment](https://github.com/pytorch/pytorch/pull/124520#issuecomment-2078917978))
2024-04-26 08:42:11 +00:00
David Berard
b3cf36cb7c Implement deepcopy / clone for SymNode, NestedIntSymNode (#121361)
**Motivation**: There's a Meta-internal use case that deepcopies a bunch of metadata, which includes shapes. When we try to use NestedTensor with this tool, it errors out when we try to deepcopy the metadata, because SymNodes cannot be deepcopied. The change here is to add an implementation of `__deepcopy__`.

**Implementation**:
1. `__deepcopy__` on SymNode calls clone()
2. Implement `clone()` in NestedIntSymNode, which previously didn't have this implemented

**Potential Issues**:
Right now, this works.

But, regarding (2): Eventually we'll have some mapping between the NestedSymIntNode and its corresponding offsets/lengths tensor (cc @soulitzer who is working on this). How should this work with `__deepcopy__`? Should the offsets/lengths tensor also be cloned, or should the new symint reference the same offsets as the old symint?

On one hand, we already have this issue with NestedIntSymNodeImpl::mul(): mul() creates a new NestedIntSymNodeImpl. On the other hand, `__deepcopy__` might imply different semantics.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121361
Approved by: https://github.com/soulitzer
2024-04-26 04:18:29 +00:00
rzou
141888765b Verify types in custom op schemas (#124520)
Before this PR, we didn't check that types in a schema were valid. This
is because TorchScript treats unknown types as type variables.

This PR checks types in a schema for the TORCH_LIBRARY APIs. To do this,
we add an `allow_typevars` flag to parseSchema so that TorchScript can
use allow_typevars=True. We also add some error messages for common
mistakes (e.g. using int64_t or double in schema).

Test Plan:
- new tests

Differential Revision: [D56432690](https://our.internmc.facebook.com/intern/diff/D56432690)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124520
Approved by: https://github.com/albanD
2024-04-25 01:56:58 +00:00
PyTorch MergeBot
92295fbacd Revert "Verify types in custom op schemas (#124520)"
This reverts commit 5b98d43488.

Reverted https://github.com/pytorch/pytorch/pull/124520 on behalf of https://github.com/zou3519 due to broke static runtime tests ([comment](https://github.com/pytorch/pytorch/pull/124520#issuecomment-2075111935))
2024-04-24 14:41:26 +00:00
Edward Z. Yang
b04dca1502 Add pending_fresh_unbacked_symbols, populate unbacked_bindings for Dynamo (#124290)
The important comment:

```
        # Whenever we allocate a fresh unbacked Symbol, we add it to this
        # pending list.  Unbacked symbol allocation can occur at unpredictable
        # points during meta tensor propagation, but at some point, we
        # have to know what the binding site for an unbacked symbol is, and
        # this is computed when we actually place the node in the graph.  The
        # important thing is that we always actually handle every unaccounted
        # for unbacked symbol, so this list helps us keep track of them and
        # then make sure they are all accounted for.
        #
        # We could potentially give rise to errors earlier by lexically
        # scoping when we do propagation, and only allowing unbacked symbols
        # to be allocated at this point in time.  However this is inconvenient
        # to do in Dynamo, because fake tensor propagation is far from when we
        # analyze binding sites (set_example_value), so we do it in a more
        # mutatey way.
        #
        # NB: fresh unbacked symbols NEVER get substitutions applied to them,
        # they are binding sites!
```

The compute_unbacked_bindings is the other half of the equation: the thing that actually consumes the pending_fresh_unbacked_symbols and does something with them. Important comment:

```
    After having run fake tensor propagation and producing example_value
    result, traverse example_value looking for freshly bound unbacked
    symbols and record their paths for later.  It is an error if
    we have allocated an unbacked SymInt but it cannot be found in
    example_value.  (NB: this means if you have a multi-output
    function, you must call this on the tuple of tensor output, you
    cannot wait!)
```

For example, if I return a tensor with size `[u0, u1]`, and u1 is a fresh unbacked SymInt, then I'll have `{u1: KeyPath(".size(1)")}`, telling me I can get u1 by running `size(1)` on the result of this node. u0 is not fresh (it probably flowed in as an argument), so I don't generate a binding for it.
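
Restating that example as a tiny sketch (illustrative only; the real bindings use pytree key paths rather than plain strings or lambdas):

```python
# Fake propagation produced an output of size [u0, u1], where u1 is fresh.
# The recorded binding says: to rebind u1, call .size(1) on this node's output.
unbacked_bindings = {"u1": lambda out: out.size(1)}

def rebind(out, bindings):
    # Recover values for the freshly bound unbacked symbols from the output.
    return {sym: path(out) for sym, path in bindings.items()}
```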

I eventually intend to propagate this information all the way to Inductor lowering, where extra metadata about unbacked symbol binding will be canonically used for codegen, instead of trying to infer it from defs/uses.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124290
Approved by: https://github.com/lezcano
2024-04-24 09:11:34 +00:00
rzou
5b98d43488 Verify types in custom op schemas (#124520)
Before this PR, we didn't check that types in a schema were valid. This
is because TorchScript treats unknown types as type variables.

This PR checks types in a schema for the TORCH_LIBRARY APIs. To do this,
we add an `allow_typevars` flag to parseSchema so that TorchScript can
use allow_typevars=True. We also add some error messages for common
mistakes (e.g. using int64_t or double in schema).

Test Plan:
- new tests

Differential Revision: [D56432690](https://our.internmc.facebook.com/intern/diff/D56432690)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124520
Approved by: https://github.com/albanD
2024-04-23 14:18:35 +00:00
ydwu4
e62169a8fa Support torchbind op dispatch in python (#123367)
We override the `__call__` method and register fake, functional, and proxy default dispatch mode implementations in its python_key_mode_table.

The idea is:
1. when the inputs contain a FakeScriptObject, we dispatch through the _get_dispatch mechanism. The dispatch mode keys are registered automatically in the operator's constructor.
2. when the inputs are not fakified, we dispatch through the original C++ dispatcher.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123367
Approved by: https://github.com/zou3519
2024-04-19 17:17:27 +00:00
Tobias Ringwald
6ba85cfc2a Fixed memory leak in Python dispatcher w.r.t. THPDevice. (#122439)
Fixes the memory leak reported in #122417.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122439
Approved by: https://github.com/soulitzer
2024-03-22 06:44:12 +00:00
FFFrog
485f8ebc07 add __repr__ function to FunctionSchema for Python (#121484)
Fixes #118566

Unlike **OpOverload** or **OpOverloadPacket**, the schema contains a lot of complex information, so keeping it as is is probably a good choice; in theory, though, the **\_\_repr__** function should show the class name as well as some other key information.

If you have any other suggestions, please let me know. Thank you.
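
For illustration, a `FunctionSchema` can be obtained from any overload and printed; the exact text produced by the new `__repr__` is not shown here:

```python
import torch

schema = torch.ops.aten.add.Tensor._schema  # a torch._C.FunctionSchema
print(str(schema))   # the schema string, e.g. "aten::add.Tensor(Tensor self, ...) -> Tensor"
print(repr(schema))  # after this change, includes the class name plus key information
```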

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121484
Approved by: https://github.com/Skylion007
2024-03-11 15:16:50 +00:00
Sheng Fu
31bfa59970 Capture primitive data type arguments for profiling python_function (#120949)
RECORD_FUNCTION in python_function only captures arguments that are Tensors. However, it is very common for users to pass non-tensor arguments to custom ops, for example, the sequence length in a GPT attention custom op. My previous PR tried to capture all non-tensor arguments, but that turned out to be very expensive in some cases.

This PR adds support for primitive arguments (and containers of primitives) in RECORD_FUNCTION.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120949
Approved by: https://github.com/soulitzer
2024-03-06 05:09:22 +00:00
albanD
8cb4855d1e Release the GIL in serialization when it is safe to do so (#120818)
In particular this ensures we release the GIL when serializing:
- PyBytes objects (this is how we get the pickle object)
- Storage objects

Other string-like objects keep the GIL, which is fine because we only use this for very small strings today (for endianness), so releasing the GIL is not important there.
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120818
Approved by: https://github.com/colesbury
2024-03-01 22:37:26 +00:00
soulitzer
27c5bbe5cb Add is_nested_int() (#119975)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119975
Approved by: https://github.com/jbschlosser
ghstack dependencies: #119661, #119974
2024-02-21 21:10:02 +00:00
soulitzer
312ce35c1f Rename singleton int to nested int (#119661)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119661
Approved by: https://github.com/ezyang
2024-02-16 19:21:17 +00:00
cyy
5f9b432494 [2/N] Replace std::tie with structural binding (#119879)
This PR follows #119774, Python generated code was changed to use structural binding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119879
Approved by: https://github.com/albanD
2024-02-15 02:56:34 +00:00
suo
82248f0b1c [export] improve FakeTensor serialization (#119531)
Recently we made it possible to serialize ExportedPrograms with fake parameters/buffers/etc.

The serialization regime was kind of whacky; basically we serialized a stub and reassembled the FakeTensor using metadata that we had stashed elsewhere in the Graph state.

This was bad for a few reasons:
- Storing the metadata separately from the actual serialized object caused situations where you could have one but not the other. An example case is if you had a FakeTensor contained inside a TorchBind object—there was no obvious place to store the metadata for this. This actually happens—TensorQueue in fbgemm does this.
- It created an annoying cycle: we had to deserialize the Graph's tensor metadata in order to deserialize (potentially faked) constants, but we need constants in order to deserialize the Graph.

This fixes all that. The basic idea is to patch the reducer function for FakeTensor at serialization time, and serialize a copy of the FakeTensor metadata. We already are policing BC for the TensorMeta schema struct so it's not a net increase in the BC surface.

As a bonus, I fixed a weird bug with torchbind tracing where we were accidentally reinterpreting a torch.ScriptObject as a torch.ScriptModule (which was the root cause of some weird behavior @bahuang was seeing last week).

Differential Revision: [D53601251](https://our.internmc.facebook.com/intern/diff/D53601251/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119531
Approved by: https://github.com/zhxchen17
2024-02-12 19:28:08 +00:00
Simon Fan
8e14e1d514 Fix gradient refcounts in pybind and compiled autograd (#118817)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118817
Approved by: https://github.com/jansel
2024-02-07 10:25:42 +00:00
Edward Z. Yang
3f0fd36835 Introduce size oblivious guards (#118579)
Fixes https://github.com/pytorch/pytorch/issues/117361

The implementation here slightly diverges from what was proposed in the issue, so I will recap what this PR is doing here. Today, when doing computations involving size-like unbacked SymInts, we assume for all operations that the compile time range of the integer is `[2, inf]`, even though at runtime we also accept zero and one.

This PR removes the carte blanche assumption, and instead does the analysis in a much more limited and controlled fashion: only for guards which we have designated as "size oblivious" are we willing to do the analysis under the assumption that the range of all size-like unbacked SymInts is `[2, inf]`; otherwise, we will faithfully only do analysis with `[0, inf]` (or whatever the user provided) bounds.

The infra pieces of this PR are:

* Remove runtime_var_to_range from torch/fx/experimental/symbolic_shapes.py; modify `_constrain_range_for_size` to refine the range without clamping min to 2, and instead add the symbol to a `size_like` set in the ShapeEnv
* When evaluating an expression, if the expression is requested to be evaluated in a `size_oblivious` way, we attempt to statically compute the value of the expression with the assumption that all symbols in `size_like` are updated to assume that they are `>= 2`.
* Add Python and C++ APIs for guarding on a SymBool in a size-oblivious way. In C++, I also need to add some helpers for performing symbolic comparisons, since the stock comparisons immediately specialize in the "normal" way.
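
As a sketch of how the Python-side API from the last bullet is meant to be used in framework code (assuming it is exposed as `guard_size_oblivious` under `torch.fx.experimental.symbolic_shapes`):

```python
import torch
from torch.fx.experimental.symbolic_shapes import guard_size_oblivious

def maybe_squeeze_last(t: torch.Tensor) -> torch.Tensor:
    # Under a size-oblivious guard, size-like unbacked SymInts are analyzed as
    # if they were >= 2, so this branch is decidable without guarding on u0 == 1.
    if guard_size_oblivious(t.size(-1) == 1):
        return t.squeeze(-1)
    return t
```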

The rest of the changes of the PR are marking various spots in PyTorch framework code as size oblivious, based on what our current test suite exercises.

As you review the places where we have marked things as size oblivious, it may become clear why I ended up not opting for the "designate a branch as the default branch when it's not statically obvious which way to go": for some of the conditions, this answer is rather non-obvious. I think potentially there is another refinement on top of this PR, which is something like "I don't care if you can't figure it out with ValueRange analysis, go down this path anyway if there are unbacked sizes involved." But even if we add this API, I think we are obligated to attempt the ValueRange analysis first, since it can lead to better outcomes sometimes (e.g., we are able to figure out that something is contiguous no matter what the unbacked size is.)

When is it permissible to mark something as size oblivious? Heuristically, it is OK anywhere in framework code if it gets you past a guard on unbacked SymInt problem. It is somewhat difficult to provide a true semantic answer, however. In particular, these annotations don't have any observational equivalence guarantee; for example, if I have `torch.empty(u0, 1).squeeze()`, we will always produce a `[u0]` size tensor, even though if `u0 == 1` PyTorch will actually produce a `[]` size tensor. The argument that I gave to Lezcano is that we are in fact defining an alternate semantics for a "special" size = 0, 1, for which we have these alternate eager mode semantics. In particular, suppose that we have a constant `special1` which semantically denotes 1, but triggers alternate handling rules. We would define `torch.empty(special1, 1).squeeze()` to always produce a `[special1]` size tensor, making its semantics coincide with unbacked SymInt semantics. In this model, the decision to designate guards as size oblivious is simply a user API question: you put them where ever you need some handling for special1! As we conservatively error out whenever it is not obvious what `special1` semantics should be, it is always valid to expand these semantics to cover more cases (although you can always choose the wrong semantics!)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118579
Approved by: https://github.com/eellison, https://github.com/lezcano
2024-02-06 19:45:32 +00:00
Michael Suo
eaa45f47f8 [sigmoid] fix for torchbind serialization (#118791)
Summary:
There is an annoying inconsistency in how we pickle custom objs.
`torch.save` will invoke regular pickle, for which we have bound `__setstate__`/`__getstate__` methods on `torch.ScriptObject`: https://fburl.com/code/4howyl4u.

This serializes in a different format than TorchScript does, which uses the TS C++ pickler.

The issue we were facing was using the Python pickler to save, and the C++ pickler to load. If we use the C++ pickler to both save and load (plus some plumbing to get type/object resolution to work correctly), then things should work.

Test Plan:
ran SherlockNoMad's repro
```
buck2 run 'fbcode//mode/dev-nosan' scripts/bahuang:export_torchbind -- --logging DBG
```

Got to a new error, which has to do with how we're initializing the graph, but will leave that for future diffs.

Reviewed By: SherlockNoMad

Differential Revision: D53248454

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118791
Approved by: https://github.com/qxy11, https://github.com/SherlockNoMad, https://github.com/khabinov
2024-02-01 10:09:07 +00:00
cyy
2b5a201aa6 [Exception] [3/N] Replace torch::NotImplementedError and torch::LinAlgError with C10 counterparts. (#116824)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116824
Approved by: https://github.com/albanD
2024-01-11 11:27:04 +00:00
youkaichao
16373bbc1f fix error message in pytorch (#115349)
Fixes https://dev-discuss.pytorch.org/t/typo-in-error-message/1709 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115349
Approved by: https://github.com/Skylion007
2023-12-07 19:27:29 +00:00
Antonio Kim
7fc292930c Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
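
A rough usage sketch of what the items above enable; the exact TorchScript signatures here are an assumption:

```python
import torch

g = torch.Generator().manual_seed(0)

# `generator` keyword on torch.nn.init functions
w = torch.empty(4, 4)
torch.nn.init.uniform_(w, a=0.0, b=1.0, generator=g)

# torch.Generator as a TorchScript-visible type
@torch.jit.script
def fill_(t: torch.Tensor, gen: torch.Generator) -> torch.Tensor:
    return t.normal_(0.0, 1.0, generator=gen)
```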

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
Edward Z. Yang
fdaddec2c3 make_fx can now SymIntify int inputs (#113452)
This PR also contains a basket of fixes that were turned up by now testing more arguments with SymInt. I fixed as many of the easy ones as I could easily get earlier in this stack and a bunch here, but there are some more annoying ones I xfailed.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113452
Approved by: https://github.com/Chillee
ghstack dependencies: #113877, #113911
2023-11-18 06:39:09 +00:00
PyTorch MergeBot
252e68a83b Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 54493fe8c4.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557))
2023-11-15 00:51:23 +00:00
Antonio Kim
54493fe8c4 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
PyTorch MergeBot
9a28a7b498 Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 27e31ab6e8.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164))
2023-11-07 15:53:32 +00:00
Antonio Kim
27e31ab6e8 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
Richard Zou
4f5acf8329 Log non-pt2_compliant ops encountered by Dynamo (#112581)
Summary:
See internal diff for more changes. Whenever we encounter a non-compliant op,
we add it to a set on the OutputGraph. When a compilation event happens, we log
the contents of this set.

I'm planning on flipping the `only_allow_pt2_compliant_ops` config from False
to True after the logging determines that existing models do not use
non-compliant ops.

Test Plan: - Tested the logging internally locally

Differential Revision: D50884828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112581
Approved by: https://github.com/yanboliang
2023-11-01 22:53:16 +00:00
rzou
ae72607e5f Add way to determine which overload an OpOverloadPacket will resolve to (#112199)
The types are a bit weird (we accept and return a string) because there
is not really a notion of OpOverloadPacket vs OpOverload in C++.

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112199
Approved by: https://github.com/ezyang
ghstack dependencies: #112198
2023-10-29 15:36:14 +00:00
rzou
235a04c0de Add getAllSortedOperatorsFor helper function (#112198)
I need this for later. This roughly returns all the OpOverloads
for an OpOverloadPacket in the order that the OpOverloadPacket decides
to resolve them in.

Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112198
Approved by: https://github.com/ezyang
2023-10-29 15:36:14 +00:00
Dino Viehland
5b71834785 Avoid c++ exception and stack trace (#111438)
Summary:
When an exception is raised here, pybind11's dispatcher kicks in, which in turn triggers aiplatform's logic (aiplatform::error_reporting::util::printAddressesWithBestEffortLocationInfo), which ultimately uses `folly::symbolizer::Symbolizer::symbolize` to build up the stack trace.  In 3.8 this uses about 3.62% of the CPU time per pyperf (https://fburl.com/scuba/pyperf_experimental/on_demand/oi554uvy).  In Cinder 3.8 this is worse for some reason, using 5.94% of the CPU.

This exception is happening when doing a hasattr() on `prims` for things like `bitwise_left_shift` which don't exist: https://www.internalfb.com/code/fbsource/[2d695f650d00]/fbcode/caffe2/torch/_inductor/lowering.py?lines=590

That exception is ultimately going to be swallowed anyway, and the stack trace has no meaningful value. Furthermore, because this is an expected outcome in the code rather than some random C++ exception, the stack trace is less valuable as well.

This changes the failure case to return (None, None) instead of a valid op/overload list, avoiding the exception and reclaiming the 3.62%-5.94% of CPU time.
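A sketch of the hot probe described above (the op name below is illustrative and intentionally nonexistent):

```python
import torch

# hasattr() on the prims op namespace used to surface a C++ exception (whose
# stack trace was then symbolized) on a miss; the lookup now fails cheaply.
print(hasattr(torch.ops.prims, "definitely_not_a_real_prim"))  # False
```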

Test Plan: Existing CI and perf run: https://fburl.com/scuba/pyperf_experimental/on_demand/oi554uvy

Differential Revision: D50018789

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111438
Approved by: https://github.com/davidberard98
2023-10-26 23:55:34 +00:00
dshi7
fbff99ffea Add regex matching to Inductor all2all collective unit tests (#112077)
Fixes #111776

Support check_regex in FileCheck() by adding `find_regex` to `struct TORCH_API StringCordView`.
The callsite accepts regular-expression syntax for std::regex.

However, I haven't figured out submatch ID yet.
For example, "buf5[0], buf6_inputs[0]" is still considered a match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112077
Approved by: https://github.com/yf225
2023-10-26 08:29:30 +00:00
jjsjann123
39c09d4da6 Revert "Revert "Nvfuser code removal (#111093)"" (#111604)
This reverts commit 715dfced72.

The original PR #111093 was reverted due to a broken internal build.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111604
Approved by: https://github.com/davidberard98
2023-10-23 18:32:41 +00:00
Tobias Ringwald
cc28b9c10a Fixed a memory leak in PyTorchFileReader (#111703)
Fixes #111330.

This PR prevents `PyTorchFileReader` from leaking memory when initialized with an already opened file handle instead of a file name.
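A sketch of the usage pattern this fix addresses (assuming `torch._C.PyTorchFileReader`, which backs `torch.load`/`torch.jit.load`):

```python
import io
import torch

buf = io.BytesIO()
torch.save({"x": torch.ones(2)}, buf)
buf.seek(0)
# Constructed from an already-open file-like object rather than a file name;
# previously this path kept extra memory alive.
reader = torch._C.PyTorchFileReader(buf)
del reader
```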

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111703
Approved by: https://github.com/Skylion007
2023-10-21 10:11:43 +00:00
PyTorch MergeBot
715dfced72 Revert "Nvfuser code removal (#111093)"
This reverts commit 572628e520.

Reverted https://github.com/pytorch/pytorch/pull/111093 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, @albanD please help to support the author with the next steps to get this diff merged ([comment](https://github.com/pytorch/pytorch/pull/111093#issuecomment-1771434853))
2023-10-19 17:39:49 +00:00
jjsjann123
572628e520 Nvfuser code removal (#111093)
Removes the existing integration code & build of nvfuser in TorchScript.

Note that I intentionally left out the part where we wipe out the `third_party/nvfuser` repo; I'll do that in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111093
Approved by: https://github.com/albanD
2023-10-18 01:00:47 +00:00
soulitzer
fda0a965c7 [reland] Support SingletonSymNode mul with coefficient (#110673)
reland of https://github.com/pytorch/pytorch/pull/110369
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673
Approved by: https://github.com/ezyang
2023-10-10 19:37:17 +00:00
PyTorch MergeBot
1c3fae46ee Revert "Support SingletonSymNode mul with coefficient (#110369)"
This reverts commit eb8feb8ff8.

Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))
2023-10-05 23:51:28 +00:00
soulitzer
eb8feb8ff8 Support SingletonSymNode mul with coefficient (#110369)
We want to be able to use SingletonSymNode to represent strides for Jagged layout tensor. The following is for 3D, but easily generalizable to higher dimensions.

Constraints:
- [B, x, D] (where x represents the "variable-length dim") can be strided in two ways: [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided.
- When doing operations we need the strides of output tensors to be expressible in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides are [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a SymInt to represent the output stride. This SymInt needs to come from somewhere; I can get it in several ways: (1) create a constant, (2) an unbacked SymInt, (3) create a new input using a source, (4) the output of an operation on an existing SymInt. It is clear that (4) is what we want here, which brings us to the design below.

Design:

Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include some scalar factor, i.e., morally, SingletonSymNode represents `factor * [s_0, s_1, …, s_n]`. This enables us to symbolically compute strides from sizes.
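As a toy illustration of the stride arithmetic above (plain Python ints standing in for the SingletonSymNode placeholder and the ordinary sizes):

```python
# For an input [B, x, D] multiplied by [D, D2], the output [B, x, D2] gets
# strides [x * D2, D2, 1]; the key point is that the first stride is an
# operation on the existing placeholder, not a fresh opaque symbol.
def matmul_out_strides(x, D2):
    return (x * D2, D2, 1)

print(matmul_out_strides(x=7, D2=5))  # (35, 5, 1)
```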
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369
Approved by: https://github.com/ezyang
ghstack dependencies: #110044
2023-10-04 22:56:15 +00:00
Nikita Shulga
ad8aef0f98 [BE] [3/N] Use nested namespaces (#110314)
Mostly in torch/csrc/jit/runtime and in `ATen/cuda/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110314
Approved by: https://github.com/seemethere
2023-09-30 02:23:48 +00:00
jjsjann123
e6b5e0ecc6 removing the functionality of nvfuser python APIs (#110124)
Removing the functionality from the nvfuser Python APIs.

Since the use of nvfuser was deprecated before the last release cut, we are removing TorchScript support.

The next PR will actually remove the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110124
Approved by: https://github.com/davidberard98
2023-09-29 01:45:00 +00:00
Edward Z. Yang
09622d8d49 Allow inferring size-nature from sizes passed to empty constructor (#109720)
This removes the need for many constrain_as_size calls, as we now
infer size-nature from the error checking performed on sizes.
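A minimal eager-mode sketch of the pattern this enables (names are illustrative):

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    u0 = x.sum().item()  # an unbacked SymInt when traced/exported
    # Previously this often needed an explicit torch._check_is_size(u0);
    # now the size-nature is inferred from the error checking that
    # torch.empty performs on its sizes.
    return torch.empty(u0, 2)

print(f(torch.tensor([1, 2])).shape)  # torch.Size([3, 2])
```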

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109720
Approved by: https://github.com/aakhundov
2023-09-21 17:57:40 +00:00
soulitzer
8bc00dfffd Hashing for constant and singleton SymInt/SymBool (#109170)
Bugfix:
- previously, SymBool did not implement `__eq__`, so Python fell back to the default `__eq__` and `__hash__`
- in this PR, we make SymBool implement `__eq__`
- symbolic SymBool now raises an error when hashed just like SymInt/SymFloat

New feature:
- previously, SymInt and SymFloat were unhashable (even if singleton or constant)
- in this PR, SymInt and SymBool are hashable if singleton/constant

Stay the same:
- SymNode are hashable due to default Python behavior
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109170
Approved by: https://github.com/ezyang
ghstack dependencies: #109169
2023-09-20 20:37:15 +00:00
soulitzer
5252fcb133 Handle constant SymBool in unary and binary operations (#109169)
In this PR:
- When constant SymNodes are detected in unary/binary ops, demote them to plain int/bool before proceeding. Sometimes this means that doing a unary op with a constant SymNode results in a plain bool.
- Introduce an is_symbolic method, only available from Python. We need this because isinstance(x, SymInt) is no longer sufficient to check whether a given int/SymInt is symbolic or not. See a later PR in the stack for how this is used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109169
Approved by: https://github.com/ezyang
2023-09-20 20:37:15 +00:00
cyy
efc7c366f4 Remove auto_gil.h (#108492)
auto_gil.h has been deprecated for a long time. We can switch to pybind11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108492
Approved by: https://github.com/Skylion007
2023-09-05 08:26:13 +00:00
Ilya Sherstyuk
2b3917dc63 [ONNX] Fix memory leak when exporting models (#107244)
This commit fixes a memory leak caused by creating a new PyListObject using PyDict_Items() and not releasing that list later. This often prevented the entire model from being deallocated even when all Python references to it had gone out of scope.

Here is a repro script:

```python
import psutil, torch, transformers, gc, os, sys
import math

# Size in MB
model_size = 512

kB = 1024
MB = kB * kB
precision_size = 4 # bytes per float
activation_size = math.floor(math.sqrt(model_size * MB / precision_size))

class Net(torch.nn.Module):
    def __init__(self, activation_size):
        super(Net, self).__init__()
        self.linear = torch.nn.Linear(activation_size, activation_size)
    def forward(self, x):
        return {"result": self.linear(x)}

def collect_and_report(s):
    gc.collect()
    print(s)
    #print("psutil: ", psutil.virtual_memory().percent)
    print("CPU MB used by this process: ", psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    print("GPU MB allocated by pytorch: ", torch.cuda.memory_allocated(0) / 1024 ** 2)
    print()

def run_test(device_str):
    device = torch.device(device_str)
    dummy_input = torch.zeros(activation_size, requires_grad=True).to(device)

    collect_and_report("Before loading model: ")
    model = Net(activation_size).to(device)
    collect_and_report("After loading model: ")

    torch.onnx.export(model, dummy_input, "dummy.onnx")
    collect_and_report("After exporting model: ")

    del model
    collect_and_report("After deleting model:")

print("Running CPU test: ")
run_test("cpu")

print("Running GPU test: ")
run_test("cuda")
```

Results without this commit:
```
Running CPU test:
Before loading model:
CPU MB used by this process:  346.5
GPU MB allocated by pytorch:  0.0

After loading model:
CPU MB used by this process:  861.078125
GPU MB allocated by pytorch:  0.0

After exporting model:
CPU MB used by this process:  880.12890625
GPU MB allocated by pytorch:  0.0

After deleting model:
CPU MB used by this process:  880.12890625
GPU MB allocated by pytorch:  0.0

Running GPU test:
Before loading model:
CPU MB used by this process:  991.9375
GPU MB allocated by pytorch:  0.04443359375

After loading model:
CPU MB used by this process:  992.19140625
GPU MB allocated by pytorch:  512.0888671875

After exporting model:
CPU MB used by this process:  1026.64453125
GPU MB allocated by pytorch:  520.25830078125

After deleting model:
CPU MB used by this process:  1026.64453125
GPU MB allocated by pytorch:  520.25830078125
```

With this commit:
```
Running CPU test:
Before loading model:
CPU MB used by this process:  372.7734375
GPU MB allocated by pytorch:  0.0

After loading model:
CPU MB used by this process:  887.18359375
GPU MB allocated by pytorch:  0.0

After exporting model:
CPU MB used by this process:  918.96875
GPU MB allocated by pytorch:  0.0

After deleting model:
CPU MB used by this process:  407.3671875
GPU MB allocated by pytorch:  0.0

Running GPU test:
Before loading model:
CPU MB used by this process:  516.6875
GPU MB allocated by pytorch:  0.04443359375

After loading model:
CPU MB used by this process:  516.75390625
GPU MB allocated by pytorch:  512.0888671875

After exporting model:
CPU MB used by this process:  554.25390625
GPU MB allocated by pytorch:  520.2138671875

After deleting model:
CPU MB used by this process:  554.25390625
GPU MB allocated by pytorch:  8.16943359375
```

Fixes #106976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107244
Approved by: https://github.com/BowenBao, https://github.com/kit1980
2023-08-17 22:15:28 +00:00
David Berard
25d87c8301 torch.ops.aten.*: sort aten ops before jit overloads (#107138)
Summary:
In fbcode, aten and jit ops can get registered in different orders depending on build mode. In dev mode, aten is registered first; in opt mode, jit is registered first.

This causes problems in torch.ops.aten.* calls; these calls use `torch._C._jit_get_operation`, which selects an overload based on the inputs to the call. It searches through the overloads for the op with the given name, and chooses the first one that matches the input types. "First" depends on whether aten or jit ops were registered first - e.g. in `test_both_scalars_cuda` in opt mode, it chooses `add.complex` and returns a complex value.

We also saw this issue in https://github.com/pytorch/pytorch/pull/103576.

This PR sorts the list of overloads first, putting the aten ops first.
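A miniature of the failure mode described above (a sketch; the concrete values are illustrative):

```python
import torch

# torch.ops.aten.add resolves an overload from the argument types via
# torch._C._jit_get_operation; with two plain scalars the chosen overload
# used to depend on whether aten or jit ops were registered first.
print(torch.ops.aten.add(2, 3))  # expected to pick an integer overload, not add.complex
```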

Differential Revision: D48304930

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107138
Approved by: https://github.com/ezyang, https://github.com/eellison
2023-08-17 03:05:59 +00:00
Sam Gross
d0e50d9094 Move overloaded_args from FunctionSignature to PythonArgs (#106983)
This moves the `overloaded_args` field from FunctionSignature to PythonArgs. FunctionSignature is shared by all calls and should be immutable. PythonArgs contains the parsing results for a single call to the PyTorch API.

I did not measure a difference in performance in the "overrides_benchmark", although I expect there to be a bit more work in the common case. Note that the noise factor for the benchmark is much larger than the differences reported below:

Before:
```
Type tensor had a minimum time of 2.3615360260009766 us and a standard deviation of 0.7833134150132537 us.
Type SubTensor had a minimum time of 10.473251342773438 us and a standard deviation of 0.1973132457351312 us.
Type WithTorchFunction had a minimum time of 5.484819412231445 us and a standard deviation of 0.13305981701705605 us.
Type SubWithTorchFunction had a minimum time of 11.098146438598633 us and a standard deviation of 0.15598918253090233 us.
```
After:
```
Type tensor had a minimum time of 2.2134780883789062 us and a standard deviation of 0.802064489107579 us.
Type SubTensor had a minimum time of 10.625839233398438 us and a standard deviation of 0.15155907021835446 us.
Type WithTorchFunction had a minimum time of 5.520820617675781 us and a standard deviation of 0.23115111980587244 us.
Type SubWithTorchFunction had a minimum time of 11.227846145629883 us and a standard deviation of 0.23032321769278497 us.
```

Fixes #106974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106983
Approved by: https://github.com/zou3519, https://github.com/ezyang, https://github.com/albanD
2023-08-16 15:59:26 +00:00
Edward Z. Yang
5673c0874c Use expect_true to make split with unbacked sizes work. (#106788)
This pattern shows up in torchrec KeyedJaggedTensor.  Most
of the change in this PR is mechanical: whenever we failed
an unbacked SymInt test due to just error checking, we replace the
conditional with something that calls expect_true (e.g.,
torch._check or TORCH_SYM_CHECK).
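A sketch of the mechanical rewrite (plain-int example, illustrative names):

```python
import torch

def validate_split(split_size) -> None:
    # was: if split_size < 0: raise RuntimeError(...)
    # torch._check records a runtime assert instead of requiring a
    # compile-time answer when split_size is an unbacked SymInt.
    torch._check(
        split_size >= 0,
        lambda: f"split expects a non-negative split_size, got {split_size}",
    )

validate_split(3)
```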

Some of the changes are a bit more nuanced, I've commented on the PR
accordingly.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106788
Approved by: https://github.com/lezcano
ghstack dependencies: #106720
2023-08-15 20:31:30 +00:00
Edward Z. Yang
e1ee10e6f5 Add expect_true for irrefutable guards (#106720)
Here's what it does from the comments:

```
Assume that a boolean is true for the purposes of subsequent symbolic
reasoning.  This will keep track of corresponding runtime checks to verify
that the result is upheld: either as a regular guard, or as a special set
of asserts which are triggered when an unbacked SymInt is allocated.

DO NOT use this function for these cases:

 - This is inappropriate for "branching" conditions (where both
   true and false result in valid programs).  We will always assume
   the condition evaluates true, and so it will never be possible
   to trace the false condition when you use it.  For true branching
   on unbacked SymInts, you must use torch.cond.

 - This is inappropriate for situations where you know some other system
   invariant guarantees that this property holds, since you don't
   really need to insert a runtime check in that case.  Use something
   like constrain_range in that case.

This API has a hitch.  To avoid having to reimplement error reporting
capabilities, this function CAN return False.  The invariant is that
the surrounding code must raise an error when this function returns
False.  This is quite low level, so we recommend using other functions
like check() which enforce this in a more intuitive way.

By the way, this name is a nod to the __builtin_expect likely macro,
which is used similarly (but unlike __builtin_expect, you MUST fail
in the unlikely branch.)
```

We don't do anything with this right now, except use it to discharge regular guards.  Follow-up PRs will (1) use it at important error-checking sites, and (2) actually ensure the runtime asserts make their way into the exported IR / Inductor-generated code.
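A minimal usage sketch (assuming the helper is importable from `torch.fx.experimental.symbolic_shapes`):

```python
from torch.fx.experimental.symbolic_shapes import expect_true

def check_nonneg(size) -> None:
    # Error checks that must raise when the condition is false wrap the
    # condition in expect_true rather than guarding on it directly.
    if not expect_true(size >= 0):
        raise RuntimeError(f"expected a non-negative size, got {size}")

check_nonneg(4)  # with an unbacked SymInt this records a runtime assert instead
```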

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106720
Approved by: https://github.com/ysiraichi, https://github.com/voznesenskym
2023-08-15 18:42:22 +00:00
albanD
3a07dfde48 Fix lifetime of JITException binding (#106401)
Fix issues with the new asserts introduced in 3.12 and with pybind's GIL-holding check in the destructor.
See https://github.com/pybind/pybind11/pull/4769 for details on why this is a preferred solution rather than skipping the decref in all pybind object destructors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106401
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/Skylion007
2023-08-07 18:00:50 +00:00
Mengwei Liu
4fafe0b74c [export][serde] Hookup export upgrader with TorchScript upgrader entries (#104227)
Adding an API to get the upgraders entry map directly from:

https://github.com/pytorch/pytorch/blob/main/torch/csrc/jit/operator_upgraders/upgraders_entry.cpp#L17

Combining the information there with the operator version map from:

https://github.com/pytorch/pytorch/blob/main/torch/csrc/jit/operator_upgraders/version_map.cpp#L18

we can get an upgrader map with the upgrader name, old schema, and upgrader string.

This dict will be sent to GraphModuleOpUpgrader to populate the upgrader passes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104227
Approved by: https://github.com/angelayi, https://github.com/zhxchen17
2023-07-06 16:57:36 +00:00
Mikayla Gawarecki
6fa2d41dc7 Add mmap option to torch.load (#102549)
Using [`nanoGPT/model.py`](https://github.com/karpathy/nanoGPT/blob/master/model.py) run

<details><summary><b>Click for script to save gpt2-xlarge (1.5B params)</b></summary>

```
# test_load_save_gpt.py
from model import GPT
import torch
import time

torch.manual_seed(5)
# gpt2-xlarge 1558M parameters
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50304 # GPT-2 vocab_size of 50257, padded up to nearest multiple of 64 for efficiency
    n_layer: int = 48
    n_head: int = 25
    n_embd: int = 1600
    dropout: float = 0.0
    bias: bool = True # True: bias in Linears and LayerNorms, like GPT-2. False: a bit better and faster

def f():
    model = GPT(GPTConfig())
    state_dict = model.state_dict()

    start_saving = time.time()
    torch.save(state_dict, "gpt2-xlarge.pth")
    end_saving = time.time()

if __name__ == "__main__":
    f()
```
</details>

<details><summary><b>Click for script to load</b></summary>

```
# test_load_gpt.py

import torch
from model import GPT
from test_load_save_gpt import GPTConfig
import time
import argparse

def f(mmap, meta):
    device = 'meta' if meta else 'cpu'
    assign = True if meta else False
    with torch.device(device):
        model = GPT(GPTConfig())
    start_loading = time.time()
    loaded_state_dict = torch.load("gpt2-xlarge.pth", _mmap=mmap)
    end_loading = time.time()
    print(f"loading time using torch.load with mmap={mmap}: ", end_loading - start_loading)
    model.load_state_dict(loaded_state_dict, assign=assign)
    end_load_state_dict = time.time()
    print("load_state_dict time: ", end_load_state_dict - end_loading)
    model.cuda()
    end_cuda = time.time()
    print("cuda time using torch.load with mmap: ", end_cuda - end_load_state_dict)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog='load_gpt_xlarge')
    parser.add_argument('-m', '--mmap', action='store_true')
    parser.add_argument('-d', '--devicemeta', action='store_true')
    args = parser.parse_args()
    mmap = args.mmap
    meta = args.devicemeta
    f(mmap, meta)

```

</details>

`python test_load_gpt.py`

<img width="614" alt="Screenshot 2023-06-06 at 1 35 43 PM" src="https://github.com/pytorch/pytorch/assets/35276741/ee06e5b3-b610-463b-a867-df995d21af29">

`python test_load_gpt.py --mmap`
<img width="622" alt="Screenshot 2023-06-06 at 1 35 30 PM" src="https://github.com/pytorch/pytorch/assets/35276741/00d2fdd0-b1f5-4313-83dc-e540b654b2af">

If we further use the `with torch.device('meta')` context manager and pull the changes from https://github.com/pytorch/pytorch/pull/102212 that allow the model to reuse tensors from the state_dict, we have

`python test_load_gpt.py --mmap --devicemeta`
<img width="727" alt="Screenshot 2023-06-06 at 1 35 51 PM" src="https://github.com/pytorch/pytorch/assets/35276741/b50257d9-092a-49c3-acae-876ee44d009f">

Running the above in a Docker container containing a build of PyTorch, with RAM limited to 512 MB, by

1) running `make -f docker.Makefile` from `pytorch/` directory
2) `docker run -m 512m -it <image> bash`
3) docker cp `gpt2-xlarge.pth` and `test_load_gpt.py` into the image

`python test_load_gpt.py`

Docker will kill the process due to OOM, whereas

`python test_load_gpt.py --mmap --devicemeta`
<img width="635" alt="Screenshot 2023-06-06 at 1 55 48 PM" src="https://github.com/pytorch/pytorch/assets/35276741/f3820d9e-f24c-43e7-885b-3bfdf24ef8ad">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102549
Approved by: https://github.com/albanD
2023-06-09 15:49:58 +00:00
PandaNinjas
20ca994a3e Use size in python list (#102538)
Resubmission of #101922

Description copied verbatim
Potentially fixes the second issue described in https://github.com/pytorch/pytorch/issues/87159.

In python_list.h, int64_t is used when diff_type is better suited. On 32 bit systems, int64_t isn't a proper signed size type, which may cause the compilation error described in https://github.com/pytorch/pytorch/issues/87159.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102538
Approved by: https://github.com/albanD
2023-06-01 00:46:29 +00:00
PyTorch MergeBot
0803b91867 Revert "Replace int64_t with a size type in python_list.h when applicable (#101922)"
This reverts commit 44e7f07ed4.

Reverted https://github.com/pytorch/pytorch/pull/101922 on behalf of https://github.com/atalman due to breaks windows nightlies ([comment](https://github.com/pytorch/pytorch/pull/101922#issuecomment-1567240450))
2023-05-29 14:58:31 +00:00
PandaNinjas
44e7f07ed4 Replace int64_t with a size type in python_list.h when applicable (#101922)
Potentially fixes the second issue described in #87159.

In python_list.h, `int64_t` is used when `diff_type` is better suited. On 32 bit systems, int64_t isn't a proper signed size type, which may cause the compilation error described in #87159.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101922
Approved by: https://github.com/Skylion007
2023-05-27 17:55:53 +00:00
atannous
3ed1569e86 Adding serialization ID to inline container (#100994)
Summary:
In order to better track models after serialization, this change writes a serialization_id as a UUID to the inline container. Having this ID enables traceability of models in saving and loading events.
serialization_id is generated as a new UUID every time serialization takes place. It can be thought of as a model snapshot identifier at the time of serialization.

Test Plan:
```
buck2 test @//mode/dev //caffe2/caffe2/serialize:inline_container_test
```

Local tests:
```
buck2 run @//mode/opt //scripts/atannous:example_pytorch_package
buck2 run @//mode/opt //scripts/atannous:example_pytorch
buck2 run @//mode/opt //scripts/atannous:example_pytorch_script
```

```
$ unzip -l output.pt
Archive:  output.pt
  Length      Date    Time    Name
---------  ---------- -----   ----
       36  00-00-1980 00:00   output/.data/serialization_id
      358  00-00-1980 00:00   output/extra/producer_info.json
       58  00-00-1980 00:00   output/data.pkl
      261  00-00-1980 00:00   output/code/__torch__.py
      326  00-00-1980 00:00   output/code/__torch__.py.debug_pkl
        4  00-00-1980 00:00   output/constants.pkl
        2  00-00-1980 00:00   output/version
---------                     -------
     1045                     7 files
```

```
unzip -p output.pt "output/.data/serialization_id"
a9f903df-cbf6-40e3-8068-68086167ec60
```
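The same id can be read back from Python (a sketch, assuming the `output.pt` produced above):

```python
import zipfile

with zipfile.ZipFile("output.pt") as zf:
    print(zf.read("output/.data/serialization_id").decode())
```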

Differential Revision: D45683657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100994
Approved by: https://github.com/davidberard98
2023-05-17 17:08:48 +00:00
Aleksei Nikiforov
effe1425dd ASAN: fix use-after-free (#101400)
arguments() returns a vector member of the object returned by the schema() call.
When the object returned by the schema() call is destroyed, the vector is deallocated as well;
its lifetime isn't extended.

This issue was detected while running `pytest -v test/mobile/test_lite_script_type.py -k test_nest_typing_namedtuple_custom_classtype` with ASAN.

<details>
<summary>ASAN output</summary>

```
==1134126==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0005a5790 at pc 0x03ff844488d8 bp 0x03fff584afe8 sp 0x03fff584afd8
READ of size 8 at 0x60d0005a5790 thread T0
    #0 0x3ff844488d7 in __gnu_cxx::__normal_iterator<c10::Argument const*, std::vector<c10::Argument, std::allocator<c10::Argument> > >::__normal_iterator(c10::Argument const* const&) /usr/lib/gcc/s390x-i
bm-linux-gnu/11/include/g++-v11/bits/stl_iterator.h:1028
    #1 0x3ff8444293f in std::vector<c10::Argument, std::allocator<c10::Argument> >::begin() const /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_vector.h:821
    #2 0x3ff84d807d1 in torch::jit::toPyObject(c10::IValue) /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:617
    #3 0x3ff84d80305 in torch::jit::toPyObject(c10::IValue) /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:604
    #4 0x3ff84856871 in pybind11::detail::type_caster<c10::IValue, void>::cast(c10::IValue, pybind11::return_value_policy, pybind11::handle) /home/user/pytorch/torch/csrc/jit/python/pybind.h:138
    #5 0x3ff85318191 in pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_45, c10::IValue, torch::jit::mobile::Module&, pybind11::tuple const&, pybind11::name, pybind11::is
_method, pybind11::sibling, pybind11::arg>(torch::jit::initJitScriptBindings(_object*)::$_45&&, c10::IValue (*)(torch::jit::mobile::Module&, pybind11::tuple const&), pybind11::name const&, pybind11::is_me
thod const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#1}::operator()(pybind11::detail::function_call&) const /home/user/pytorch/cmake/../third_party/pybin
d11/include/pybind11/pybind11.h:249
    #6 0x3ff85317cfd in pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_45, c10::IValue, torch::jit::mobile::Module&, pybind11::tuple const&, pybind11::name, pybind11::is
_method, pybind11::sibling, pybind11::arg>(torch::jit::initJitScriptBindings(_object*)::$_45&&, c10::IValue (*)(torch::jit::mobile::Module&, pybind11::tuple const&), pybind11::name const&, pybind11::is_me
thod const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#1}::__invoke(pybind11::detail::function_call&) /home/user/pytorch/cmake/../third_party/pybind11/incl
ude/pybind11/pybind11.h:224
    #7 0x3ff82ee52e9 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) /home/user/pytorch/cmake/../third_party/pybind11/include/pybind11/pybind11.h:929
    #8 0x3ffab002903 in cfunction_call Objects/methodobject.c:543
    #9 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
    #10 0x3ffaaf8e919 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
    #11 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #12 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #13 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #14 0x3ffab105447 in call_function Python/ceval.c:5891
    #15 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
    #16 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #17 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #18 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #19 0x3ffaaf8a615 in _PyObject_FastCallDictTstate Objects/call.c:142
    #20 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
    #21 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
    #22 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
    #23 0x3ffab0f0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
    #24 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #25 0x3ffab105447 in call_function Python/ceval.c:5891
    #26 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
    #27 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #28 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #29 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #30 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #31 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #32 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #33 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #34 0x3ffab105447 in call_function Python/ceval.c:5891
    #35 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
    #36 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #37 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #38 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #39 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #40 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #41 0x3ffab105447 in call_function Python/ceval.c:5891
    #42 0x3ffab0ff7d7 in _PyEval_EvalFrameDefault Python/ceval.c:4198
    #43 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #44 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #45 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #46 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #47 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #48 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #49 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #50 0x3ffab105447 in call_function Python/ceval.c:5891
    #51 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
    #52 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #53 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #54 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #55 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #56 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #57 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #58 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #59 0x3ffab105447 in call_function Python/ceval.c:5891
    #60 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
    #61 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #62 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #63 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #64 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #65 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #66 0x3ffaaf8ab9b in PyVectorcall_Call Objects/call.c:267
    #67 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
    #68 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
    #69 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
    #70 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
    #71 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #72 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #73 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #74 0x3ffaaf8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
    #75 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
    #76 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
    #77 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
    #78 0x3ffab0f0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
    #79 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #80 0x3ffab105447 in call_function Python/ceval.c:5891
    #81 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
    #82 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #83 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #84 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #85 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #86 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #87 0x3ffab105447 in call_function Python/ceval.c:5891
    #88 0x3ffab0ff7d7 in _PyEval_EvalFrameDefault Python/ceval.c:4198
    #89 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #90 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #91 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #92 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
    #93 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
    #94 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
    #95 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
    #96 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
    #97 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #98 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #99 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #100 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #101 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #102 0x3ffab105447 in call_function Python/ceval.c:5891
    #103 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
    #104 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #105 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #106 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #107 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #108 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #109 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #110 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #111 0x3ffab105447 in call_function Python/ceval.c:5891
    #112 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
    #113 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #114 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #115 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #116 0x3ffaaf8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
    #117 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
    #118 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
    #119 0x3ffaaf8ad17 in _PyObject_Call Objects/call.c:305
    #120 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
    #121 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
    #122 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
    #123 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #124 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #125 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #126 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #127 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #128 0x3ffab105447 in call_function Python/ceval.c:5891
    #129 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
    #130 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #131 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #132 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #133 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #134 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #135 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #136 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #137 0x3ffab105447 in call_function Python/ceval.c:5891
    #138 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
    #139 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #140 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #141 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #142 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
    #143 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
    #144 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
    #145 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
    #146 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
    #147 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #148 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #149 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #150 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #151 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #152 0x3ffab105447 in call_function Python/ceval.c:5891
    #153 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
    #154 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #155 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #156 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #157 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #158 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #159 0x3ffab105447 in call_function Python/ceval.c:5891
    #160 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
    #161 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #162 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #163 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #164 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
    #165 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
    #166 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
    #167 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
    #168 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
    #169 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #170 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #171 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #172 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #173 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #174 0x3ffab105447 in call_function Python/ceval.c:5891
    #175 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
    #176 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #177 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #178 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #179 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #180 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #181 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #182 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #183 0x3ffab105447 in call_function Python/ceval.c:5891
    #184 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
    #185 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #186 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #187 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #188 0x3ffaaf8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
    #189 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
    #190 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
    #191 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
    #192 0x3ffab0f0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
    #193 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #194 0x3ffab105447 in call_function Python/ceval.c:5891
    #195 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
    #196 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #197 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #198 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #199 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
    #200 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
    #201 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
    #202 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
    #203 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
    #204 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #205 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #206 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #207 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #208 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #209 0x3ffab105447 in call_function Python/ceval.c:5891
    #210 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
    #211 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #212 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #213 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #214 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #215 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
    #216 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #217 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #218 0x3ffab105447 in call_function Python/ceval.c:5891
    #219 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
    #220 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #221 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #222 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #223 0x3ffaaf8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
    #224 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
    #225 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
    #226 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
    #227 0x3ffab0f0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
    #228 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #229 0x3ffab105447 in call_function Python/ceval.c:5891
    #230 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
    #231 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #232 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #233 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #234 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #235 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #236 0x3ffab105447 in call_function Python/ceval.c:5891
    #237 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
    #238 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #239 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #240 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #241 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
    #242 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
    #243 0x3ffab105447 in call_function Python/ceval.c:5891
    #244 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
    #245 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
    #246 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
    #247 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
    #248 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
    #249 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290

0x60d0005a5790 is located 80 bytes inside of 136-byte region [0x60d0005a5740,0x60d0005a57c8)
freed by thread T0 here:
    #0 0x3ffab537de5 in operator delete(void*) /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:160
    #1 0x3ff55984fdb in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> >::deallocate(std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2>*, unsigned long) /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/ext/new_allocator.h:145

previously allocated by thread T0 here:
    #0 0x3ffab53734f in operator new(unsigned long) /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x3ff5598443f in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/ext/new_allocator.h:127
    #2 0x3fff5849ecf  ([stack]+0xb2ecf)

SUMMARY: AddressSanitizer: heap-use-after-free /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_iterator.h:1028 in __gnu_cxx::__normal_iterator<c10::Argument const*, std::vector<c10::Argument, std::allocator<c10::Argument> > >::__normal_iterator(c10::Argument const* const&)
Shadow bytes around the buggy address:
  0x100c1a000b4aa0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x100c1a000b4ab0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
  0x100c1a000b4ac0: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fd fd
  0x100c1a000b4ad0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x100c1a000b4ae0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x100c1a000b4af0: fd fd[fd]fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x100c1a000b4b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c1a000b4b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c1a000b4b20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c1a000b4b30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c1a000b4b40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1134126==ABORTING
```

Additional backtraces (not full):
Allocation:
```
#0  __memset_z196 () at ../sysdeps/s390/memset-z900.S:144
#1  0x000003ff96f3072a in __asan::Allocator::Allocate (this=this@entry=0x3ff97041eb8 <__asan::instance>, size=size@entry=136, alignment=8, alignment@entry=0, stack=<optimized out>,
    stack@entry=0x3ffdbb45d78, alloc_type=<optimized out>, can_fill=true) at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_allocator.cpp:599
#2  0x000003ff96f2c088 in __asan::asan_memalign (alignment=alignment@entry=0, size=size@entry=136, stack=stack@entry=0x3ffdbb45d78, alloc_type=alloc_type@entry=__asan::FROM_NEW)
    at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_allocator.cpp:1039
#3  0x000003ff96fb73b0 in operator new (size=136) at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:99
#4  0x000003ff41404440 in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> >::allocate (this=0x3ffdbb468c0,
    __n=1) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/ext/new_allocator.h:127
#5  0x000003ff414042a0 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> > >::allocate (__a=...,
    __n=1) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/alloc_traits.h:464
#6  0x000003ff41403b66 in std::__allocate_guarded<std::allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> > > (__a=...)
    at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/allocated_ptr.h:98
#7  0x000003ff4140372a in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (this=0x3ffdbb47888, __p=@0x3ffdbb47880: 0x0, __a=..., __args=..., __args=..., __args=..., __args=...)
    at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:648
#8  0x000003ff41403328 in std::__shared_ptr<c10::FunctionSchema, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<c10::FunctionSchema>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (this=0x3ffdbb47880, __tag=..., __args=..., __args=..., __args=..., __args=...) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:1342
#9  0x000003ff41402f06 in std::shared_ptr<c10::FunctionSchema>::shared_ptr<std::allocator<c10::FunctionSchema>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (
    this=0x3ffdbb47880, __tag=..., __args=..., __args=..., __args=..., __args=...) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:409
#10 0x000003ff41402b6e in std::allocate_shared<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (__a=...,
    __args=..., __args=..., __args=..., __args=...) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:862
#11 0x000003ff4140215c in std::make_shared<c10::FunctionSchema, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (__args=..., __args=..., __args=..., __args=...)
    at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:878
#12 0x000003ff413d180c in c10::TupleType::createWithSpec<c10::basic_string_view<char> > (qualName=..., field_names=std::vector of length 1, capacity 1 = {...},
    field_types=std::vector of length 1, capacity 1 = {...}, field_defaults=std::vector of length 0, capacity 0) at /home/user/pytorch/aten/src/ATen/core/type.cpp:769
#13 0x000003ff413b9ca6 in c10::TupleType::createNamed (qualName=..., field_names=std::vector of length 1, capacity 1 = {...}, field_types=std::vector of length 1, capacity 1 = {...})
    at /home/user/pytorch/aten/src/ATen/core/type.cpp:725
#14 0x000003ff4115fbac in c10::ivalue::TupleTypeFactory<c10::TupleType>::fallback (type=...) at /home/user/pytorch/aten/src/ATen/core/dynamic_type.cpp:383
#15 0x000003ff708217fe in c10::ivalue::Tuple::type<c10::TupleType> (this=0x6080004b8520) at /home/user/pytorch/aten/src/ATen/core/ivalue_inl.h:781
#16 0x000003ff70800740 in torch::jit::toPyObject (ivalue=...) at /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:613
#17 0x000003ff70800306 in torch::jit::toPyObject (ivalue=...) at /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:604
#18 0x000003ff702d6872 in pybind11::detail::type_caster<c10::IValue, void>::cast (src=...) at /home/user/pytorch/torch/csrc/jit/python/pybind.h:138
#19 0x000003ff70d98192 in pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_45, c10::IValue, torch::jit::mobile::Module&, pybind11::tuple const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg>(torch::jit::initJitScriptBindings(_object*)::$_45&&, c10::IValue (*)(torch::jit::mobile::Module&, pybind11::tuple const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#1}::operator()(pybind11::detail::function_call&) const (this=0x3ffdbb4ca20, call=...)
    at /home/user/pytorch/cmake/../third_party/pybind11/include/pybind11/pybind11.h:249
#20 0x000003ff70d97cfe in pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_45, c10::IValue, torch::jit::mobile::Module&, pybind11::tuple const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg>(torch::jit::initJitScriptBindings(_object*)::$_45&&, c10::IValue (*)(torch::jit::mobile::Module&, pybind11::tuple const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#1}::__invoke(pybind11::detail::function_call&) (call=...)
    at /home/user/pytorch/cmake/../third_party/pybind11/include/pybind11/pybind11.h:224
#21 0x000003ff6e9652ea in pybind11::cpp_function::dispatcher (self=<PyCapsule at remote 0x3ff83e27720>,
    args_in=(<torch._C.LiteScriptModule at remote 0x3ff811844b0>, (<Tensor at remote 0x3ff814efb00>,)), kwargs_in=0x0) at /home/user/pytorch/cmake/../third_party/pybind11/include/pybind11/pybind11.h:929
```

Deallocation:
```
#0  operator delete (ptr=0x60d0005a5740) at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:160
#1  0x000003ff44904fdc in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> >::deallocate (this=0x3ffc5dc8020,
    __p=0x60d0005a5740, __t=1) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/ext/new_allocator.h:145
#2  0x000003ff44904fa8 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> > >::deallocate (
    __a=..., __p=0x60d0005a5740, __n=1) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/alloc_traits.h:496
#3  0x000003ff449041f2 in std::__allocated_ptr<std::allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> > >::~__allocated_ptr (
    this=0x3ffc5dc8030) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/allocated_ptr.h:74
#4  0x000003ff44904888 in std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2>::_M_destroy (this=0x60d0005a5740)
    at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:538
#5  0x000003ff43895a62 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x60d0005a5740) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:184
#6  0x000003ff43895420 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x611000c40648) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:705
#7  0x000003ff4466e7f4 in std::__shared_ptr<c10::FunctionSchema, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x611000c40640)
    at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:1154
#8  0x000003ff4466d820 in std::shared_ptr<c10::FunctionSchema>::~shared_ptr (this=0x611000c40640) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:122
#9  0x000003ff448d82f6 in c10::TupleType::~TupleType (this=0x611000c40580) at /home/user/pytorch/aten/src/ATen/core/jit_type.h:1142
#10 0x000003ff448d8346 in c10::TupleType::~TupleType (this=0x611000c40580) at /home/user/pytorch/aten/src/ATen/core/jit_type.h:1142
#11 0x000003ff731296a4 in std::_Sp_counted_ptr<c10::TupleType*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x603000c43ae0)
    at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:348
#12 0x000003ff71eaf666 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x603000c43ae0) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:168
#13 0x000003ff71eaf330 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x3ffc5dc9368) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:705
#14 0x000003ff73129ee4 in std::__shared_ptr<c10::TupleType, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x3ffc5dc9360)
    at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:1154
#15 0x000003ff73122390 in std::shared_ptr<c10::TupleType>::~shared_ptr (this=0x3ffc5dc9360) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:122
#16 0x000003ff73d00788 in torch::jit::toPyObject (ivalue=...) at /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:613
#17 0x000003ff73d00306 in torch::jit::toPyObject (ivalue=...) at /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:604
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101400
Approved by: https://github.com/zou3519
2023-05-15 15:32:10 +00:00
Luthaf
000368b092 Allow C++ custom class to define __repr__ and use it from Python (#100724)
When handling custom classes from Python, it is nice to be able to specify how they are displayed to the user.

Out of the two standard functions to do this, only `__str__` could be implemented in C++. This PR adds `__repr__` to the allowlist of magic methods.

The second commit tweaks the default output of `__str__` to make it more informative, but I can remove the change if you want.
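
As a minimal usage sketch from the Python side (hedged: `somelib.so` and `SomeClass` are hypothetical names, and the class is presumed to register a `__repr__` method via `.def("__repr__", ...)` on its C++ `torch::class_` binding):

```py
import torch

torch.ops.load_library("somelib.so")
c = torch.classes.somelib.SomeClass()
# With __repr__ on the allowlist, repr() dispatches to the C++-defined method
# instead of falling back to the generic ScriptObject representation.
print(repr(c))
```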

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100724
Approved by: https://github.com/ezyang
2023-05-10 15:46:45 +00:00
Luthaf
5970fb402e C++ CustomClass in Python: indicate which methods are not implemented (#100171)
Without these changes, it can be hard to know which magic methods are not implemented on a given ScriptObject.

before:
```py
torch.ops.load_library("somelib.so")
c = torch.classes.somelib.SomeClass()
print(len(c))
# raise NotImplementedError
```

after:
```py
torch.ops.load_library("somelib.so")
c = torch.classes.somelib.SomeClass()
print(len(c))
# raise NotImplementedError: '__len__' is not implemented for __torch__.torch.classes.somelib.SomeClass
```

------

I could not find a linked issue; if you want me to open one as well, I can do that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100171
Approved by: https://github.com/ezyang
2023-05-09 18:41:40 +00:00
mikey dagitses
0e017af35b make torch/csrc/jit/python/pybind_utils.cpp data_ptr-correct (#100682)
make torch/csrc/jit/python/pybind_utils.cpp data_ptr-correct

Test Plan: Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100682
Approved by: https://github.com/Skylion007
2023-05-05 15:53:06 +00:00
Luca Wehrstedt
24bf15fe8d Support record_stream in dispatch mode (#99529)
Summary:
Issuing a `t.record_stream(s)` call while a `TorchDispatchMode` is active used to throw, because PyTorch was unable to convert a c10::Stream back to a Python object. This is now fixed.
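
A minimal sketch of the pattern that used to fail (hedged: `LoggingMode` and the tensor/stream setup are illustrative only, and the snippet needs a CUDA device):

```py
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LoggingMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        # For the mode to observe the call, record_stream's c10::Stream
        # argument must be convertible back to a Python object; that
        # conversion is what this fix adds.
        print(func)
        return func(*args, **(kwargs or {}))

if torch.cuda.is_available():
    t = torch.ones(4, device="cuda")
    s = torch.cuda.Stream()
    with LoggingMode():
        t.record_stream(s)  # previously raised under an active dispatch mode
```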

Fixes https://github.com/pytorch/pytorch/issues/94403

Test Plan: Added a unit test

Differential Revision: D45117566

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99529
Approved by: https://github.com/albanD
2023-04-21 07:17:19 +00:00
Luca Wehrstedt
df84d74058 Allow getting type of ScriptObject (#99542)
Summary:
A very old refactor (https://github.com/pytorch/pytorch/pull/29500) split ScriptModule into ScriptObject (base class) and ScriptModule (subclass). During that refactor, the `_type` method was moved from ScriptModule to ScriptObject, but the type of its argument wasn't updated. As a result, it is now impossible to invoke `_type` on a ScriptObject.

The reason I need this fix is that I am using PyTorch's dispatch mode to intercept some operators that accept/return custom classes, which end up being encoded as ScriptObjects; to handle them properly, I need to be able to verify their type.
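
As a rough usage sketch (hedged: the library name and class are hypothetical, and `_type` is an internal accessor):

```py
import torch

torch.ops.load_library("somelib.so")
obj = torch.classes.somelib.SomeClass()

# With this fix, _type() is callable on a plain ScriptObject (not only on a
# ScriptModule) and returns the TorchScript class type of the object, which a
# __torch_dispatch__ handler can use to identify the custom class it received.
print(obj._type())
```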

Test Plan: N/A

Differential Revision: D45118675

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99542
Approved by: https://github.com/albanD
2023-04-20 16:10:19 +00:00