pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
angelayi	dd30667f6c	[opaque_obj] Add set_payload + docs (#163276 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163276 Approved by: https://github.com/zou3519 ghstack dependencies: #162660	2025-09-22 18:30:28 +00:00
angelayi	3be9c86c74	[opaque obj] Initial OpaqueObject (#162660 ) A big pain point ppl have with custom ops is that they do not accept arbitrary input/outputs. In this PR we create the concept of an "OpaqueObject" which allows users to pass arbitrary python objects into custom operators. Some still slightly annoying parts with this implementation: - The schema of the operator is `__torch__.torch.classes.aten.OpaqueObject` instead of whatever python type - `@torch.library.custom_op` doesn't work.. yet? UX: ```python from torch._library.opaque_object import make_opaque, get_payload # your custom python class class OpaqueQueue: def __init__(self, queue: list[torch.Tensor], init_tensor_: torch.Tensor) -> None: super().__init__() self.queue = queue self.init_tensor_ = init_tensor_ def push(self, tensor: torch.Tensor) -> None: self.queue.append(tensor) def pop(self) -> torch.Tensor: if len(self.queue) > 0: return self.queue.pop(0) return self.init_tensor_ def size(self) -> int: return len(self.queue) queue = OpaqueQueue([], torch.zeros(3)) obj: torch._C.ScriptObject = make_opaque(queue) # obj.payload stores a direct reference to this python queue object self.assertEqual(get_payload(obj), queue) # This is able to be passed through the dispatcher torch.ops._TestOpaqueObject.queue_push(obj, torch.ones(3)) self.assertTrue(queue.size(), 1) ``` Authoring a custom op: ```python lib = torch.library.Library("_TestOpaqueObject", "FRAGMENT") torch.library.define( f"_TestOpaqueObject::queue_push", "(__torch__.torch.classes.aten.OpaqueObject a, Tensor b) -> ()", tags=torch.Tag.pt2_compliant_tag, lib=lib, ) @torch.library.impl(f"{libname}::queue_push", "CompositeExplicitAutograd", lib=lib) def push_impl(q: torch._C.ScriptObject, b: torch.Tensor) -> None: # We can get the payload directly by get_payload(q) queue = get_payload(q) assert isinstance(queue, OpaqueQueue) queue.push(b) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162660 Approved by: https://github.com/zou3519	2025-09-22 18:30:28 +00:00
PyTorch MergeBot	f0078941cf	Revert "[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )" This reverts commit `6c334885d4`. Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/wdvr due to reverted internally - @ezyang see D82281294 ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3317017530))	2025-09-22 05:39:07 +00:00
Edward Yang	6c334885d4	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-12 10:54:42 +00:00
PyTorch MergeBot	6b59a19242	Revert "[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )" This reverts commit `6e8f17c580`. Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3283985880))	2025-09-12 06:52:03 +00:00
Edward Yang	6e8f17c580	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-12 03:56:18 +00:00
Edward Yang	dda071587f	Revert "Make distributed modules importable even when backend not built (#159889 )" (#162568 ) This reverts commit `a0d026688c`. Revert "Always build USE_DISTRIBUTED. (#160449)" This reverts commit `d80297a684`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568 Approved by: https://github.com/huydhn	2025-09-10 04:29:42 +00:00
Scott Wolchok	0e7ccc09db	[easy] Don't force copy result of getAllOperatorsFor in init.cpp (#162218 ) It returns a const reference to a vector. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162218 Approved by: https://github.com/Skylion007 ghstack dependencies: #161591, #161595, #161633, #161634, #161692, #162219, #162220	2025-09-10 00:08:15 +00:00
Scott Wolchok	dcc42e95f4	Fix missing moves in initJITBindings (#162428 ) Per @Skylion007 on #162219 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162428 Approved by: https://github.com/Skylion007	2025-09-09 08:47:33 +00:00
Scott Wolchok	a8a187b2cf	Overload _get_operation_for_overload_or_packet & friends to accept ArrayRef (#162219 ) Avoids requiring vector allocation to call this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162219 Approved by: https://github.com/Skylion007 ghstack dependencies: #161591, #161595, #161633, #161634, #161692	2025-09-09 01:10:06 +00:00
Scott Wolchok	a951f435fd	Avoid redundant PyTuple_GetSize call in _maybe_handle_torch_function (#161633 ) py::args::size() calls PyTuple_GetSize. Compiler can't know the two calls will always return the same result, so we have to consolidate them ourselves. Differential Revision: [D81530096](https://our.internmc.facebook.com/intern/diff/D81530096) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161633 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: #161591, #161595	2025-09-09 01:10:06 +00:00
Edward Yang	d80297a684	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-08 19:10:36 +00:00
Scott Wolchok	49c446c617	Add C++ function for torch.distributed.tensor._op_schema.is_view_op (#161595 ) This seems to have been an especially slow one because of the repeated pybind access (schema is a pybind, as is arguments, and then we hit each argument). It's still ~~1% of total benchmark runtime because of the repeated single pybind function call, but that's a lot better. Differential Revision: [D81530095](https://our.internmc.facebook.com/intern/diff/D81530095) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161595 Approved by: https://github.com/ezyang, https://github.com/bdhirsh ghstack dependencies: #161466, #161586, #161590, #161591	2025-09-08 16:28:08 +00:00
PyTorch MergeBot	1e0656f063	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `de893e96c7`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002))	2025-09-08 07:04:36 +00:00
Edward Yang	de893e96c7	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-05 20:15:11 +00:00
PyTorch MergeBot	adae7f66aa	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `c37103234a`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011))	2025-09-05 18:58:47 +00:00
Edward Yang	c37103234a	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-04 19:43:17 +00:00
PyTorch MergeBot	b7dad7dd49	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `90b08643c3`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Already discussed with @ezyang about the internal quirks and errors ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3254219358))	2025-09-04 15:25:07 +00:00
Edward Yang	90b08643c3	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-03 07:33:55 +00:00
PyTorch MergeBot	4e42aa8ffc	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `b7034e9c92`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3246689684))	2025-09-02 20:28:42 +00:00
Edward Yang	b7034e9c92	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-01 23:00:21 +00:00
Scott Wolchok	5d35b49ba7	Fix forced copying def_property_readonly for FunctionSchema & friends (#161301 ) This took me a bit to figure out and I'm pretty sure I've looked at this code before. Pybind uses `return_value_policy::reference_internal` for `def_property`, which [causes the owning object to be kept alive for the lifespan of the return value](https://pybind11.readthedocs.io/en/stable/advanced/functions.html), allowing the getter to safely avoid copying the property value. However, lambdas act like they return `auto`, not `decltype(auto)`, so our lambdas themselves were forcing copies! Testing: observed std::vector<Argument> copying disappear in Linux perf profile of someOpInfo._schema.arguments/returns (in _python_dispatch.correct_storage_aliasing). Pull Request resolved: https://github.com/pytorch/pytorch/pull/161301 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/wconstab	2025-08-30 06:55:42 +00:00
Scott Wolchok	67457dbb9d	Fix non-const reference arguments in torch/csrc/jit/python/init.cpp (#161300 ) Shouldn't be any generated code impact, just fixing bad practice. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161300 Approved by: https://github.com/wconstab, https://github.com/malfet ghstack dependencies: #161286	2025-08-29 19:01:32 +00:00
Nikita Shulga	d8cb3db533	Add unsigned support to `IValue` (#160102 ) - Moved repeated logic of saving int64/uint64 into a polymorphic container into `THPUtils_unpackInteger` - Added `TestPythonDispatch.test_dispatch_uint64` regression test Fixes https://github.com/pytorch/pytorch/issues/159168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160102 Approved by: https://github.com/ezyang	2025-08-11 03:57:18 +00:00
Yanan Cao (PyTorch)	731ee31f7b	[TorchScript, PT2] Add torch._check compatibility support (#159988 ) Summary: Add support for torch._check() in TorchScript jit.script frontend. * It will be special cased to behave like torch._assert, turned into an if + raise exception. Test Plan: Unit tests Rollback Plan: Differential Revision: D79744604 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159988 Approved by: https://github.com/davidberard98	2025-08-08 23:14:13 +00:00
Xuehai Pan	541584d22e	[BE][8/16] fix typos in torch/ (torch/csrc/jit/) (#156318 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156318 Approved by: https://github.com/albanD	2025-07-02 22:55:29 +00:00
Jason Ansel	0596323c35	Better fix for `__index__` SymInt issue (#157201 ) This improves on #156928 Pull Request resolved: https://github.com/pytorch/pytorch/pull/157201 Approved by: https://github.com/ezyang	2025-07-01 07:06:46 +00:00
Laith Sakka	74ebd8d14e	use guard_or_false for expand utils reduction (#155868 ) This is classic broadcast like pattern. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155868 Approved by: https://github.com/bobrenjc93	2025-06-21 23:42:19 +00:00
Yidi Wu	545fbd58dc	[export] inline jit.scripted function in export (#155180 ) When we export a scripted function, we inline the original callable stored in "_torchdynamo_inline"; this is the same strategy as the torch.compile path. We do the same thing for script methods, where a "__wrapped__" attribute points to the original callable in most cases. There are some corner cases we identified: a top-level jit.scripted module's method doesn't have a __wrapped__. In this case, we fall back to the original scripted approach. There may be more such cases, but that needs verification. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155180 Approved by: https://github.com/zou3519	2025-06-10 20:34:12 +00:00
cyy	388912dd94	Remove AttributeError constructor (#154808 ) It is a private API and uses C vsnprintf, which is not type safe. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154808 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-06-03 03:49:09 +00:00
PyTorch MergeBot	ef92653022	Revert "Remove AttributeError constructor (#154808 )" This reverts commit `3239da0c73`. Reverted https://github.com/pytorch/pytorch/pull/154808 on behalf of https://github.com/cyyever due to Need format code ([comment](https://github.com/pytorch/pytorch/pull/154808#issuecomment-2933286113))	2025-06-03 03:40:41 +00:00
Yuanyuan Chen	3239da0c73	Remove AttributeError constructor (#154808 ) It is a private API and uses C vsnprintf, which is not type safe. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154808 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-06-03 02:18:51 +00:00
Aaron Gokaslan	0cd18ba1ca	[BE][Ez] Update deprecated pybind11 functions (#154798 ) * getType() is deprecated, replace it with new/proper static method. These are backwards compatible with old pybind11 versions we support. So break this off before we upgrade to pybind11 3.0 where these methods are dropped in #154115 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154798 Approved by: https://github.com/jansel, https://github.com/cyyever	2025-06-01 06:17:50 +00:00
David Berard	a237831bc2	[JIT] Optimize DCE by storing a MemoryLocations for an entire set<Value> (#153645 ) Summary: TL;DR: make DCE faster by replacing a Set<Value> with a MemoryLocations sparse bitset (representing all the memory locations stored by the collection of all values in the set). Details The goal of this PR is to optimize this function from AliasDb: ``` bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const { const auto writtenTo = getWrites(n); if (writtenTo.empty()) { return false; } MemoryLocations locs; for (const auto v : vs) { auto it = elementMap_.find(v); if (it != elementMap_.end()) { const auto& vlocs = memoryDAG_->getMemoryLocations(it->second); if (writtenTo.intersects(vlocs)) { return true; } } } return false; } ``` In the DCE use case, we have a ValueSet of live values, into which we insert `Value`s; and sometimes need to check whether a node mutates any of the live values using `writesToAlias`. Looping through all the values in the ValueSet and indexing into the elementMap_ is slow; so if we can pre-compute the MemoryLocations set, this speeds up the function. In some large model examples, I see ~15-25x speedups from this change. Implementation: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, which is an insert-only set of Values, which also maintains the corresponding MemoryLocations. Then in AliasDb, I use `ValueAndMemoryLocationSet` if we're using AliasDb for analysis, and otherwise use a `Set<Value>` if we don't have AliasDb. Test Plan: Rely on unit tests. Differential Revision: D74827086 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153645 Approved by: https://github.com/eellison	2025-05-19 21:04:59 +00:00
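The idea in the DCE commit above can be illustrated with a rough Python sketch. This is not the PyTorch implementation: the class name `ValueAndLocationSet`, the helper `writes_to_alias_slow`, and the use of plain Python sets in place of the `MemoryLocations` sparse bitset are all assumptions made for illustration. It only shows why maintaining the union of memory locations at insert time turns the per-node check into a single intersection test.

```python
from typing import Dict, Hashable, Iterable, Set


def writes_to_alias_slow(
    written_to: Set[int],
    live_values: Iterable[Hashable],
    element_map: Dict[Hashable, Set[int]],
) -> bool:
    """Per-value loop, mirroring the shape of the quoted writesToAlias."""
    if not written_to:
        return False
    for value in live_values:
        locations = element_map.get(value)
        if locations is not None and not written_to.isdisjoint(locations):
            return True
    return False


class ValueAndLocationSet:
    """Hypothetical insert-only set of values that also keeps the union
    of the memory locations those values may occupy."""

    def __init__(self, element_map: Dict[Hashable, Set[int]]) -> None:
        self._element_map = element_map
        self._values: Set[Hashable] = set()
        self._locations: Set[int] = set()  # stand-in for a sparse bitset

    def insert(self, value: Hashable) -> None:
        if value not in self._values:
            self._values.add(value)
            # Pay the lookup cost once, at insert time.
            self._locations |= self._element_map.get(value, set())

    def __contains__(self, value: Hashable) -> bool:
        return value in self._values

    def is_written_by(self, written_to: Set[int]) -> bool:
        # One intersection test instead of a loop over every live value.
        return bool(written_to) and not self._locations.isdisjoint(written_to)
```

Because the set is insert-only, the precomputed union never has to be shrunk, which is what keeps the bookkeeping cheap in this sketch.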
David Berard	5e6e52e7c9	[JIT] add GRAPH_DEBUG for setGraphExecutorOptimize (#153549 ) Summary: Optionally log when setGraphExecutorOptimize is called, so we can get insight into the GraphExecutor behavior. Differential Revision: D74692508 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153549 Approved by: https://github.com/PaulZhang12, https://github.com/SamGinzburg	2025-05-14 20:07:25 +00:00
cyy	45efa1aaa8	[3/N] Use internal linkage in C++ files (#151297 ) Follows #151070. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151297 Approved by: https://github.com/Skylion007	2025-05-05 17:48:39 +00:00
cyy	41bd0c900a	[1/N] Deprecate c10::string_view and at::string (#151972 ) The calls of `c10::string_view` in the code base are replaced by `std::string_view`. The calls of `at::string` are replaced by `std::string` Pull Request resolved: https://github.com/pytorch/pytorch/pull/151972 Approved by: https://github.com/malfet	2025-04-29 07:23:52 +00:00
cyy	70d7638b0d	Fix clang-tidy suppression in torch/csrc/jit (#152271 ) Remove some clang-tidy suppression in torch/csrc/jit by applying fixes or refactoring. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152271 Approved by: https://github.com/Skylion007, https://github.com/malfet Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-04-27 21:18:39 +00:00
cyyever	24ca7e91e6	[1/N] Use internal linkage in torch/csrc C++ files. (#150930 ) Turn more functions and variables into static if they are not used outside the cpp files. Unused functions are removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150930 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-04-11 02:19:31 +00:00
Yidi Wu	c714d2fc0e	[hop] support base_hop._gen_schema (#149688 ) This PR creates two utils for generating a schema for hops from example inputs and uses base hop as an example. 1. HopArgumentInfoGen creates an argument or an output schema with mutation information. 2. CFunctionSchemaGen pieces together the argument info of inputs and outputs and produces torch._C.FunctionSchema. The is_write attribute of argument info can be computed. Note that the is_write annotation only works when the inputs are flattened (e.g. cannot support mutation inside tuple). We need special handling for the case where we have tuple inputs like cond. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149688 Approved by: https://github.com/zou3519	2025-04-09 16:42:55 +00:00
cyy	79e8a69257	Enable move warnings for torch targets (#149923 ) This PR enables more move warnings for torch targets and fixes some code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149923 Approved by: https://github.com/malfet	2025-03-26 08:38:13 +00:00
Mikayla Gawarecki	be0ceee1c3	Make record/storage alignment in torch.save configurable (#147788 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147788 Approved by: https://github.com/albanD ghstack dependencies: #147786, #147787	2025-03-06 12:04:46 +00:00
cyy	9aa897b992	Remove unnecessary tensor clone (#148159 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/148159 Approved by: https://github.com/Skylion007	2025-03-02 16:21:39 +00:00
Michael Suo	99dd846672	[torch] fix builds for older pybind (#146630 ) Summary: some versions of pybind we build with don't have `py::set_error`. So just use the underlying python C API. Test Plan: unit tests Differential Revision: D69254629 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146630 Approved by: https://github.com/colin2328, https://github.com/ngimel	2025-02-06 21:22:00 +00:00
Michael Suo	425804db2b	[torch] fix exception types in custom class magic setattr/getattr (#146516 ) Summary: `c10::AttributeError` is not automatically converted to Python AttributeError, it needs some special macros (e.g. `HANDLE_TH_ERRORS`). Some Python functions like `hasattr` rely on the type of the throw exception to be correct. We don't need the fully generality of those macros, so just do a targeted error type conversion here. Test Plan: added unit test Differential Revision: D69197217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146516 Approved by: https://github.com/zdevito	2025-02-06 02:14:11 +00:00
Mikayla Gawarecki	001e355a56	Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 ) ## Background This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where storage order was changed to non-lexicographical. A `.format_version` entry was added to the zipfile and `calculate_storage_offsets` will only work on checkpoints with `.format_version`. When this is turned on, for `torch.load(mmap=True)`, offsets of each storage record (other than the 0th storage) will be calculated instead of relying on `miniz` APIs to determine this. The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases. `6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)` ## How does this work The format for the checkpoint is as such ``` archive_name/ |_ data.pkl |_.format_version |_byteorder |_data/ |_ 0 |_ 1 |_ 2 |_ ... |_ ``` Each `data/i` record represents a storage, where storages are written in the order that the Pickler encounters them. For each storage, our `persistent_load` logic saves the following metadata to the pickle file `dtype, numel, key, location` where `numel` is the number of bytes in the storage.
Note that we always use `miniz` writer in the zip64 mode per [here](`7796e308d0/caffe2/serialize/inline_container.cc (L701)`) A zipfile record written by miniz looks as such ``` ---------------- ----------------- ------------------- ---------------- --------- ------------------------------ | 30 byte header | n byte filename | zip64_extra_data | m byte padding | storage | 16 or 24 byte local dir footer | ---------------- ----------------- ------------------- ---------------- --------- ------------------------------ ``` - The header size (30) is given by [`MZ_ZIP_LOCAL_DIR_HEADER_SIZE`](https://github.com/pytorch/pytorch/blob/main/third_party/miniz-3.0.2/miniz.c?fbclid=IwZXh0bgNhZW0CMTEAAR2O8Vysd--UoSCxW70gabXIS1dbz733oHwuUQ5_Ff1hY2WU6PL2i6CSH4A_aem_J9oaU2HpDeWtJKOU9EnVqw#L3290) - filename will be `"{archive_name}/{filepath}"` - `zip64_extra_data` is determined by [`mz_zip_writer_create_zip64_extra_data`](`7796e308d0/third_party/miniz-3.0.2/miniz.c (L6202)`). Note that [we only create zip64_extra_data if storage_size >= 0xFFFFFFFF or the offset of the start of the header >= 0xFFFFFFFF](`7796e308d0/third_party/miniz-3.0.2/miniz.c (L6519-L6524)`) - `m` is determined by [`getPadding`](`7796e308d0/caffe2/serialize/inline_container.cc (L254)`), which accounts for filename, zip64_extra_data to determine `m` such that the start of `storage` is aligned to 64 bytes. The `m` bytes will always start with `F B padding_size` as the first 4 bytes - The local dir footer size is determined based on [this snippet](`7796e308d0/third_party/miniz-3.0.2/miniz.c (L6610-L6632)`): if the buffer size is 0 it is skipped. If the zip64_extra_data was created, it is 24, otherwise it is 16.
When `torch.utils.serialization.config.load.calculate_storage_offsets` is set we do the following - We keep track of where the "cursor" is in the file using `current_offset`, after each persistent_load call, it will be at the offset where the header for the next record starts - for the 0th storage, "data/0", we use the regular get_record_offset to determine the start of the storage - for any other storage, (where the storages will be in order encountered by the unpickler, 0, 1, 2, 3, ...) we use `get_record_offset_no_read`, which re-uses the `getPadding` logic to determine the offset of the storage - Note that `load_tensor` will only ever be called again with the same key if the storage's `._data_ptr()` is 0 [[pointer1](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L1917-L1918)][[pointer2](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L1936-L1937)], so we cache the offsets for this edge case - After each storage, if the storage is non-zero, we account for the local dir footer based on the logic described above ## Testing strategy The agreed upon testing strategy was as follows: - Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False) - This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested. Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880 Approved by: https://github.com/albanD ghstack dependencies: #143879	2025-01-31 17:09:20 +00:00
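The record layout described in the commit above lends itself to a small arithmetic sketch. The following Python is an illustration only, not the code behind `get_record_offset_no_read`: the function names are hypothetical, the constants are taken from the message (30-byte header, 64-byte alignment, 4-byte padding prefix, 16- or 24-byte footer), and the exact size of `zip64_extra_data` is left as a caller-supplied parameter because the message does not state it.

```python
LOCAL_HEADER_SIZE = 30  # MZ_ZIP_LOCAL_DIR_HEADER_SIZE, per the message
ALIGNMENT = 64          # storages start on a 64-byte boundary
PADDING_PREFIX = 4      # padding always begins with 'F', 'B', and a 2-byte size


def padding_size(header_offset: int, filename: str, zip64_extra: int = 0) -> int:
    """Return m, the padding that aligns the storage to ALIGNMENT bytes.

    Assumes the 4-byte prefix is always present, even when no extra
    alignment bytes are needed.
    """
    unaligned = (
        header_offset
        + LOCAL_HEADER_SIZE
        + len(filename.encode("utf-8"))
        + zip64_extra
        + PADDING_PREFIX
    )
    return PADDING_PREFIX + (-unaligned) % ALIGNMENT


def storage_offset(header_offset: int, filename: str, zip64_extra: int = 0) -> int:
    """Offset of the first storage byte for a record whose header starts at header_offset."""
    return (
        header_offset
        + LOCAL_HEADER_SIZE
        + len(filename.encode("utf-8"))
        + zip64_extra
        + padding_size(header_offset, filename, zip64_extra)
    )


def next_header_offset(
    header_offset: int, filename: str, storage_nbytes: int, zip64_extra: int = 0
) -> int:
    """Where the "cursor" lands after this record, i.e. the next record's header."""
    if storage_nbytes == 0:
        footer = 0  # footer is skipped for empty buffers
    else:
        footer = 24 if zip64_extra else 16
    return storage_offset(header_offset, filename, zip64_extra) + storage_nbytes + footer
```

Walking a checkpoint under these assumptions would mean calling `next_header_offset` after each storage and feeding the result in as the next `header_offset`, which is the cursor-tracking step the message describes.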
Manav Avlani	f9227e7c33	Expose ToIValueAllowNumbersAsTensors to TORCH_PYTHON_API so we can use it in monarch (#146087 ) Summary: TSIA Test Plan: Tested up the stack but existing unittests Reviewed By: suo Differential Revision: D68917233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146087 Approved by: https://github.com/suo	2025-01-31 05:08:11 +00:00
PyTorch MergeBot	9010649292	Revert "Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 )" This reverts commit `db3685a35c`. Reverted https://github.com/pytorch/pytorch/pull/143880 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but either this PR or the base PR breaks distributed tests ([comment](https://github.com/pytorch/pytorch/pull/143880#issuecomment-2617743403))	2025-01-28 03:07:17 +00:00
Mikayla Gawarecki	db3685a35c	Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 ) ## Background This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where storage order was changed to non-lexicographical. A `.format_version` entry was added to the zipfile and `calculate_storage_offsets` will only work on checkpoints with `.format_version`. When this is turned on, for `torch.load(mmap=True)`, offsets of each storage record (other than the 0th storage) will be calculated instead of relying on `miniz` APIs to determine this. The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases. `6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)` ## Testing strategy The agreed upon testing strategy was as follows: - Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False) - This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested. Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880 Approved by: https://github.com/albanD ghstack dependencies: #143879	2025-01-27 23:57:30 +00:00
c8ef	a989a0b13a	[NFC] Fix some minor typos. (#145599 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145599 Approved by: https://github.com/Skylion007	2025-01-24 18:58:59 +00:00

907 Commits