pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Lakshay Garg	a4110fedcf	Use insert_or_assign instead of erase+emplace (#164868 ) insert_or_assign does effectively the same thing as erase+emplace but more efficiently since the search does not need to be repeated Pull Request resolved: https://github.com/pytorch/pytorch/pull/164868 Approved by: https://github.com/eqy	2025-10-08 19:13:49 +00:00
Natalia Gimelshein	37c6087334	Add split-K control to cuBLAS reduced-precision settings (#164766 ) ## Summary - add a CuBLASReductionOption enum so the CUDA context can track reduced-precision and split-K options - extend the Python bindings, backend helpers, and docs to accept an optional allow_splitk argument for fp16/bf16 matmul controls - update cuBLAS/cuBLASLt call sites plus dynamo guards and tests to respect the new combinations ## Testing - python test/test_cuda.py TestCuda.test_cublas_allow_fp16_reduced_precision_reduction_get_set -v (fails: ModuleNotFoundError: No module named 'psutil') ------ https://chatgpt.com/codex/tasks/task_e_68e404623178832f8a3e1d34e1e175da Pull Request resolved: https://github.com/pytorch/pytorch/pull/164766 Approved by: https://github.com/malfet, https://github.com/albanD	2025-10-08 18:48:45 +00:00
Ke Wen	5c827a4133	[SymmMem] Multi-root tile reduction (#164757 ) Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom): Perform multiple tile reductions concurrently, with each tile reduced to a separate root. - The number of concurrent reductions can be smaller than world size, i.e. roots can be a subset of all ranks. But all ranks are still required to call into this API. - Currently supports NVLink SHARP scope only. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164757 Approved by: https://github.com/weifengpy, https://github.com/fegin ghstack dependencies: #162243	2025-10-08 17:28:00 +00:00
Sean McGovern	f332017294	C++ API handle optimizer defaults (#161825 ) Fixes #141884 This fixes the issue for all optimizers and parameter options. A member function `overwrite_from` is added to the optimizer base class. Each optimizer then implements this function for comparing their accepted parameters to defaults. A SFINAE approach to handle the different optimizer parameters generically (in optimizer.h only) was evaluated, but I think this is easier to review and maintain. This mirrors the Python API up to one edge case. An example of the edge case is provided below. Python can distinguish between 1) Key not present in dict = "not specified" and 2) Key present in dict = "explicitly set". The C++ implementation cannot. The issue hinges on whether or not to track if a particular parameter was set by the user explicitly or not (discrepancy in the case when the constructor default is explicitly passed in). To track this seems like it will take more intervention than would be worth it (modify TORCH_ARG to keep track, use std::optional for the parameter types, use bitset tracking) and was not pursued in the current PR. I'm happy to alter the design if appropriate. ### Example of edge case hinging on CONSTRUCTOR DEFAULTS vs OPTIMIZER DEFAULTS 1. CONSTRUCTOR DEFAULTS: These are the values you get when calling AdamOptions() AdamOptions().lr() = 0.001 AdamOptions().weight_decay() = 0 AdamOptions().eps() = 1e-08 2. OPTIMIZER DEFAULTS: These are the values the user chose when creating the optimizer User's optimizer defaults: optimizer.lr() = 0.005 optimizer.weight_decay() = 0.1 optimizer.eps() = 1e-07 3. THE PROBLEM SCENARIO: User wants to add a parameter group with explicit weight_decay=0.0 User sets: weight_decay(0) 4. THE CONFUSION: Constructor default weight_decay: 0 User's explicit weight_decay: 0 Are they equal? YES Since they're equal, our overwrite_from() logic thinks: "User didn't set weight_decay explicitly, use optimizer default" 5. CURRENT BEHAVIOR: Final weight_decay: 0.1 User expected: 0 Match? ❌ NO === KEY INSIGHT === Constructor defaults are built into the C++ class definition. Optimizer defaults are chosen by the user at runtime. We want to respect the user intention. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161825 Approved by: https://github.com/janeyx99	2025-10-08 16:40:45 +00:00
angelayi	56ef7743fc	[opaque_obj] Add __eq__ and __deepcopy__ (#163279 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163279 Approved by: https://github.com/zou3519	2025-10-08 09:09:16 +00:00
Yuanyuan Chen	64108bdbed	[BC-Breaking] Remove long-deprecated casting functions from native_functions.yaml (#164641 ) This PR removes `torch._cast_XXX` from generated OPs. They were deprecated in PyTorch 1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164641 Approved by: https://github.com/albanD, https://github.com/justinchuby	2025-10-08 08:27:58 +00:00
Jane Xu	6a09f9306c	Fix #164742 , all header-impl'd userfacing functions should be inline (#164871 ) It is as @mxmpl pointed out; we are missing an inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164871 Approved by: https://github.com/mikaylagawarecki	2025-10-08 05:57:19 +00:00
Ke Wen	19bf67be32	multimem reduce (#164517 ) Modified `multimem_one_shot_all_reduce_out` function to accept a `root` argument, making it a `multimem_reduce` op. The original `multimem_one_shot_all_reduce` op becomes a caller of the `multimem_reduce`, with each rank providing its own rank id as root. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164517 Approved by: https://github.com/ngimel	2025-10-08 05:25:16 +00:00
Ke Wen	d444384003	[SymmMem] Tiled reduce (#162243 ) Added op: `tile_reduce(Tensor input, Tensor(a!) out, int root, str group_name)` For now supports only: - NVSHMEM backed symmetric tensor; - 2D tensor and tile; - torch.float. Testing on bottom-right quadrant: ``` rank 0: tensor([[0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 1., 1., 1., 1.], [0., 0., 0., 0., 1., 1., 1., 1.], [0., 0., 0., 0., 1., 1., 1., 1.], [0., 0., 0., 0., 1., 1., 1., 1.]], device='cuda:0') PASSED ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162243 Approved by: https://github.com/ngimel	2025-10-08 02:03:04 +00:00
PyTorch MergeBot	97463d4cf3	Revert "Fix double dispatch to Python for detach (#163671 )" This reverts commit `c32118dc3e`. Reverted https://github.com/pytorch/pytorch/pull/163671 on behalf of https://github.com/izaitsevfb due to breaks export tests ([comment](https://github.com/pytorch/pytorch/pull/163671#issuecomment-3379281422))	2025-10-08 01:46:45 +00:00
Scott Wolchok	c32118dc3e	Fix double dispatch to Python for detach (#163671 ) This fixes #71725. Differential Revision: [D83857880](https://our.internmc.facebook.com/intern/diff/D83857880) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163671 Approved by: https://github.com/ezyang, https://github.com/albanD	2025-10-07 23:34:37 +00:00
PyTorch MergeBot	1e42fde45e	Revert "[CUDA] Add experimental green context support for SM carveout (#159104 )" This reverts commit `746fe78ecd`. Reverted https://github.com/pytorch/pytorch/pull/159104 on behalf of https://github.com/malfet due to Breaks Windows CD build ([comment](https://github.com/pytorch/pytorch/pull/159104#issuecomment-3378675515))	2025-10-07 20:51:22 +00:00
PyTorch MergeBot	f505caa71b	Revert "multimem reduce (#164517 )" This reverts commit `d1cbb74fb1`. Reverted https://github.com/pytorch/pytorch/pull/164517 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/164517#issuecomment-3378529654))	2025-10-07 20:12:38 +00:00
PyTorch MergeBot	df640df68a	Revert "Reapply "C++-accessible Placements via pybind11 (#163030 )" (#164519 )" This reverts commit `8c0bc879b9`. Reverted https://github.com/pytorch/pytorch/pull/164519 on behalf of https://github.com/malfet due to Still breaks internal workflows ([comment](https://github.com/pytorch/pytorch/pull/164519#issuecomment-3378469432))	2025-10-07 19:46:17 +00:00
Lakshay Garg	9ecd092bd9	Add python bindings for NCCL CTA policies (#164309 ) NCCLConfig can now be constructed with non-default [cta policies][1] ```python import torch from torch.distributed import ProcessGroupNCCL as nccl config = nccl.NCCLConfig() config.cta_policy = nccl.NCCL_CTA_POLICY_ZERO # NCCL version >= 2.28 ``` [1]: https://docs.nvidia.com/deeplearning/nccl/archives/nccl_2283/user-guide/docs/api/flags.html#nccl-communicator-cta-policy-flags Pull Request resolved: https://github.com/pytorch/pytorch/pull/164309 Approved by: https://github.com/eqy	2025-10-07 18:16:20 +00:00
Mikayla Gawarecki	f37a6523ef	Move version.h to torch/headeronly (#164381 ) Differential Revision: [D83685392](https://our.internmc.facebook.com/intern/diff/D83685392) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164381 Approved by: https://github.com/janeyx99	2025-10-07 17:47:30 +00:00
Yuanyuan Chen	ff5faa744a	Remove unused THPXXX macros (#164660 ) These macros are not used in OSS. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164660 Approved by: https://github.com/albanD	2025-10-07 04:04:21 +00:00
albanD	56d66ac0d7	Make custom op alias check consistent (#164576 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164576 Approved by: https://github.com/soulitzer	2025-10-07 02:05:09 +00:00
rraminen	49f7d8d19d	[ROCm] Fix test_cuda_synchronize failure on ROCm (#164735 ) This PR skips the hipify step of torch/csrc/jit/ir/ir.h to avoid a build-time error for the JIT cuda namespace. This fixes two skipped tests in test/jit/test_cuda.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164735 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-07 01:14:24 +00:00
Chris Leonard	e89d12bf5d	Numpy zerotensor handling (#164487 ) Fixes #89034 Updated tensor_to_numpy() function in tensor_numpy.cpp to handle ZeroTensors by throwing an error if force=False and returning an array full of zeros if force=True. @ngimel, I just saw that you mentioned PyTorch is not too concerned with this issue but I had already worked on it so I figured I would push it anyways and see what you thought. Feel free to close the PR if you think it is not worth merging. @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/164487 Approved by: https://github.com/izaitsevfb	2025-10-07 00:34:14 +00:00
Yedidya Feldblum	d4752bc7f6	[caffe2] tweak Unpickler::readInstruction handling TUPLE (#164764 ) Summary: Creating the vector was a bit awkward. Use the natural iterator-pair constructor with move-iterators. Test Plan: CI. Reviewed By: dolpm Differential Revision: D83995108 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164764 Approved by: https://github.com/drisspg	2025-10-07 00:18:10 +00:00
PyTorch MergeBot	1fc71d1b57	Revert "Numpy zerotensor handling (#164487 )" This reverts commit `f7ad6dbad6`. Reverted https://github.com/pytorch/pytorch/pull/164487 on behalf of https://github.com/malfet due to Did it break torchbench?, see `8c728e129d/1` ([comment](https://github.com/pytorch/pytorch/pull/164487#issuecomment-3374635051))	2025-10-06 23:32:12 +00:00
Scott Wolchok	8c0bc879b9	Reapply "C++-accessible Placements via pybind11 (#163030 )" (#164519 ) This makes Placement data representation available in C++ via pybind11. Reapply with fix for internal errors. D83788896 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164519 Approved by: https://github.com/Skylion007, https://github.com/ezyang	2025-10-06 23:19:14 +00:00
Eddie Yan	746fe78ecd	[CUDA] Add experimental green context support for SM carveout (#159104 ) Low-level PyTorch APIs should be usable/stable enough at this point but we might move the underlying driver API usage a bit from here... Built on top of @drisspg 's branch Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104 Approved by: https://github.com/ngimel Co-authored-by: drisspg <drisspguessous@gmail.com>	2025-10-06 23:11:23 +00:00
Scott Wolchok	11f5f65686	Use PyObject_GetOptionalAttrString in PyObject_FastGetAttrString when available (#164624 ) Python 3.13 added PyObject_GetOptionalAttrString. I'm not 100% certain that it is strictly better than the old approach in all cases, but based on documentation/comments it seems to be meant for this type of use, and it's faster when I profile torchtitan training (which gets to the "check for the `__torch_function__` attr on some object" part of maybe_has_torch_function frequently enough to notice, but wastes a bunch of time generating exceptions that we then suppressed here). Pull Request resolved: https://github.com/pytorch/pytorch/pull/164624 Approved by: https://github.com/Skylion007	2025-10-06 18:26:09 +00:00
albanD	af32d16a71	Add pure view support in autograd Function (#164736 ) This is the same as https://github.com/pytorch/pytorch/pull/164467 But it needs to be co-deved due to internal insanity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164736 Approved by: https://github.com/soulitzer	2025-10-06 18:21:05 +00:00
Yuanyuan Chen	9fff8155c3	[2/N] Fix clang-tidy readability checks (#164652 ) This PR applies clang-tidy readability checks to jit sources and all headers in the code base. `readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652 Approved by: https://github.com/Skylion007	2025-10-06 01:06:01 +00:00
PyTorch MergeBot	331191ce4b	Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659 )" This reverts commit `29cbcbac42`. Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/izaitsevfb due to reverted internally, see [D83214133](https://www.internalfb.com/diff/D83214133) ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3369348172))	2025-10-05 21:39:57 +00:00
PyTorch MergeBot	2c5ed6e7c0	Revert "[2/N] Fix clang-tidy readability checks (#164652 )" This reverts commit `3c5ca685d6`. Reverted https://github.com/pytorch/pytorch/pull/164652 on behalf of https://github.com/izaitsevfb due to need to revert due to a conflict with revert of https://github.com/pytorch/pytorch/pull/162659 ([comment](https://github.com/pytorch/pytorch/pull/164652#issuecomment-3369346707))	2025-10-05 21:36:57 +00:00
Yuanyuan Chen	3c5ca685d6	[2/N] Fix clang-tidy readability checks (#164652 ) This PR applies clang-tidy readability checks to jit sources and all headers in the code base. `readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652 Approved by: https://github.com/Skylion007	2025-10-05 07:05:11 +00:00
William Wen	409aece3f9	[dynamo, 3.14] prevent StackRef compilation in 3.14 Windows (#164400 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164400 Approved by: https://github.com/Camyll, https://github.com/atalman	2025-10-04 18:38:08 +00:00
Chris Leonard	f7ad6dbad6	Numpy zerotensor handling (#164487 ) Fixes #89034 Updated tensor_to_numpy() function in tensor_numpy.cpp to handle ZeroTensors by throwing an error if force=False and returning an array full of zeros if force=True. @ngimel, I just saw that you mentioned PyTorch is not too concerned with this issue but I had already worked on it so I figured I would push it anyways and see what you thought. Feel free to close the PR if you think it is not worth merging. @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/164487 Approved by: https://github.com/ngimel, https://github.com/albanD	2025-10-04 12:03:48 +00:00
PyTorch MergeBot	f46bb04dcc	Revert "Add pure view support in autograd Function (#164467 )" This reverts commit `10335ffb2c`. Reverted https://github.com/pytorch/pytorch/pull/164467 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/164467#issuecomment-3368152304))	2025-10-04 11:42:46 +00:00
PyTorch MergeBot	6f6a919366	Revert "Make custom op alias check consistent (#164576 )" This reverts commit `e438db2546`. Reverted https://github.com/pytorch/pytorch/pull/164576 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/164467#issuecomment-3368152304))	2025-10-04 11:42:45 +00:00
Yuanyuan Chen	5103ecc5d8	[1/N] Fix clang-tidy readability checks (#164561 ) Check all `.cpp` files except `jit` files for readability thoroughly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164561 Approved by: https://github.com/Skylion007	2025-10-04 09:40:38 +00:00
soulitzer	7d570129e0	Fix custom autograd Function memory leak when saving mutated view (#164407 ) Fixes https://github.com/pytorch/pytorch/issues/160317 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164407 Approved by: https://github.com/albanD	2025-10-04 00:47:12 +00:00
PyTorch MergeBot	3ddf2018d0	Revert "Support setting grad_dtype on leaf tensors (#162815 )" This reverts commit `dca73982c5`. Reverted https://github.com/pytorch/pytorch/pull/162815 on behalf of https://github.com/yangw-dev due to break internal test D83850533, see more details below ([comment](https://github.com/pytorch/pytorch/pull/162815#issuecomment-3367498501))	2025-10-03 23:14:28 +00:00
albanD	e438db2546	Make custom op alias check consistent (#164576 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164576 Approved by: https://github.com/soulitzer ghstack dependencies: #164467	2025-10-03 21:42:11 +00:00
albanD	10335ffb2c	Add pure view support in autograd Function (#164467 ) Fix https://github.com/pytorch/pytorch/issues/73604 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164467 Approved by: https://github.com/ezyang, https://github.com/soulitzer	2025-10-03 21:42:11 +00:00
Lakshay Garg	f006aee601	Speed up FP precision lookup (#164044 ) This commit simplifies the precision lookup and setting logic by reducing the number of branches and using a custom hash function. Fixes #161822. The issue described in #163709 still persists. This is meant as a short term fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164044 Approved by: https://github.com/ngimel, https://github.com/eqy	2025-10-03 21:35:20 +00:00
PyTorch MergeBot	8ec8c14ace	Revert "[CUDA] Add experimental green context support for SM carveout (#159104 )" This reverts commit `3c59351c6e`. Reverted https://github.com/pytorch/pytorch/pull/159104 on behalf of https://github.com/clee2000 due to failed lint, pyfmt not caught pyi file, I think they need special handling since they're not in the changed files list? ([comment](https://github.com/pytorch/pytorch/pull/159104#issuecomment-3367077208))	2025-10-03 20:15:56 +00:00
Eddie Yan	3c59351c6e	[CUDA] Add experimental green context support for SM carveout (#159104 ) Low-level PyTorch APIs should be usable/stable enough at this point but we might move the underlying driver API usage a bit from here... Built on top of @drisspg 's branch Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104 Approved by: https://github.com/ngimel Co-authored-by: drisspg <drisspguessous@gmail.com>	2025-10-03 18:59:12 +00:00
Ke Wen	d1cbb74fb1	multimem reduce (#164517 ) Modified `multimem_one_shot_all_reduce_out` function to accept a `root` argument, making it a `multimem_reduce` op. The original `multimem_one_shot_all_reduce` op becomes a caller of the `multimem_reduce`, with each rank providing its own rank id as root. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164517 Approved by: https://github.com/ngimel	2025-10-03 02:41:10 +00:00
Kurt Mohler	ef50c6e3e3	[MPS] Add backward pass for `embedding_bag` (#163931 ) Fixes #162270 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163931 Approved by: https://github.com/malfet	2025-10-03 00:48:38 +00:00
soulitzer	dca73982c5	Support setting grad_dtype on leaf tensors (#162815 ) `grad_dtype` is a new attribute on Tensor to control gradient dtype: - Access/setting is leaf-only. - grad_dtype is respected (1) when assigning to .grad, and (2) in the engine after the previous node produces incoming gradients for AccumulateGrad. (See table below for details) - Not setting grad_dtype preserves the current behavior. Accessing it returns `t.dtype` - `grad_dtype` cannot be set when there is already a `.grad` present and the dtypes conflict. \| `grad_dtype` setting \| Setting `.grad` manually \| Incoming gradient from autograd engine \| \|-----------------------\|--------------------------\|-----------------------------------------\| \| Default (tensor’s dtype) \| `.grad` must match tensor’s dtype \| Engine casts incoming grad to tensor’s dtype \| \| Set to specific dtype \| `.grad` must match that dtype \| Engine casts incoming grad to the specified dtype \| \| Set to `None` \| `.grad` may be any dtype \| Engine does not cast; accepts incoming grad dtype as-is \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/162815 Approved by: https://github.com/albanD	2025-10-02 23:09:07 +00:00
ankushwahaRH	ece5e0f01b	Fake process group Direct construction error (#163665 ) Fixes #162129. Added validation in _rank_not_in_group() to check if ```FakeProcessGroup``` is properly initialized before use, raising a clear error message if ```torch.distributed.init_process_group(backend='fake')``` hasn't been called first. This prevents silent failures and ensures proper dispatch system integration for all distributed operations. Added test case test_fake_process_group_direct_usage_error() that validates the error is raised for ```all_reduce``` and ```all_to_all_single``` operations. Please let me know if additional distributed operators should be tested or if any other updates are needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163665 Approved by: https://github.com/ezyang	2025-10-02 22:19:26 +00:00
PyTorch MergeBot	2a7c486750	Revert "Speed up FP precision lookup (#164044 )" This reverts commit `723ba21393`. Reverted https://github.com/pytorch/pytorch/pull/164044 on behalf of https://github.com/yangw-dev due to broke internal build In file included from xplat/caffe2/aten/src/ATen/DeviceAccelerator.cpp:1: xplat/caffe2/aten/src/ATen/Context.h:502:38: error: shift count >= width of type [-Werror,-Wshift-count-overflow] 502 \| return std::hash<size_t>{}((k1 << 32) \| k2); ([comment](https://github.com/pytorch/pytorch/pull/164044#issuecomment-3363016702))	2025-10-02 21:00:44 +00:00
Yuanyuan Chen	115af42e9d	Fix readability checks in TIDY and apply them (#164475 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/164475 Approved by: https://github.com/albanD, https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-10-02 20:34:49 +00:00
Yu, Guangye	5f775bdfb7	Fix THP_PyObject_VirtualFree return type (#163763 ) # Motivation `void THP_PyObject_VirtualFree` should have no return value; otherwise, it would raise a build warning ```bash C:\Users\guangyey\pytorch\torch\csrc\dynamo\cpython_defs.c(264): warning C4098: 'THP_PyObject_VirtualFree': 'void' function returning a value ``` # Additional Context Refer to `c4f21d7c7c/Include/cpython/objimpl.h (L59-L68)` PyObjectArenaAllocator::free is defined with `void` return type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163763 Approved by: https://github.com/albanD, https://github.com/williamwen42	2025-10-02 20:21:53 +00:00
Shangdi Yu	bf717ce346	[AOTI win] Add ABI stable method for updating constant buffer (#163819 ) Add `struct AOTInductorConstantMapEntry` to represent the constant map in AOTI Model. We cannot use `std::unordered_map` for cross-compilation, because it is not ABI stable. It will be tested when we test `update_user_managed_constant_buffer` for Windows cross-compilation. Example usage: ``` // Load constants. Create random constants here. auto* fc1_w = new slim::SlimTensor(slim::empty({16, 10}, c10::kFloat, c10::Device(c10::kCUDA, 0))); fc1_w->fill_(1.0); ..... // Build pairs std::vector<AOTInductorConstantPair> constants{ {"fc1_weight", fc1_w}, {"fc1_bias", fc1_b}, {"fc2_weight", fc2_w}, {"fc2_bias", fc2_b}, }; // Call runtime (pass raw pointer + size) update_user_managed_constant_buffer_abi( container_handle, constants.data(), constants.size(), /*use_inactive=*/false, /*validate_full_update=*/true); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163819 Approved by: https://github.com/desertfire	2025-10-02 18:31:00 +00:00
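
The edge case described in f332017294 (#161825) may be easier to follow as a small model. The sketch below is pure Python and does not use the PyTorch C++ API: the dictionaries and this `overwrite_from` function are hypothetical stand-ins that only assume the rule stated in the commit message, namely that a parameter-group field equal to its constructor default is treated as "not set" and replaced by the optimizer default.

```python
# Hypothetical pure-Python model of the merge rule described in #161825.
# These names are illustrative; they are not the C++ implementation.

# Values taken from the commit message: what AdamOptions() yields by default.
CONSTRUCTOR_DEFAULTS = {"lr": 0.001, "weight_decay": 0.0, "eps": 1e-08}


def overwrite_from(group_options, optimizer_defaults):
    """Replace every field that still equals its constructor default.

    The model cannot tell "left at the default" apart from "explicitly set
    to a value that equals the default", which is the edge case in question.
    """
    merged = dict(group_options)
    for name, value in group_options.items():
        if value == CONSTRUCTOR_DEFAULTS[name]:
            merged[name] = optimizer_defaults[name]
    return merged


# Defaults the user chose when creating the optimizer (from the commit message).
optimizer_defaults = {"lr": 0.005, "weight_decay": 0.1, "eps": 1e-07}

# The user adds a parameter group and explicitly asks for weight_decay=0.0,
# which happens to equal the constructor default.
group = dict(CONSTRUCTOR_DEFAULTS, weight_decay=0.0)
merged = overwrite_from(group, optimizer_defaults)

print(merged["weight_decay"])  # 0.1, although the user asked for 0.0
```

An explicit value that differs from the constructor default (for example `weight_decay=0.01`) survives the merge in this model, so only values that coincide with the constructor default are affected.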

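The `grad_dtype` table in dca73982c5 (#162815) can also be read as two small decision functions. This is a sketch under stated assumptions, not PyTorch code: dtypes are plain strings, `_UNSET` is a hypothetical sentinel for "grad_dtype was never assigned", and both function names are invented for illustration.

```python
# Hypothetical model of the grad_dtype table in #162815; not PyTorch code.
_UNSET = object()  # stands for "grad_dtype was never assigned"


def incoming_grad_dtype(tensor_dtype, incoming_dtype, grad_dtype=_UNSET):
    """Return the dtype an incoming gradient ends up with, per the table."""
    if grad_dtype is _UNSET:
        return tensor_dtype    # default: cast to the tensor's dtype
    if grad_dtype is None:
        return incoming_dtype  # None: no cast, accept the dtype as-is
    return grad_dtype          # specific dtype: cast to it


def manual_grad_allowed(tensor_dtype, new_grad_dtype, grad_dtype=_UNSET):
    """Return True if assigning a .grad of new_grad_dtype fits the table."""
    if grad_dtype is None:
        return True            # None: .grad may be any dtype
    expected = tensor_dtype if grad_dtype is _UNSET else grad_dtype
    return new_grad_dtype == expected


print(incoming_grad_dtype("bfloat16", "float32"))             # bfloat16
print(incoming_grad_dtype("bfloat16", "float32", "float32"))  # float32
print(manual_grad_allowed("bfloat16", "float32", None))       # True
```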

16313 Commits