pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Tugsbayasgalan Manlaibaatar	2a11ce2c78	Support calling torch.compile inside non-strict export (#164171 ) So this fixes at least two issues: 1) When we are invoking inductor backend, we apply pre-grad passes which try to find correct fake mode to use. In the nested case, we will run into clash when there is closure variable in the inductor region because non-strict would have fakified this variable before hand and inner torch.compile would have created a new fresh fake mode. This is not a problem in regular torch.compile because inner torch.compile gets ignored. I don't know if we are supposed to inherit fake mode from parent context in this case. But we can avoid this problem if we just default to eager backend which is fine in this case because the point of export is to capture aten operators. Going to inductor would mean we will lose inner torch.compile ops. 2) There is custom torch function modes in export that track number of torch fns executed and inner compile itself doesn't work because of guard failure as this mode state gets changed. I noticed torch.cond fixes this problem by carefully stashing the torch function mode and defer it in the backend. So the correct thing to do here is just re-use torch.cond implementation unconditionally. So the things i did for fixing above were: 1) Always default to eager backend when compile is invoked inside export. I needed to make how torch.cond sets up the fresh tracing env into an util that can be shared. 2) The previous eager backend for torch.cond was wrong because the context managers didn't actually persist until the backend is invoked. 3) torch.cond used only disable TorchFunctionMetadata tf mode and stash it for later, but in fact, we should do both TorchFunctionMetadata and PreDispatchTorchFunctionMode. With above fixes, we are able to export flex attention in export. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164171 Approved by: https://github.com/ydwu4	2025-10-03 16:31:07 +00:00
drisspg	3288fbf374	Change default device to current acclerator (#164399 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164399 Approved by: https://github.com/albanD	2025-10-03 16:15:09 +00:00
James Wu	fa5306b4f5	Support partial _DynamoCacheEntries when not all backends available (#163521 ) Differential Revision: [D82735769](https://our.internmc.facebook.com/intern/diff/D82735769/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163521 Approved by: https://github.com/zhxchen17	2025-10-03 16:14:32 +00:00
Jeff Daily	5656d45c8f	forward fix #164481 (#164578 ) PR #164481 added unit test test_scaled_mm_preserves_strides in test/inductor/test_fp8.py. It was missing the adjustment for ROCm's F8 types on MI300. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164578 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-03 15:44:34 +00:00
atalman	e40fe634b1	Pin conda version for Docker builds (#164575 ) Mitigates https://github.com/pytorch/pytorch/issues/164574 Remove unused CUDA_CHANNEL var - this was used before when we had pytorch install via conda. Please note: CUDA 13.0 failures are expected since the CI tries to build against prod and CUDA 13.0 is not available in prod yet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164575 Approved by: https://github.com/malfet, https://github.com/Camyll	2025-10-03 15:01:35 +00:00
bobrenjc93	3db2164341	[torchfuzz] add norm operators (#164514 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164514 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434	2025-10-03 14:44:19 +00:00
bobrenjc93	5bb8f04d3e	[torchfuzz] add nn functional ops (#164434 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164434 Approved by: https://github.com/pianpwk ghstack dependencies: #164432	2025-10-03 14:44:19 +00:00
Yuanyuan Chen	5743d731c1	Use torch.testing.test_close instead of torch.testing.test_allclose (#164539 ) Because torch.testing.test_allclose is deprecated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164539 Approved by: https://github.com/mlazos	2025-10-03 14:39:10 +00:00
PyTorch UpdateBot	aed66248a0	[vllm hash update] update the pinned vllm hash (#164319 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164319 Approved by: https://github.com/pytorchbot Co-authored-by: Huy Do <huydhn@gmail.com>	2025-10-03 12:30:33 +00:00
Nicolas Macchioni	6c3c9414eb	config for dcache + unit tests (#164512 ) Test Plan: ``` buck test fbcode//mode/opt caffe2/test/inductor:caching ``` Reviewed By: aorenste Differential Revision: D83714687 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164512 Approved by: https://github.com/jananisriram	2025-10-03 10:52:59 +00:00
CWOA	eccf561326	Move call to output generated code in inductor (#161615 ) This PR moves the call to copy the generated code from `/tmp/...` so that it is still called if attempting to compile the generated code fails. In both cases now, the generated code will be copied across to `torch_compile_debug/run_.../torchinductor/output_code.py` which makes debugging bad generated code easier. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161615 Approved by: https://github.com/eellison	2025-10-03 10:23:22 +00:00
jainapurva	ddf8de28c2	Add Rocm to Operator Microbenchmark CI (#164173 ) This pull request adds support for running operator microbenchmarks on ROCm (AMD GPU) environments in the CI workflow. The main changes involve introducing new build and test jobs for ROCm in the `.github/workflows/operator_microbenchmark.yml` file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164173 Approved by: https://github.com/huydhn	2025-10-03 07:35:32 +00:00
bobrenjc93	7617b113ad	[torchfuzz] Support EagerVsFullGraphDynamicCompileWithNumericsCheck (#164432 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164432 Approved by: https://github.com/pianpwk	2025-10-03 06:42:20 +00:00
fduwjj	2a760dc51e	[DeviceMesh] Simplifying internal bookkeeping with CuTe layout (#163213 ) We want to refactor the internal bookkeeping of DeviceMesh so that: Simply the bookkeeping logics and make it generic enough so that it is easy to support new transformations like flatten noncontiguous dim, reshape and unflatten. (We leveraged the CuTe layout). This new layout also let us handle non-contiguous slicing, flatten, transpose possible. Concretely, in this PR, we do the following: 1. Use the `_MeshLayout` to handle all index operations rather use a map to record mesh dims. 2. Removed `flatten_name_to_root_dims`, because now we can directly get layout from a flattened device mesh. 3. Replaced `_get_slice_mesh_dims` with `_get_slice_mesh_layout`. 4. Use the newly added function `check_overlap` to check layout overlap. 5. Use a new function `to_remapping_tensor` to use layout ranks as indices when the mesh tensor is not representable as CuTe. The reason is that layout acts as a backend of mesh tensor bookkeeping (indexing indices), it needs to be used as indices for remap back to the mesh tensor for new DeviceMesh generation and backend init. For example, in the case of 2K to 4K, the underlying layout is (2K, 1) but the actual value of the mesh tensor is [2K, 2K+1, ....,]. While flattening, slicing, we need to remap the layout back to the new mesh tensor so it maps the actual device allocation. For example, in the 2K to 4K case, if the shape is (1K, 1K) with dim_names ("dp", "tp"). Then when slicing "tp", the mesh tensor should be (2K, 2K+1, ..., 3K-1) or (3K, 3K+1, ... 4K-1). not the global ranks generated from the layout. (1K, 1). Verified that loss curve is very close for DeepSeekV3 on torchtitan, note that exact same match is challenging because even if we run the baseline twice, the loss curve does not exactly match. <img width="1113" height="490" alt="image" src="https://github.com/user-attachments/assets/7877b5a4-337e-4ad8-b878-2378f4f0f38d" /> The PR looks big indeed but we don't change any existing behavior of DeviceMesh, so it is a pure refactor. With this refactoring we also enabled the slicing and flatten of non-contiguous dims of a device mesh which is hard to implement without cute layout. This is a continue of https://github.com/pytorch/pytorch/pull/161106 (original one got messed with EasyCLA) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163213 Approved by: https://github.com/lw, https://github.com/fegin	2025-10-03 05:51:28 +00:00
Henry Tsang	6c209bfc5c	[cutlass-4][take 2] upgrade to cutlass 4.2.1 (#164159 ) Test Plan: Sandcastle Differential Revision: D83492704 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164159 Approved by: https://github.com/Skylion007, https://github.com/mlazos	2025-10-03 03:47:59 +00:00
Maggie Moss	1051c1de5c	Add pyrefly suppressions 2/n (#164513 ) Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check --- step 1: uncomment lines in the `pyrefly.toml` file before: https://gist.github.com/maggiemoss/911b4d0bc88bf8cf3ab91f67184e9d46 after: ``` INFO Checking project configured at `/Users/maggiemoss/python_projects/pytorch/pyrefly.toml` INFO 0 errors (1,152 ignored) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164513 Approved by: https://github.com/oulgen	2025-10-03 02:46:13 +00:00
Ke Wen	d1cbb74fb1	multimem reduce (#164517 ) Modified `multimem_one_shot_all_reduce_out` function to accept a `root` argument, making it a `multimem_reduce` op. The original `multimem_one_shot_all_reduce` op becomes a caller of the `multimem_reduce`, with each rank providing its own rank id as root. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164517 Approved by: https://github.com/ngimel	2025-10-03 02:41:10 +00:00
Markus Hoehnerbach	91c4db76cb	fix flex attention eager: dont round down scores to low-precision (closes #163588 ) (#163986 ) Fixes: https://github.com/pytorch/pytorch/issues/163588 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163986 Approved by: https://github.com/drisspg, https://github.com/mlazos	2025-10-03 01:09:59 +00:00
eellison	4691fe6070	remove unnecessary registration (#164481 ) scaled_mm already had `needs_exact_strides` in its op registration. also added a test showing these strides are being respected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164481 Approved by: https://github.com/drisspg, https://github.com/mlazos	2025-10-03 01:03:12 +00:00
Kurt Mohler	ef50c6e3e3	[MPS] Add backward pass for `embedding_bag` (#163931 ) Fixes #162270 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163931 Approved by: https://github.com/malfet	2025-10-03 00:48:38 +00:00
eellison	86474ce996	Update mask dtype (#164472 ) Differential Revision: [D83781684](https://our.internmc.facebook.com/intern/diff/D83781684) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164472 Approved by: https://github.com/bdhirsh	2025-10-03 00:19:36 +00:00
Yuanyuan Chen	18e18488e8	[6/N] Apply ruff UP035 rule (#164438 ) Continued code migration to enable ruff UP035. Most changes are about moving `Callable` from typing to from collections.abc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164438 Approved by: https://github.com/ezyang	2025-10-03 00:15:32 +00:00
Eddie Yan	f7082e92b3	[cuBLAS] update cuBLAS determinism docs, remove workspace requirement checks (#161749 ) Since CUDA 11.x (need to update the docs for this, current PR is saying 12.2 which is incorrect) we've been allocating cuBLAS workspaces explicitly per handle/stream combination https://github.com/pytorch/pytorch/pull/85447 According to the cuBLAS documentation, this appears to be sufficient for determinism without any explicit workspace requirements to e.g., `:4096:8` or `:16:8` as was previously expressed in PyTorch docs https://docs.nvidia.com/cuda/cublas/#results-reproducibility Planning to add an explicit determinism test as well... Pull Request resolved: https://github.com/pytorch/pytorch/pull/161749 Approved by: https://github.com/ngimel	2025-10-03 00:09:47 +00:00
Yang Wang	95a053284c	Fix vllm build issue (#164361 ) Fixes #ISSUE_NUMBER unstable https://github.com/pytorch/pytorch/issues/164362 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164361 Approved by: https://github.com/huydhn Co-authored-by: Huy Do <huydhn@gmail.com>	2025-10-02 23:34:21 +00:00
Jagadish Krishnamoorthy	c7e30ae4dd	MX: Remove redundant PLATFORM_SUPPORTS_MX_GEMM constant (#164320 ) Deleted duplicate definition of PLATFORM_SUPPORTS_MX_GEMM, was introduced in https://github.com/pytorch/pytorch/pull/162209 Also, adjusted BLOCK_SIZE and fp4_scaling_dtype in test_matmul_cuda.py to enable test_blockwise_nvfp4_compile on ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164320 Approved by: https://github.com/jeffdaily	2025-10-02 23:30:56 +00:00
soulitzer	dca73982c5	Support setting grad_dtype on leaf tensors (#162815 ) `grad_dtype` is a new attribute on Tensor to control gradient dtype: - Access/setting is leaf-only. - grad_dtype is respected when (1) when assigning to .grad, and (2) in the engine after the previous node produces incoming gradients for AccumulateGrad. (See table below for details) - Not setting grad_dtype preserves the current behavior. Accessing it returns `t.dtype` - `grad_dtype` cannot be set when there is already a `.grad` present and the dtypes conflict. \| `grad_dtype` setting \| Setting `.grad` manually \| Incoming gradient from autograd engine \| \|-----------------------\|--------------------------\|-----------------------------------------\| \| Default (tensor’s dtype) \| `.grad` must match tensor’s dtype \| Engine casts incoming grad to tensor’s dtype \| \| Set to specific dtype \| `.grad` must match that dtype \| Engine casts incoming grad to the specified dtype \| \| Set to `None` \| `.grad` may be any dtype \| Engine does not cast; accepts incoming grad dtype as-is \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/162815 Approved by: https://github.com/albanD	2025-10-02 23:09:07 +00:00
Nan Zhang	43848b71d9	Improved support for autotuning in wrapper_fxir (#164132 ) Summary: - correct dtype propagation - allow more more options to be passed to compiler Test Plan: in follow up change Differential Revision: D83367909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164132 Approved by: https://github.com/jansel	2025-10-02 22:54:22 +00:00
Laith Sakka	15c8bdcc5e	Fix FloorDiv should not generate non integer rationals (due to sympy bug) (#164398 ) FloorDiv eval have this optimization ``` # Expands (x + y) // b into x // b + y // b. # This only works if floor is an identity, i.e. x / b is an integer. ``` Before this PR this optimization would generate a result in an expression like this. Duo to a bug in sympy. ``` Mul(Rational(1, 22), Add(Mul(Integer(24), Symbol('s37', integer=True, positive=True)), Integer(672)), FloorDiv(Mul(Symbol('s14', integer=True, positive=True), Symbol('s46', integer=True, positive=True)), Integer(2016))) ``` This is because in sympy an expression can have .is_integer =True yet have 1/22 in it! This PR ensure we do not generate that by simply opting out if this optimization if we end up with quotient that have such rational. Fix https://github.com/pytorch/pytorch/issues/164385, https://github.com/pytorch/pytorch/issues/154996 https://github.com/pytorch/pytorch/issues/153375 https://github.com/pytorch/pytorch/issues/164063 and internal user issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164398 Approved by: https://github.com/jansel, https://github.com/isuruf	2025-10-02 22:51:03 +00:00
PyTorch MergeBot	22e219d996	Revert "[DeviceMesh] Simplifying internal bookkeeping with CuTe layout (#163213 )" This reverts commit `b0985144b5`. Reverted https://github.com/pytorch/pytorch/pull/163213 on behalf of https://github.com/yangw-dev due to caused internal test failure ([comment](https://github.com/pytorch/pytorch/pull/163213#issuecomment-3363414435))	2025-10-02 22:22:26 +00:00
Anthony Barbier	bdc0a421d7	Stop parsing command line arguments every time common_utils is imported. (#156703 ) Last PR in the series to re-submit https://github.com/pytorch/pytorch/pull/134592 as smaller PRs: https://github.com/pytorch/pytorch/pull/154612 https://github.com/pytorch/pytorch/pull/154628 https://github.com/pytorch/pytorch/pull/154715 https://github.com/pytorch/pytorch/pull/154716 https://github.com/pytorch/pytorch/pull/154725 https://github.com/pytorch/pytorch/pull/154728 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156703 Approved by: https://github.com/clee2000	2025-10-02 22:22:04 +00:00
ankushwahaRH	ece5e0f01b	Fake process group Direct construction error (#163665 ) Fixes #162129. Added validation in _rank_not_in_group() to check if ```FakeProcessGroup``` is properly initialized before use, raising a clear error message if ```torch.distributed.init_process_group(backend='fake')``` hasn't been called first. This prevents silent failures and ensures proper dispatch system integration for all distributed operations. Added test case test_fake_process_group_direct_usage_error() that validates the error is raised for ```all_reduce``` and ```all_to_all_single``` operations. Please let me know if additional distributed operators should be tested or if any other updates are needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163665 Approved by: https://github.com/ezyang	2025-10-02 22:19:26 +00:00
PyTorch MergeBot	a34797e031	Revert "Add provenance to inductor IR nodes created after graph.run (#164255 )" This reverts commit `b9e73e639e`. Reverted https://github.com/pytorch/pytorch/pull/164255 on behalf of https://github.com/jeffdaily due to broke rocm; inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_deferred_triton_kernels [GH job link](https://github.com/pytorch/pytorch/actions/runs/18200790301/job/51821738132) [HUD commit link](`b9e73e639e`) ([comment](https://github.com/pytorch/pytorch/pull/164255#issuecomment-3363360088))	2025-10-02 22:01:41 +00:00
Isuru Fernando	f465ea6752	[inductor] require shape in TritonCSEVariable (#162275 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162275 Approved by: https://github.com/mlazos ghstack dependencies: #164158	2025-10-02 21:52:09 +00:00
Isuru Fernando	a8edccfbf4	[inductor] fix TestTemplateRender in select_algorithm (#164158 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164158 Approved by: https://github.com/mlazos	2025-10-02 21:52:09 +00:00
Rohit Singh Rathaur	6389658ec6	Fix type hints in PrepareModuleInput and PrepareModuleInputOutput (#164482 ) Fixes #161646 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164482 Approved by: https://github.com/Skylion007	2025-10-02 21:40:43 +00:00
Xilun Wu	cc71ab86a6	[DTensor] raise error if the local_tensor argument passed to DTensor.from_local is a DTensor (#164496 ) Summary Raise error when the `local_tensor` argument passed to `DTensor.from_local` is a DTensor, this prevents users from accidentally calling `from_local` over a DTensor object. The error message is organized in this way: ``` the local_tensor argument only accepts torch.Tensor but got <class 'torch.distributed.tensor.DTensor'> value. ``` Test `pytest test/distributed/tensor/test_dtensor.py -k test_from_local` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164496 Approved by: https://github.com/ezyang	2025-10-02 21:25:01 +00:00
PyTorch MergeBot	2a7c486750	Revert "Speed up FP precision lookup (#164044 )" This reverts commit `723ba21393`. Reverted https://github.com/pytorch/pytorch/pull/164044 on behalf of https://github.com/yangw-dev due to broke internal build In file included from xplat/caffe2/aten/src/ATen/DeviceAccelerator.cpp:1: xplat/caffe2/aten/src/ATen/Context.h:502:38: error: shift count >= width of type [-Werror,-Wshift-count-overflow] 502 \| return std::hash<size_t>{}((k1 << 32) \| k2); ([comment](https://github.com/pytorch/pytorch/pull/164044#issuecomment-3363016702))	2025-10-02 21:00:44 +00:00
Maggie Moss	5f18f240de	Add initial suppressions for pyrefly (#164177 ) Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: `python3 scripts/lintrunner.py` `pyrefly check` --- Pyrefly check before: https://gist.github.com/maggiemoss/3a0aa0b6cdda0e449cd5743d5fce2c60 After: ``` INFO Checking project configured at `/Users/maggiemoss/python_projects/pytorch/pyrefly.toml` INFO 0 errors (1,063 ignored) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164177 Approved by: https://github.com/Lucaskabela	2025-10-02 20:57:41 +00:00
Jeff Daily	6b7970192f	[ROCm][CI] fix test_cudnn_convolution_relu_cuda (#164466 ) Fixes #162816. Test was comparing output of conv vs fused conv but inputs were different memory formats. Also fix test_cudnn_convolution_add_relu. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164466 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-02 20:36:54 +00:00
Yuanyuan Chen	115af42e9d	Fix readibility checks in TIDY and apply them (#164475 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/164475 Approved by: https://github.com/albanD, https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-10-02 20:34:49 +00:00
Yu, Guangye	5f775bdfb7	Fix THP_PyObject_VirtualFree return type (#163763 ) # Motivation `void THP_PyObject_VirtualFree` should have no return value; otherwise, it would raise a build warning ```bash C:\Users\guangyey\pytorch\torch\csrc\dynamo\cpython_defs.c(264): warning C4098: 'THP_PyObject_VirtualFree': 'void' function returning a value ``` # Additional Context Refer to `c4f21d7c7c/Include/cpython/objimpl.h (L59-L68)` PyObjectArenaAllocator::free is defined with `void` return type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163763 Approved by: https://github.com/albanD, https://github.com/williamwen42	2025-10-02 20:21:53 +00:00
bobrenjc93	8c54101933	add tensor subclass printing support in fx/graph.py (#164403 ) it was previously quite misleading since it looks like the inputs to the dynamo graph are plain tensors when in reality they are tensor subclasses before ``` class GraphModule(torch.nn.Module): def forward(self, L_input_batch_inputs_: "i64[2, 512][512, 1]cuda:0", L_self_parameters_weight_: "f32[202048, 256][256, 1]cuda:0"): ``` after ``` class GraphModule(torch.nn.Module): def forward(self, L_input_batch_inputs_: "DTensor(i64[2, 512][512, 1]cuda:0)", L_self_parameters_weight_: "DTensor(f32[202048, 256][256, 1]cuda:0)"): ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164403 Approved by: https://github.com/ezyang	2025-10-02 20:06:12 +00:00
RajeshvShiyal	c45d56dd00	typo corrected in ivalue.cpp's comment (#164485 ) Fixes #164483 typo corrected in ivalue.cpp's comment. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164485 Approved by: https://github.com/Skylion007	2025-10-02 20:01:17 +00:00
Yuanyuan Chen	33b17bc619	Remove old CUDA version checks (#164199 ) Remove some version check code for CUDA <12. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164199 Approved by: https://github.com/ezyang	2025-10-02 19:55:47 +00:00
Jian Wen	22b1710252	Use posix_fallocate() to reserve disk space for shared memory (#161910 ) Shared memory is allocated by creating a file in /dev/shm (by default) that can run out of space. Pytorch reserves the file size by calling ftruncate() that creates a sparse file, so it succeeds even if sufficient disk space is not available. This could lead to a situation when a shared memory region is successfully created but a subsequent access to a shared memory page results in SIGBUS due to the disk being full. Using posix_fallocate() instead of ftruncate() eliminates this problem because the former syscall always allocates space and it returns an error if the disk is full. Related to https://github.com/pytorch/pytorch/issues/5040 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161910 Approved by: https://github.com/mikaylagawarecki	2025-10-02 19:12:57 +00:00
Tugsbayasgalan Manlaibaatar	4661200125	[RELAND v2] Close some sources of fake tensors (#164372 ) Changelog: 1. When we run into an operation we didn't proxy, we end up emitting fake constants. We error under a config and we disable the config for some internal users. The reason we want to error is this signals a coverage problem we need to address but at the same time, we don't wnat to be disruptive to already working flows. 2. Previous attribute mutation detection logic in non-strict didn't account for nested module structure. This fixes silent incorrectness issue of exporting esm and qwen in non-strict and some torchbench models like levit_128 and demucs. 3. Previous logic didn't work on the cases where we mutate a container attribute as the previous approach used to pytree over old and new attributes resulting in length mismatch. We gracefully handle this now. Differential Revision: [D83673054](https://our.internmc.facebook.com/intern/diff/D83673054) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164372 Approved by: https://github.com/avikchaudhuri	2025-10-02 18:58:52 +00:00
adabeyta	6a31f42da4	Fix NestedTensor max/min operations for integer dtypes. (#162273 ) Fixes: https://github.com/pytorch/pytorch/issues/162049 ### Summary max_dim and min_dim functions incorrectly used torch.finfo() for all dtypes, causing TypeError for integer tensors. ### Changes - Use torch.iinfo() for integer dtypes instead of torch.finfo(). - Add CPU test: `test_jagged_max_min_dtypes` covering `int8, int16, int32, int64, uint8, float16, bfloat16, float32 and float64` ### Testing Before Fix: `python -m pytest test/test_nestedtensor.py -k "test_jagged_max_min_dtypes" -v` Output: ``` FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_bfloat16 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float16 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float32 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float64 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int16 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' FAILED [0.0005s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int32 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' FAILED [0.0005s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int64 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' FAILED [0.0004s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int8 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' FAILED [0.0004s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_uint8 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo' ``` After Fix: `python -m pytest test/test_nestedtensor.py -k "test_jagged_max_min_dtypes" -v` Output: ``` Running 9 items in this shard test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_bfloat16 PASSED [0.0086s] [ 11%] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float16 PASSED [0.0011s] [ 22%] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float32 PASSED [0.0011s] [ 33%] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float64 PASSED [0.0011s] [ 44%] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int16 PASSED [0.0009s] [ 55%] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int32 PASSED [0.0010s] [ 66%] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int64 PASSED [0.0010s] [ 77%] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int8 PASSED [0.0010s] [ 88%] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_uint8 PASSED [0.0011s] [100%] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162273 Approved by: https://github.com/Skylion007, https://github.com/jbschlosser	2025-10-02 18:46:27 +00:00
Aidyn-A	c6a6c80a73	Add Aidyn-A to CUDA codeowners (#164436 ) Adding myself to "CUDA and CUDA math libraries" section. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164436 Approved by: https://github.com/mikaylagawarecki, https://github.com/eqy	2025-10-02 18:34:10 +00:00
Shangdi Yu	bf717ce346	[AOTI win] Add ABI stable method for updating constant buffer (#163819 ) Add `struct AOTInductorConstantMapEntry` to represent the constant map in AOTI Model. We cannot use `std::unordered_map` for cross-compilation, because it is not ABI stable. it will be tested when we test `update_user_managed_constant_buffer` for windows cross-compilation Example usage: ``` // Load constants. Create random constants here. auto* fc1_w = new slim::SlimTensor(slim::empty({16, 10}, c10::kFloat, c10::Device(c10::kCUDA, 0))); fc1_w->fill_(1.0); ..... // Build pairs std::vector<AOTInductorConstantPair> constants{ {"fc1_weight", fc1_w}, {"fc1_bias", fc1_b}, {"fc2_weight", fc2_w}, {"fc2_bias", fc2_b}, }; // Call runtime (pass raw pointer + size) update_user_managed_constant_buffer_abi( container_handle, constants.data(), constants.size(), /use_inactive=/false, /validate_full_update=/true); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163819 Approved by: https://github.com/desertfire	2025-10-02 18:31:00 +00:00
PyTorch MergeBot	f6f7676756	Revert "C++-accessible Placements via pybind11 (#163030 )" This reverts commit `3e03deab6f`. Reverted https://github.com/pytorch/pytorch/pull/163030 on behalf of https://github.com/swolchok due to doesn't pass pyre ([comment](https://github.com/pytorch/pytorch/pull/163030#issuecomment-3362450379))	2025-10-02 18:25:24 +00:00

... 3 4 5 6 7 ...

94130 Commits