pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
henrylhtsang	0633f63f0d	[cutlass backend] try fix standlone runner test (#147811 ) Differential Revision: [D70147859](https://our.internmc.facebook.com/intern/diff/D70147859/) Trying to fix this test one last time, especially when mixed mm is getting removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147811 Approved by: https://github.com/chenyang78	2025-02-25 23:27:02 +00:00
Robert Hardwick	810d2a3dbd	[ARM] Fix bug in _ref_test_helper in test_ops and fix failing test on Aarch64 (#146597 ) We have a failing unit test on Aarch64 ``` Exception: Caused by reference input at index 34: SampleInput(input=Tensor[size=(5, 5, 4), device="cpu", dtype=torch.complex64, contiguous=False], args=(), kwargs={}, broadcasts_input=False, name='') To execute this test, run the following from the base repo dir: PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=34 python test/test_ops.py TestCommonCPU.test_python_ref__refs_square_cpu_complex64 This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ``` After debugging it I found that `ex` variable is not being reset to None on each loop inside _ref_test_helper. Which after fixing, highlighted another expectedFailure to reenable - `nn.functional.hinge_embedding_loss` which was incorrectly being skipped due to the same problem. `4a545eb85d/test/test_ops.py (L546)` ex variable is not reset after this for next loop iteration Pull Request resolved: https://github.com/pytorch/pytorch/pull/146597 Approved by: https://github.com/digantdesai	2025-02-25 14:15:10 +00:00
vasiliy	e34c15a05b	torch._scaled_mm with MXFP8 (#147548 ) # summary Add blockwise MXFP8 support to `torch._scaled_mm` on CUDA capability 10.0 and higher devices. If the scales for A and B are of dtype `torch.float8_e8m0fnu`, we dispatch to the blockwise kernel from cuBLAS. This is a skeleton PR where we test basic functionality (numerics of various simple matrices, as well as one end to end quantization + gemm). - Scales are flipped based on transpose_result - Handles boundary conditions Note that MXFP4 is not added in this PR - we can tackle that in a future PR. This PR was created by taking https://github.com/pytorch/pytorch/pull/145562, switching e8m0 to in-core dtype, removing fp4 for now, and adding test cases. # test plan ``` pytest test/test_matmul_cuda.py -k blockwise_mxfp8 -s ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/147548 Approved by: https://github.com/drisspg Co-authored-by: drisspg <drisspguessous@gmail.com>	2025-02-25 03:32:22 +00:00
FFFrog	b0fa92042b	Fix torch.mean out dtype check (#147188 ) For CPU: Type promotion is supported for torch.mean For Meta: Not supported for torch.mean ISSUE related: https://github.com/pytorch/pytorch/issues/138399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147188 Approved by: https://github.com/albanD	2025-02-25 02:50:03 +00:00
Yuanhao Ji	55bf3ff3a5	[Docs] Add `OpDTypes.any_common_cpu_cuda_one` (#147605 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/147605 Approved by: https://github.com/soulitzer	2025-02-24 23:23:43 +00:00
Peter Yeh	81dccd706b	[ROCm] OCP FP8 Support for new GPUs (#146632 ) TLDR: Follow up/ Build on top of https://github.com/pytorch/pytorch/pull/144476. add OCP FP8 support for gfx950 refer to https://github.com/pytorch/ao/pull/1677 This pull request includes several changes to improve compatibility and support for new GPU architectures and data types, particularly for ROCm. The key updates involve adding support for new ROCm versions and GPU architectures, updating data type handling, and removing outdated checks. ### Improvements to GPU Architecture and ROCm Version Support: * [`aten/src/ATen/Context.cpp`](diffhunk://#diff-33de472d304acbe57d693c8567370c638068bedc1aa0ce8e9dc115dad05a7810L323-R326): Added support for new GPU architectures `gfx1200`, `gfx1201`, and `gfx950` based on ROCm version checks. * [`aten/src/ATen/native/cuda/Blas.cpp`](diffhunk://#diff-e8a569efee1e650172f120a0fdcda024fe3e4703a4ee3336425c8f685af6b3abL196-R199): Updated architecture support in multiple functions to include `gfx1200`, `gfx1201`, and `gfx950` based on ROCm version checks. [[1]](diffhunk://#diff-e8a569efee1e650172f120a0fdcda024fe3e4703a4ee3336425c8f685af6b3abL196-R199) [[2]](diffhunk://#diff-e8a569efee1e650172f120a0fdcda024fe3e4703a4ee3336425c8f685af6b3abL865-R876) ### Updates to Data Type Handling: * [`aten/src/ATen/cuda/CUDADataType.h`](diffhunk://#diff-9188bb13b1a49f459141f5f9b875593d1c5ce2beb5ad711fdbaf5bc7089ec015L81-L98): Enhanced data type conversion to include new float8 types for both CUDA and ROCm environments. * [`aten/src/ATen/cuda/tunable/GemmHipblaslt.h`](diffhunk://#diff-bfa1a3b5d4bef1892bf50338775f3b0fd8cd31fc1868148f3968b98aefb68e3fL29-R80): Updated `HipDataTypeFor` template to handle new float8 types and added hard-coded enum values for ROCm versions prior to 6.3. ### Removal of Outdated Checks: * [`cmake/public/LoadHIP.cmake`](diffhunk://#diff-b98e27b9a5f196a6965a99ee5a7bb15b3fc633d6375b767635b1b04ccb2fd3d5L169-L197): Removed the check for `HIP_NEW_TYPE_ENUMS` as it is no longer necessary with the updated ROCm versions. [[1]](diffhunk://#diff-b98e27b9a5f196a6965a99ee5a7bb15b3fc633d6375b767635b1b04ccb2fd3d5L169-L197) [[2]](diffhunk://#diff-b98e27b9a5f196a6965a99ee5a7bb15b3fc633d6375b767635b1b04ccb2fd3d5L211-R182) These changes ensure better compatibility and performance on newer hardware and software environments, particularly for users leveraging ROCm and CUDA for deep learning and scientific computing tasks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146632 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-02-24 22:47:52 +00:00
PyTorch MergeBot	3e2d9d079e	Revert "[ROCm] OCP FP8 Support for new GPUs (#146632 )" This reverts commit `f95ab46797`. Reverted https://github.com/pytorch/pytorch/pull/146632 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, I'll find someone to help merge this PR back to main ([comment](https://github.com/pytorch/pytorch/pull/146632#issuecomment-2676823614))	2025-02-23 12:04:50 +00:00
Aaron Orenstein	086d146f6f	Update ruff linter for PEP585 (#147540 ) This turns on PEP585 enforcement in RUFF. - Updates the target python version - Stops ignoring UP006 warnings (PEP585) - Fixes a few issues which crept into the tree in the last day Pull Request resolved: https://github.com/pytorch/pytorch/pull/147540 Approved by: https://github.com/justinchuby, https://github.com/Skylion007	2025-02-22 04:45:17 +00:00
Peter Yeh	f95ab46797	[ROCm] OCP FP8 Support for new GPUs (#146632 ) TLDR: Follow up/ Build on top of https://github.com/pytorch/pytorch/pull/144476. add OCP FP8 support for gfx950 refer to https://github.com/pytorch/ao/pull/1677 This pull request includes several changes to improve compatibility and support for new GPU architectures and data types, particularly for ROCm. The key updates involve adding support for new ROCm versions and GPU architectures, updating data type handling, and removing outdated checks. ### Improvements to GPU Architecture and ROCm Version Support: * [`aten/src/ATen/Context.cpp`](diffhunk://#diff-33de472d304acbe57d693c8567370c638068bedc1aa0ce8e9dc115dad05a7810L323-R326): Added support for new GPU architectures `gfx1200`, `gfx1201`, and `gfx950` based on ROCm version checks. * [`aten/src/ATen/native/cuda/Blas.cpp`](diffhunk://#diff-e8a569efee1e650172f120a0fdcda024fe3e4703a4ee3336425c8f685af6b3abL196-R199): Updated architecture support in multiple functions to include `gfx1200`, `gfx1201`, and `gfx950` based on ROCm version checks. [[1]](diffhunk://#diff-e8a569efee1e650172f120a0fdcda024fe3e4703a4ee3336425c8f685af6b3abL196-R199) [[2]](diffhunk://#diff-e8a569efee1e650172f120a0fdcda024fe3e4703a4ee3336425c8f685af6b3abL865-R876) ### Updates to Data Type Handling: * [`aten/src/ATen/cuda/CUDADataType.h`](diffhunk://#diff-9188bb13b1a49f459141f5f9b875593d1c5ce2beb5ad711fdbaf5bc7089ec015L81-L98): Enhanced data type conversion to include new float8 types for both CUDA and ROCm environments. * [`aten/src/ATen/cuda/tunable/GemmHipblaslt.h`](diffhunk://#diff-bfa1a3b5d4bef1892bf50338775f3b0fd8cd31fc1868148f3968b98aefb68e3fL29-R80): Updated `HipDataTypeFor` template to handle new float8 types and added hard-coded enum values for ROCm versions prior to 6.3. ### Removal of Outdated Checks: * [`cmake/public/LoadHIP.cmake`](diffhunk://#diff-b98e27b9a5f196a6965a99ee5a7bb15b3fc633d6375b767635b1b04ccb2fd3d5L169-L197): Removed the check for `HIP_NEW_TYPE_ENUMS` as it is no longer necessary with the updated ROCm versions. [[1]](diffhunk://#diff-b98e27b9a5f196a6965a99ee5a7bb15b3fc633d6375b767635b1b04ccb2fd3d5L169-L197) [[2]](diffhunk://#diff-b98e27b9a5f196a6965a99ee5a7bb15b3fc633d6375b767635b1b04ccb2fd3d5L211-R182) These changes ensure better compatibility and performance on newer hardware and software environments, particularly for users leveraging ROCm and CUDA for deep learning and scientific computing tasks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146632 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-02-21 23:44:08 +00:00
Nichols A. Romero	fd8ae1aa04	[ROCm] gfx940 and gfx941 cleanup (#147394 ) Removing gfx architectures not supported by ROCm. NOTE: For users wanting to build PyTorch for gfx archs that are not supported by the official wheels on download.pytorch.org, you can build PyTorch from source for your desired gfx arch [using the PYTORCH_ROCM_ARCH env var](https://github.com/pytorch/pytorch/blob/main/README.md#amd-rocm-support). Pull Request resolved: https://github.com/pytorch/pytorch/pull/147394 Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily	2025-02-21 19:42:12 +00:00
Catherine Lee	863ac20659	[CI] Do not overwrite return code of test file when fails for rerun disabled tests (#147484 ) Do not overwrite the return code of a single file when it fails. This will allow the log to be printed to stdout and the gha logs Pull Request resolved: https://github.com/pytorch/pytorch/pull/147484 Approved by: https://github.com/ZainRizvi	2025-02-20 17:51:58 +00:00
Aaron Orenstein	db4ce78d46	PEP585: More UP006 fixes (#146392 ) This should be the final PR before we can enable RUFF UP006. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392 Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007	2025-02-20 06:18:13 +00:00
rzou	fea718f062	[BaseHOP] change hop(subgraph, operands) to hop(subgraph, *operands) (#146730 ) Our three main users are OK with this, with two of them (foreach_map, invoke_quant) prefering it like this. I was originally worried about BC issues (this now means you cannot add any positional args) but I think that's not a concern -- one can always add kwonly args. Test Plan - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/146730 Approved by: https://github.com/ydwu4, https://github.com/mlazos	2025-02-20 02:30:36 +00:00
Riley Dulin	93316cfe94	Move ir_pre_fusion.txt and ir_post_fusion.txt to TORCH_LOGS (#147248 ) Fixes #147002 Moves ir_{pre, post}_fusion.txt to be controlled by TORCH_LOGS instead of TORCH_COMPILE_DEBUG. Updated tests of these logs as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147248 Approved by: https://github.com/eellison	2025-02-20 00:26:17 +00:00
William Wen	16e202a38e	[dynamo] improved graph break messages for some common graph break sites [1/N] (#146525 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146525 Approved by: https://github.com/jansel	2025-02-20 00:08:13 +00:00
PyTorch MergeBot	babb2dc2af	Revert "Add torch._scaled_mm for CPU (#139975 )" This reverts commit `6f7e67c43c`. Reverted https://github.com/pytorch/pytorch/pull/139975 on behalf of https://github.com/wdvr due to failing inductor mkldnn_pattern_matcher_cpu tests ([comment](https://github.com/pytorch/pytorch/pull/139975#issuecomment-2667186865))	2025-02-18 23:58:31 +00:00
mori360	a21a123fd5	Add fqn_modifier at loading_state_dict and unit test (#146557 ) In Fusion model, users might change the state_dict keys by state_dict_hook The load_state_dict APIs here won't call model.state_dict() so that the hooks won't be called to change the keys, causing the mismatch between fqn and state_dict keys. The PR here suggests users to add how they would change the state_dict key prefix (they can name it, here we call "fqn_modifiers") by default During loading state_dict, we have the prefix change during getting fqn so that they can be processed same as through state_dict hook. For example: There's a state_dict_hook: ``` def _state_dict_hook(self, destination, prefix, keep_vars): """Remove "embedding" from the original embedding in the state_dict name. This keeps the orginal state dict name for the embedding from before fusing with the FusionEmbedding. [!Note] This update changes the order of the OrderedDict """ key = prefix + "embedding.weight" new_key = prefix + "weight" destination[new_key] = destination[key] del destination[key] ``` In the dsd after this PR, we would skip "embedding." before "weight" if find the "fqn_modifiers" attribute at that module ``` def fqn_modifiers(self) -> Dict[str, str]: return { "weight": "embedding", } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146557 Approved by: https://github.com/fegin	2025-02-18 22:54:41 +00:00
Jiang, Yanbing	6f7e67c43c	Add torch._scaled_mm for CPU (#139975 ) This PR is to add `torch._scaled_mm` for CPU backend. `_scaled_mm_out_cpu` and `_scaled_mm_cpu` are new added and included in `torch._scaled_mm` CPU dispatch. We also add `_scaled_mm_out_cpu_emulated` as a fallback function if the current platform cannot run FP8 matmul using oneDNN. And this PR also updates the various UTs related to FP8 to support CPU tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139975 Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet	2025-02-18 18:44:26 +00:00
PyTorch MergeBot	49e8f9c965	Revert "Add torch._scaled_mm for CPU (#139975 )" This reverts commit `22fae4c5f9`. Reverted https://github.com/pytorch/pytorch/pull/139975 on behalf of https://github.com/huydhn due to third time is the charm ([comment](https://github.com/pytorch/pytorch/pull/139975#issuecomment-2664622598))	2025-02-18 05:11:32 +00:00
Jiang, Yanbing	22fae4c5f9	Add torch._scaled_mm for CPU (#139975 ) This PR is to add `torch._scaled_mm` for CPU backend. `_scaled_mm_out_cpu` and `_scaled_mm_cpu` are new added and included in `torch._scaled_mm` CPU dispatch. We also add `_scaled_mm_out_cpu_emulated` as a fallback function if the current platform cannot run FP8 matmul using oneDNN. And this PR also updates the various UTs related to FP8 to support CPU tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139975 Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet	2025-02-17 18:39:10 +00:00
Aaron Gokaslan	e738f7ba23	[BE]: Enable ruff rule SIM113 (#147290 ) Lint rules that tells the user to avoid keeping track of their own counter and use the builtin enumerate when possible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147290 Approved by: https://github.com/jansel	2025-02-16 22:41:16 +00:00
PyTorch MergeBot	aac5d1a289	Revert "Add torch._scaled_mm for CPU (#139975 )" This reverts commit `f0bdc27f74`. Reverted https://github.com/pytorch/pytorch/pull/139975 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it looks like internal ideep version is too old to support this ([comment](https://github.com/pytorch/pytorch/pull/139975#issuecomment-2660008996))	2025-02-14 18:31:54 +00:00
PyTorch MergeBot	e06ee4aa9f	Revert "Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073 )" This reverts commit `06f4a5c0e5`. Reverted https://github.com/pytorch/pytorch/pull/146073 on behalf of https://github.com/atalman due to breaks macos builds: ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package ([comment](https://github.com/pytorch/pytorch/pull/146073#issuecomment-2659802389))	2025-02-14 16:44:46 +00:00
atalman	06f4a5c0e5	Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073 ) Should resolve: https://github.com/pytorch/pytorch/issues/144768 We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1`` For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1`` We use pinned version of NCCL rather then submodule. Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl`` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj	2025-02-14 15:29:59 +00:00
cyy	d473c212fd	Remove code for Python < 3.9 (#147097 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/147097 Approved by: https://github.com/albanD	2025-02-14 03:22:49 +00:00
Jiang, Yanbing	f0bdc27f74	Add torch._scaled_mm for CPU (#139975 ) This PR is to add `torch._scaled_mm` for CPU backend. `_scaled_mm_out_cpu` and `_scaled_mm_cpu` are new added and included in `torch._scaled_mm` CPU dispatch. We also add `_scaled_mm_out_cpu_emulated` as a fallback function if the current platform cannot run FP8 matmul using oneDNN. And this PR also updates the various UTs related to FP8 to support CPU tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139975 Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet	2025-02-14 02:03:53 +00:00
xinan.lin	9befdf565a	[Break XPU][Inductor UT] Set input tensors to corresponding device for test case in test_aot_indutor.py (#145248 ) Fix #145247 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145248 Approved by: https://github.com/desertfire, https://github.com/jansel, https://github.com/EikanWang ghstack dependencies: #146762	2025-02-14 01:39:11 +00:00
PyTorch MergeBot	9a883007a2	Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 )" This reverts commit `c7515da7b0`. Reverted https://github.com/pytorch/pytorch/pull/140979 on behalf of https://github.com/huydhn due to This change has been reported to break internal code ([comment](https://github.com/pytorch/pytorch/pull/140979#issuecomment-2657361940))	2025-02-13 18:04:26 +00:00
IvanKobzarev	7c3b2a29ec	[subclass] testing WrapperSubclass respect outer_size, outer_stride (#146897 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146897 Approved by: https://github.com/bdhirsh	2025-02-13 15:21:19 +00:00
Nikita Shulga	66fb10fc53	[BE][OpInfo] Introduce generic `dtypesIf` (#146905 ) Use `__setattr__` and `__getattribute__` to wrap existing `dtypesIfXYZ` into it, which will allow for subsequent incremental elimination of those Also, type annotation for OpInfo is a sham: it claims that `dtypes` and `dtypesIfXYZ` must be of type `_dispatch_dtypes`, but in reality it's converted to set in post init. Test Plan: - Check that `op_db[0].dtypesIfCUDA` and others shows the same values as before, by running the following script ```python from torch.testing._internal.common_methods_invocations import op_db print({name: getattr(op_db[0], f'dtypesIf{name}') for name in ['CUDA', 'ROCM', 'XPU', 'Hpu']}) ``` - CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/146905 Approved by: https://github.com/janeyx99	2025-02-13 05:33:17 +00:00
Guilherme Leobas	f954aac6be	Add `make_dynamo_test` (#146491 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146491 Approved by: https://github.com/zou3519, https://github.com/anijain2305, https://github.com/malfet	2025-02-12 22:54:29 +00:00
Daniel Galvez	c7515da7b0	Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 ) This is a new PR for #130386 , which got stale and was closed. Since I force-pushed to that branch in order to rebase it on top of main, the PR can no longer be reopened, according to https://github.com/isaacs/github/issues/361 I fixed the possibly-not-warmed-up problem described here: https://github.com/pytorch/pytorch/pull/130386/files#r1690856534 Since starting this, torch.cond and torch.while_loop now apparently have support for backward passes. I will look into what it might take to support that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140979 Approved by: https://github.com/eqy, https://github.com/eellison	2025-02-11 18:16:15 +00:00
eellison	92b7e610ab	[Inductor changes] Invoke Quant (#139102 ) Adds a `invoke_quant` higher order operator as proposed [here](https://docs.google.com/document/d/1s2PfJlq6Q1F8l11CkTIC69BW1rEnGEgs6YmBC7hu8rA/edit?tab=t.0). The primary motivations are - Unifying scattered reasoning for quant operators throughout the code base - Easy of pattern matching - see this very large pattern match expression [here](`949fdd2997/torch/_inductor/fx_passes/post_grad.py (L390-L426)`. Compared to the pattern I have in the tests: ``` @register_graph_pattern( CallFunction( torch.ops.aten.mm, CallFunction( torch.ops.higher_order.invoke_quant, Ignored(), Ignored(), Ignored(), scheme="nf4", ), Arg(), ), pass_dict=test_pass, ) ``` - Ability to specify inductor specific logic, like codegen'ing the operators in lower precision, or forcing fusion to a matmul. Example graph: ``` Python ===== AFTER POST GRAD ===== /data/users/eellison/pytorch/torch/fx/_lazy_graph_module.py class <lambda>(torch.nn.Module): def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"): # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(args, kwargs, quant_options=self) # type: ignore[call-arg] repeated_subgraph0 = self.repeated_subgraph0 invoke_quant: "f32[8][1]cpu" = torch.ops.higher_order.invoke_quant(repeated_subgraph0, arg0_1, arg1_1, scheme = 'nf4'); repeated_subgraph0 = arg0_1 = arg1_1 = None return (invoke_quant,) class repeated_subgraph0(torch.nn.Module): def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"): # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(args, *kwargs, quant_options=self) # type: ignore[call-arg] mul: "f32[8][1]cpu" = torch.ops.aten.mul.Tensor(arg0_1, arg1_1); arg0_1 = None add: "f32[8][1]cpu" = torch.ops.aten.add.Tensor(mul, arg1_1); mul = arg1_1 = None return add ``` The schema for `invoke_quant` is `torch.ops.higher_order.invoke_quant(subgraph, args, scheme=None)` where the scheme will not always be present. I wasn't sure exactly how the inductor specific configurations like `codgen_in_low_precision` should be passed through. I didnt want to stuff them all in as kwargs, and I didn't want to have them affect pattern matching. So they will be stored as meta of the node itself. And, following that, I wanted the invocation of the hop to match how it will show up in the graph. So I decided to have it be an object that is then invoked for the tracing. ``` invoke_quant = InvokeQuant(codegen_low_precision=True) invoke_quant(gn, (x, y), scheme="nf4") ``` Todo - not require the packing of args in a tuple, will do following https://github.com/pytorch/pytorch/pull/139162. Feedback welcome. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139102 Approved by: https://github.com/Chillee	2025-02-08 19:30:19 +00:00
Blaine Burton Rister	a1bfb39a31	[Inductor] Expand Identity ops prior to block pattern matching (#146000 ) # Feature Inductor sometimes uses `Identity` functions to group various terms of an expression. While this is convenient in some scenarios, it can frustrate pattern matching. For example, when we're matching an indexing expression to tell if it can be represented as a block pointer, that analysis should be invariant to `Identity`'s. This PR adds a few features to achieve this invariance. - Create a new expansion mode `expr.expand(identity=True)`, which removes all `Identity` functions from the expression. - Preprocess the expression with this expansion prior to pattern matching. - Bonus: create a new test utility function called `dummy_graph()`, which creates a simple `GraphLowering`. This is useful for testing the pattern matcher, as we need to initialize `V.graph` before we can access `V.graph.sizevars`. # Test plan This PR adds a few new unit tests: - Added a unit test specifically for `expr.expand(identity=True)`. - Added a new unit test module for the block pattern matcher. Tested that we can correctly match some example patterns containing Identity ops. I originally intended to add an end to end test compiling pointwise cat, and mapping the corresponding memory accesses to block pointers. However, it looks like that will take more work, since the [relevant code path](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/codegen/triton.py#L1306) disables block pointer analysis. It might be better to defer that to a future PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146000 Approved by: https://github.com/eellison, https://github.com/jansel	2025-02-08 18:11:53 +00:00
cyyever	46e83bb637	Fix linter F821 error (#146665 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146665 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-02-08 07:19:37 +00:00
Shunting Zhang	bc0191802f	[inductor] add size-asserts for fallback ops (#145904 ) Fix https://github.com/pytorch/pytorch/issues/144717 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145904 Approved by: https://github.com/jansel	2025-02-07 18:44:32 +00:00
Michael Diggin	e01a5e9e1e	Small improvements to NJT matrix multiplies (#146405 ) Fixes #146404 Adds changes to the matmul and matmul_backward operation for nested jagged tensors, to support back propagation when the output is a regular strided tensor. This required adding support for the nested matmul operation to work when the nested tensor wasn't 'self', i.e `A@B` where `A` isn't nested but `B` is. The operation schemas had to be updated to reflect that either input can be a strided tensor instead (and the gradient), so an extra assertion is added in an edge case where neither input is nested. Unit tests are also added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146405 Approved by: https://github.com/soulitzer, https://github.com/jbschlosser	2025-02-06 04:51:12 +00:00
Nikita Shulga	6a985d8b2e	Make `inductor_utils.requires_gpu` accept MPS (#145156 ) Not yet ready to setp HAS_GPU to true, but can unskip tests that require GPU (Noticed while running test_mps_basics.py that `test_scalar_cpu_tensor_arg` is getting skipped) - Replace `GPU_TYPE` with `self.device` in `test_custom_op_fixed_layout_sequential`, `test_inductor_layout_optimization_input_mutations`, `test_mutable_custom_op_fixed_layout2` otherwise they GPU tests are just running for _cpu suffixes. - Tweak `test_tmp_not_defined_issue3` to work correctly on CPU, by defining `test_device` and `test_device_0` - UnXFail `test_mutable_custom_op_fixed_layout2_dynamic_shapes` as it should just work on CPU - Add `skip_if_no_triton` decorator and decorate `test_reduction_config_limit` with it, as it does not need CPU nor GPU, but rather a triton backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145156 Approved by: https://github.com/dcci, https://github.com/Skylion007, https://github.com/jansel	2025-02-06 01:14:36 +00:00
rzou	98b5d455fd	[opcheck] Improve error reporting; allow atol/rtol overrides (#146488 ) This PR improves opcheck to: 1. directly use torch.testing.assert_close (without a msg override). This allows it to print the absolute and relative differences and the number of mismatched elements. 2. take in an atol/rtol tolerance (for if someone just wants to use opcheck in their testing). Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/146488 Approved by: https://github.com/williamwen42	2025-02-05 21:25:06 +00:00
eqy	6f7fda3f49	Bump `nn.functional.conv3d` tolerances for `test_comprehensive` (#135719 ) `float16` tolerance was previously set to `1e-5` which seemed very low Pull Request resolved: https://github.com/pytorch/pytorch/pull/135719 Approved by: https://github.com/Chillee, https://github.com/albanD	2025-02-05 18:34:12 +00:00
Haifeng Jin	8177fc4d33	Make regex error catching compatible with Python 3.12+. (#145945 ) In Python 3.12, the error message has changed from "Can't pickle local object" to "Can't get local object". The old regex would no longer catch the error. This PR make it compatible with Python 3.12 and backward compatible as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145945 Approved by: https://github.com/H-Huang	2025-02-05 00:57:36 +00:00
Aaron Gokaslan	7f65a20884	[BE]: Enable ruff SLOT checks (#146276 ) This enables a check that which a class which only inherits from immutable classes like str, tuple, and NamedTuple, also defined `__slots__` so they don't allocate memory unnecessarily. This also ensure contributors think about how they define their classes with subclass NamedTuples and str, of which we have many in our codebase Pull Request resolved: https://github.com/pytorch/pytorch/pull/146276 Approved by: https://github.com/aorenste	2025-02-04 19:18:23 +00:00
rzou	0f768c7866	Barebones flat_apply HOP (#146060 ) This PR: - adds pytree.register_constant for registering a class to be treated as a constant by torch.compile/torch.fx - adds a very barebones flat_apply HOP. This should be sufficient to get mark_traceable working. A lot more work is necessary to get the custom operator case working (when make_fx sees a custom operator with PyTree arg types, it needs to emit a call to the flat_apply HOP). - I expect the flat_apply HOP to change a lot, I want to ship this in the current state to unblock the mark_traceable and custom ops work. Test Plan: - It's kind of difficult to test the barebones flat_apply HOP "works" so I added a really simple test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146060 Approved by: https://github.com/StrongerXi, https://github.com/yanboliang ghstack dependencies: #146059	2025-02-01 16:17:48 +00:00
Aleksandar Samardžić	2b00d211f0	Build RowwiseScaledMM.cu for SM89 (#145676 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145676 Approved by: https://github.com/drisspg, https://github.com/malfet, https://github.com/eqy	2025-02-01 11:44:58 +00:00
Wei Wang	3a4e7a589b	[CI][Distributed] Fix edge case: One rank case (Rank 0) should get [False, False] (#146099 ) To match the expected tensor (i.e. 2nd element in the array). Making rank0 receive [False, False] Fixes one of the issues reported in #146094 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146099 Approved by: https://github.com/eqy	2025-01-31 20:31:13 +00:00
Sherlock Huang	cf2de4e230	Introduce aoti_call_delegate HOP (#145630 ) Summary: Previously, aoti compile node is represented as a kernel-less custom op in the exported program. The node was not eager runnable, which is a common practice for numerical validation during lowering. I introduce a new HOP to address this. The schema is following ``` aoti_call_delegate(lower_moduel: AOTInductorEPModule, original_gm: fx.GraphModule, weights: List[Tensor], inputs: List[Tensor]) ``` There are a few problems exposed by HOP - AOTI expects a FX graph with weights as getattr nodes, aka stateful graph. HOP expect graph_module arguments to be stateless. Export serializer also expect a stateless graph. Currently, to make AOTI happy, I am making `original_gm` stateful, and bypassing the serialization for `original_gm`. - As a result, the HOP is not re-traceable, as functionalization on stateful graph module argument will fail. Test Plan: buck2 test 'fbcode//mode/opt' fbcode//deeplearning/aot_inductor/cpu/test:cpu_lowering_utils_test Reviewed By: zhxchen17 Differential Revision: D68359391 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145630 Approved by: https://github.com/zou3519	2025-01-31 04:57:36 +00:00
clr	6b41f310c2	config: Support str env variables (#145980 ) Summary: This allows us to use environment variables to set string values. We've added tests for the specific functionality implemented here. Note that we already accidentally started setting up configs to use this, so we're just adding the feature. Additionally, we're not fully validating the underlying type when we set the value (and in general, it's more difficult than we would like to do this). Let me know if people feel strongly, and we can add a PR to do this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145980 Approved by: https://github.com/yushangdi, https://github.com/oulgen	2025-01-30 00:13:02 +00:00
rzou	1e57154af3	Require that all HOPs be imported at `import torch` time (#145939 ) E.g. torch.ops.higher_order.cond does not exist until it is imported, which is bad if it shows up in an FX graph or is used in some code somewhere. This PR also makes some more HOPs get imported at `import torch` time. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/145939 Approved by: https://github.com/ydwu4 ghstack dependencies: #145938	2025-01-29 22:27:52 +00:00
rzou	2141c1aebe	Better hop_db comment; move test to a non-export test file (#145938 ) Goal is for people to better test their HOPs. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/145938 Approved by: https://github.com/ydwu4	2025-01-29 22:27:52 +00:00
Benjamin Glass	5aa5a5763e	[inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684 ) Triton 2.2 and greater have a bug where allowing TF32 generation for a GPU that does not support TF32 will cause code generation errors. Patch around this problem by: 1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format. 2. Using that function to explicitly disable TF32 generation when calling Triton, where needed. To demonstrate that this fix works, try running `test/inductor/test_max_autotune.py` on a GPU with CUDA compute capability < 8 (e.g. any NVIDIA consumer GPU) without this fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684 Approved by: https://github.com/eqy	2025-01-28 22:01:08 +00:00
Isuru Fernando	ccc2878c97	Fix fractional_max_pool lowering in inductor (#144395 ) Fixes https://github.com/pytorch/pytorch/issues/141538 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144395 Approved by: https://github.com/amjames, https://github.com/eellison	2025-01-28 21:00:18 +00:00
PyTorch MergeBot	cfbb27462e	Revert "[inductor][BE] Enable test_cpu_cpp_wrapper in fbcode (#145373 )" This reverts commit `b8087747f5`. Reverted https://github.com/pytorch/pytorch/pull/145373 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/145373#issuecomment-2619674197))	2025-01-28 17:46:11 +00:00
Sam Larsen	1835e1eb98	[BE] Remove test_ops from FIXME_inductor_dont_reset_dynamo (#145307 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145307 Approved by: https://github.com/zou3519, https://github.com/FindHao	2025-01-27 18:12:39 +00:00
soulitzer	3a3e2cf90a	Remove det_singular OpInfo (#145533 ) Fixes https://github.com/pytorch/pytorch/issues/93045 https://github.com/pytorch/pytorch/issues/93044 From previous discussion https://github.com/pytorch/pytorch/issues/93045#issuecomment-1477674083 the resolution is that we're okay with removing this. Some older attempts: - https://github.com/pytorch/pytorch/pull/102581 - https://github.com/pytorch/pytorch/pull/109249 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145533 Approved by: https://github.com/lezcano, https://github.com/malfet ghstack dependencies: #145520, #145531	2025-01-25 00:58:03 +00:00
soulitzer	c7ca1df37e	Disable slow gradcheck for nn.Transformer ModuleInfo (#145531 ) Fixes https://github.com/pytorch/pytorch/issues/117140 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145531 Approved by: https://github.com/mikaylagawarecki ghstack dependencies: #145520	2025-01-25 00:58:03 +00:00
Marc Horowitz	efebec5ef5	[dcp] Add ZStandard transformer (#143360 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143360 Approved by: https://github.com/saumishr, https://github.com/albanD ghstack dependencies: #145528	2025-01-25 00:14:07 +00:00
Bin Bao	b8087747f5	[inductor][BE] Enable test_cpu_cpp_wrapper in fbcode (#145373 ) Differential Revision: D68278174 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145373 Approved by: https://github.com/Skylion007	2025-01-24 17:59:13 +00:00
Oguz Ulgen	d3989ca636	Add multi env variable support to configs (#145288 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145288 Approved by: https://github.com/c00w	2025-01-24 10:04:24 +00:00
PyTorch MergeBot	714f64329b	Revert "Add multi env variable support to configs (#145288 )" This reverts commit `a8b7cb6a2d`. Reverted https://github.com/pytorch/pytorch/pull/145288 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lint from a landrace with some recent PEP585 changes ([comment](https://github.com/pytorch/pytorch/pull/145288#issuecomment-2611278428))	2025-01-24 00:20:00 +00:00
Oguz Ulgen	a8b7cb6a2d	Add multi env variable support to configs (#145288 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145288 Approved by: https://github.com/c00w	2025-01-23 23:00:23 +00:00
Nikhil Gupta	41b38f755c	Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392 )" (#145505 ) https://github.com/pytorch/pytorch/pull/134124 was reverted by https://github.com/pytorch/pytorch/pull/145392 due to KleidiAI clone issue. 1. This reverts commit `0940eb6d44` (https://github.com/pytorch/pytorch/pull/145392 )and Fixes KleidiAI mirror issue. 2. KleidiAI is now cloned from github mirror instead of arm gitlab Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2 Fixes https://github.com/pytorch/pytorch/issues/145273 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145505 Approved by: https://github.com/malfet	2025-01-23 18:50:59 +00:00
Sam Larsen	045698653a	[BE] Remove test_ops_gradients from FIXME_inductor_dont_reset_dynamo (#145308 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145308 Approved by: https://github.com/zou3519 ghstack dependencies: #145306	2025-01-23 17:25:04 +00:00
Shunting Zhang	b6941d4e42	[inductor] fix autotuning memory usage (#145410 ) We use `cpu_tensor.copy_(gpu_tensor)` to clone mutated kernel arguments for autotuning. The purpose is to avoid increasing peak memory due to the clone. But if `gpu_tensor` is not contiguous, this `copy_` will need allocate an temporary tensor on GPU to store a contiguous copy of `gpu_tensor`: `6e53588789/aten/src/ATen/native/cuda/Copy.cu (L322-L334)` Here is a standalone script to illustrate this behavior: https://gist.github.com/shunting314/812a848dc67b1d674ae42415a7a462c8 . The script report 6GB rather than 3GB peak memory usage. Note that, with all the following efforts 1. donated buffer 2. inplace padding 3. this PR We save 3GB peak memory (18.6GB -> 15.5GB) for GPT2 model for torch.compile. The peak memory of GPT2 is like a '...\_M\_...' shape. There are 2 places that we reach the peak. Donated buffer remove the first peak by computing grad_softmax inplace, and inplace padding removes the second peak by not allocating an extra buffer for mm-padding. Before all these optimizations, the peak memory is 18.6GB for GPT2 with torch.compile. With 1 & 2, the peak memory is 1. 17.7GB with a cold cache 2. 15.5GB with a warm cache (since the autotuning overhead is skipped) With 1 & 2 & 3, we save 3GB peak memory (18.6GB -> 15.5GB) no matter if autotuning happens or not Pull Request resolved: https://github.com/pytorch/pytorch/pull/145410 Approved by: https://github.com/masnesral, https://github.com/jansel ghstack dependencies: #140249, #145325	2025-01-23 09:34:23 +00:00
Sam Larsen	28c251dd0b	[BE] Remove test_modules from FIXME_inductor_dont_reset_dynamo (#145306 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145306 Approved by: https://github.com/zou3519	2025-01-23 06:37:21 +00:00
albanD	0940eb6d44	Reverting the PR adding Kleidiai-based int4 kernels (#145392 ) Mitigation for https://github.com/pytorch/pytorch/issues/145273 Reverting https://github.com/pytorch/pytorch/pull/134124 and https://github.com/pytorch/pytorch/pull/144074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145392 Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai	2025-01-22 20:11:49 +00:00
Joel Schlosser	af204135d8	Fix NJT fill.Scalar for contiguous inputs (#144586 ) Part of my BE project addressing NJT bugs surfaced via OpInfo tests. This PR implements the missing `fill.Scalar` support, which works fine for contiguous inputs, but there is still some AOTAutograd debugging required to handle non-contiguous transposed NJTs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144586 Approved by: https://github.com/soulitzer	2025-01-21 18:22:08 +00:00
Aaron Orenstein	dea7ad3371	PEP585 update - torch/testing (#145200 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145200 Approved by: https://github.com/bobrenjc93	2025-01-20 22:42:42 +00:00
Sun, Jiayi	92b9da1fc2	fix torch.atan for torch.complex datatypes on CPU (#144749 ) Fix https://github.com/pytorch/pytorch/issues/141487. This issue is caused by the lack of special handling of the case where the real number/imag number is 0/Inf/NaN in the vectorized implementation of `atan`. For correctness, I temporarily fallback the implementation of `atan` to scalar implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144749 Approved by: https://github.com/mingfeima, https://github.com/Skylion007	2025-01-20 08:45:03 +00:00
Sun, Jiayi	c922ccb7c4	fix sigmoid for torch.complex datatypes on CPU (#140391 ) Fix https://github.com/pytorch/pytorch/issues/135777. This issue is caused by the lack of special handling of the case where the real number/imag number is 0/Inf/NaN in the vectorized implementation of `reciprocal`. For correctness, I temporarily fallback the implementation of `reciprocal` to scalar implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140391 Approved by: https://github.com/mingfeima, https://github.com/Skylion007 ghstack dependencies: #140358	2025-01-20 08:23:58 +00:00
Sun, Jiayi	507bf65c6a	fix torch.exp for torch.complex datatypes on CPU (#140358 ) Fix https://github.com/pytorch/pytorch/issues/48010, https://github.com/pytorch/pytorch/issues/136063. These two issues are caused by the lack of special handling of the case where the real number/imag number is 0/Inf/NaN in the vectorized implementation of `exp`. For correctness, I temporarily fallback the implementation of `exp` to scalar implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140358 Approved by: https://github.com/mingfeima, https://github.com/Skylion007	2025-01-20 08:03:17 +00:00
PyTorch MergeBot	5802be698e	Revert "parametrized test name handles class arguments (#133546 )" This reverts commit `4e4b8592a3`. Reverted https://github.com/pytorch/pytorch/pull/133546 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but trying to disable the new tests does seem to fully cover all the cases and some are still failing in trunk ([comment](https://github.com/pytorch/pytorch/pull/133546#issuecomment-2599814339))	2025-01-18 18:12:18 +00:00
Joel Schlosser	3ee531f8b9	Support NJT chunk() backward on batch dim (#144584 ) Part of my BE project addressing NJT bugs surfaced via OpInfo tests. Implements `chunk()` backward on the batch dim, which was left out before. This PR unbinds the components and invokes `copy_()` on these to pass along the appropriate gradients. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144584 Approved by: https://github.com/soulitzer ghstack dependencies: #144582, #144583	2025-01-18 15:58:24 +00:00
Will Constable	2c4281d7da	Make MultiProcContinuousTest timeout configurable (#145099 ) Allows test classes using MPCT to set their own timeout as a class property, which is good enough since the processgroup is shared across test instances and the timeout is set at processgroup init. Also sets a default timeout of 2 minutes, which is probably (?) long enough for reasonable tests, but can be changed if it causes flakyness. It's preferable to have as short default timeout as possible, since when debugging tests getting a timeout quickly helps. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145099 Approved by: https://github.com/d4l3k, https://github.com/fduwjj ghstack dependencies: #145010, #145011	2025-01-18 04:37:12 +00:00
Nicolas Macchioni	4e4b8592a3	parametrized test name handles class arguments (#133546 ) Previously, parametrized tests with class arguments, for example ``` @parametrize("this_cls", (Foo, Bar)) ``` would create parametrized tests with names `test_foo_this_cls0` and `test_foo_this_cls1`. With this change, we instead should get `test_foo_this_cls_Foo` and `test_foo_this_cls_Bar` Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/133546 Approved by: https://github.com/eellison	2025-01-17 22:48:38 +00:00
Joel Schlosser	cac10b8190	Fix NJT OpInfo entry for nn.functional.prelu (#144582 ) Part of my BE project addressing NJT bugs surfaced via OpInfo tests. The OpInfo entry for prelu was wrong before this PR; `weight` needs to be passed as well. The op isn't fully implemented yet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144582 Approved by: https://github.com/soulitzer	2025-01-17 20:36:15 +00:00
Tom Ritchford	46fbd63405	Fix unbind_copy and add its decomposition (#134319 ) * Fixes https://github.com/pytorch/pytorch/issues/130829 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134319 Approved by: https://github.com/amjames, https://github.com/eellison	2025-01-17 18:21:22 +00:00
Marc Horowitz	ba3f1c29ee	[dcp] Add extension mechanism (#143358 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143358 Approved by: https://github.com/saumishr	2025-01-17 01:51:37 +00:00
Fuzzkatt	7c7bcb1e33	update IS_JETSON check (#144725 ) update IS_JETSON check to include the latest SM Pull Request resolved: https://github.com/pytorch/pytorch/pull/144725 Approved by: https://github.com/eqy	2025-01-16 22:34:48 +00:00
Brian Hirsh	d7f45fc575	dynamic shape support for interpolate(antialias=True) backward (#141198 ) Fixes https://github.com/pytorch/pytorch/issues/141187 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141198 Approved by: https://github.com/ezyang, https://github.com/Chillee ghstack dependencies: #141161	2025-01-16 00:08:25 +00:00
Aaron Orenstein	8ad37ed710	Stop ignoring mypy errors in torch/testing/_internal/common_utils.py (#144483 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144483 Approved by: https://github.com/Skylion007	2025-01-14 22:32:51 +00:00
Shangdi Yu	5c727d5679	[minifier] Fix config generator for callables (#144518 ) Summary: When config contains callables, the current configs generated cannot be run: ``` torch._dynamo.config.reorderable_logging_functions = {<built-in function print>, <function warning at 0x7f774c595630>, <function log at 0x7f774c595870>, <function error at 0x7f774c595510>, <function info at 0x7f774c595750>, <built-in function warn>, <function exception at 0x7f774c5955a0>, <function debug at 0x7f774c5957e0>, <function critical at 0x7f774c5953f0>} ``` We fix the config to generate the right string, so the config is runnable, like below ``` import logging import warnings torch._dynamo.config.reorderable_logging_functions = { warnings.warn, logging.warn, print } ``` Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:utils -- -r test_codegen_config ``` Differential Revision: D67998703 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144518 Approved by: https://github.com/desertfire	2025-01-14 17:18:13 +00:00
RAHUL SINGH	95b41d2aa4	Tests Generelization for multiple accelerator devices (#139749 ) Motivation: Generalize unit tests so that can be executed for cuda and non cuda devices. Chnages: There are general changes in common_dtesnor module for device type generalization so that tests can be executed on non cuda devices too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139749 Approved by: https://github.com/kwen2501	2025-01-14 08:52:46 +00:00
Isalia20	80eff6e720	[MPS] fix triangular for >3D tensors (#144545 ) Old implementation leads to incorrect output due to not handling the other batch sizes other than 3D tensors(B, M, N) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144545 Approved by: https://github.com/malfet	2025-01-14 08:25:01 +00:00
PyTorch MergeBot	dfe06e555d	Revert "Stop ignoring mypy errors in torch/testing/_internal/common_utils.py (#144483 )" This reverts commit `dcc04e9237`. Reverted https://github.com/pytorch/pytorch/pull/144483 on behalf of https://github.com/kit1980 due to Need to revert in order to revert https://github.com/pytorch/pytorch/pull/144441 ([comment](https://github.com/pytorch/pytorch/pull/144483#issuecomment-2588515018))	2025-01-14 00:46:48 +00:00
Aaron Orenstein	dcc04e9237	Stop ignoring mypy errors in torch/testing/_internal/common_utils.py (#144483 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144483 Approved by: https://github.com/Skylion007	2025-01-13 23:19:44 +00:00
PyTorch MergeBot	3753d30273	Revert "Stop ignoring mypy errors in torch/testing/_internal/common_utils.py (#144483 )" This reverts commit `9f09b719d3`. Reverted https://github.com/pytorch/pytorch/pull/144483 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it somehow breaks memory leak checks ([comment](https://github.com/pytorch/pytorch/pull/144483#issuecomment-2585004792))	2025-01-11 02:10:16 +00:00
Alexander Grund	fdc4f9dde2	Avoid running helper functions as test (#144544 ) Pytest considers all symbols starting with `test_` as a test case/function and runs them. The `test_compiled_fsdp` is a decorator but due to the import discovered by pytest. Rename it to avoid. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144544 Approved by: https://github.com/Skylion007	2025-01-10 17:15:50 +00:00
Nikita Shulga	e6b9e67465	[BE][Opinfo] Delete redundant `dtypesIfCUDA` (#144512 ) If they are the same as CPU, no need to have that extra line Discovered while reviewing https://github.com/pytorch/pytorch/pull/143833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144512 Approved by: https://github.com/Skylion007	2025-01-10 15:15:38 +00:00
bobrenjc93	3b6b306b71	Migrate from Tuple -> tuple in torch/testing (#144256 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144256 Approved by: https://github.com/aorenste	2025-01-10 06:37:55 +00:00
Aaron Orenstein	9f09b719d3	Stop ignoring mypy errors in torch/testing/_internal/common_utils.py (#144483 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144483 Approved by: https://github.com/Skylion007	2025-01-10 02:31:43 +00:00
Dmitry Nikolaev	d4871750d9	[ROCm] Enable post-merge trunk workflow on MI300 runners; skip and fix MI300 related failed tests (#143673 ) This PR * makes changes to the workflow files and scripts so we can run CI workflows on the MI300 runners * skips and fixes several tests, failed on MI300, observed in https://github.com/pytorch/pytorch/pull/140989 Skipped due to unsupported Float8_e4m3fn data type on MI300 (need to update test code to use datatypes supported by MI300): - distributed.tensor.parallel.test_micro_pipeline_tp.py::MicroPipelineTPTest::test_fuse_all_gather_scaled_matmul_A_dims_\_gather_dim_\ (24 tests across inductor/distributed configs) - distributed.tensor.parallel.test_micro_pipeline_tp.py::test_fuse_scaled_matmul_reduce_scatter_A_dims_\_scatter_dim_\ (12 tests across inductor/distributed configs)) - inductor.test_loop_ordering::LoopOrderingTest::test_fp8_cast_and_t - inductor.test_loop_ordering::LoopOrderingTest::test_fp8_pattern_2 Skipped due to AssertionError on MI300: - inductor.test_mkldnn_pattern_matcher.py::test_qconv2d_int8_mixed_bf16 - distributed._tools.test_sac_ilp::TestSACILP::test_sac_ilp_case1 Skipped: - test_cuda.py::TestCudaMallocAsync::test_clock_speed - test_cuda.py::TestCudaMallocAsync::test_power_draw - test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_cumsum_cuda Skipped flaky tests on MI300: - distributed.test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda - inductor.test_cpu_repro::CPUReproTests::test_lstm_packed_unbatched_False* (256 tests) Fixed: - test_matmul_cuda.py::TestFP8MatmulCudaCUDA::test_float8_basics_cuda Features: - inductor/test_fp8.py - declare a new function to convert FP8 datatypes to ROCm supported FP8 datatypes. It keeps test names for CUDA and ROCm and allows to enable Inductor FP8 tests on CPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/143673 Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/pruthvistony Co-authored-by: saienduri <saimanas.enduri@amd.com> Co-authored-by: Jithun Nair <jithun.nair@amd.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-01-09 05:18:57 +00:00
Xuehai Pan	dcc3cf7066	[BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415 ) The fixes are generated by: ```bash ruff check --fix --preview --unsafe-fixes --select=E226 . lintrunner -a --take "RUFF,PYFMT" --all-files ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415 Approved by: https://github.com/huydhn, https://github.com/Skylion007	2025-01-08 21:55:00 +00:00
George Wigley	a5051a9521	Update torch.masked.mean to upcast dtype for bool tensors (#139999 ) When calling `torch.masked.mean(...)` with a boolean tensor, the dtype is inferred to be bool. When the mean is being computed, the sum operator is used. When the sum operator is used with dtype=torch.bool, the result is clamped to True (1) leading to an incorrect mean being calculated. The below example shows how the incorrect result occurs: ``` a = torch.tensor([True, True]) count = torch.sum(torch.ones(a.shape, dtype=torch.int64)) # 2 total = torch.sum(a, dtype=torch.bool) # True (1) mean = total / count # 0.5 ``` This PR upcasts the dtype used for the sumation to int32 in the case of bool tensors allowing for the correct result to be computed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139999 Approved by: https://github.com/cpuhrsch	2025-01-08 10:35:19 +00:00
Aaron Gokaslan	e4a05dec0f	[BE][Ez]: Fix docs recommending inefficient tensor op order (#144270 ) `detach().clone()` is faster than `.clone().detatch()` since the gradients are not cloned. Let's update all the documentation and tests so that users do not use the inefficient op ordering. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144270 Approved by: https://github.com/awgu, https://github.com/XuehaiPan	2025-01-07 17:31:32 +00:00
RAHUL SINGH	bf7747e935	Tests Generelization for multiple accelerator devices (#139184 ) Motivation: Generalize unit tests so that can be executed for cuda and non cuda devices. Depedency : #133209 Merged now. There was a #135242 for these changes and closed due to in correct commits. I have incoroprated the changes as suggested in comments. @kwen2501 @zeshengzong Please review the changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139184 Approved by: https://github.com/kwen2501 Co-authored-by: Yu, Guangye <guangye.yu@intel.com>	2025-01-07 09:04:38 +00:00
PyTorch MergeBot	f4e9aebbcc	Revert "Update torch.masked.mean to upcast dtype for bool tensors (#139999 )" This reverts commit `0742b2366e`. Reverted https://github.com/pytorch/pytorch/pull/139999 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think it has a landrace and fails a test in trunk ([comment](https://github.com/pytorch/pytorch/pull/139999#issuecomment-2574283986))	2025-01-07 02:42:55 +00:00
George Wigley	0742b2366e	Update torch.masked.mean to upcast dtype for bool tensors (#139999 ) When calling `torch.masked.mean(...)` with a boolean tensor, the dtype is inferred to be bool. When the mean is being computed, the sum operator is used. When the sum operator is used with dtype=torch.bool, the result is clamped to True (1) leading to an incorrect mean being calculated. The below example shows how the incorrect result occurs: ``` a = torch.tensor([True, True]) count = torch.sum(torch.ones(a.shape, dtype=torch.int64)) # 2 total = torch.sum(a, dtype=torch.bool) # True (1) mean = total / count # 0.5 ``` This PR upcasts the dtype used for the sumation to int32 in the case of bool tensors allowing for the correct result to be computed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139999 Approved by: https://github.com/cpuhrsch	2025-01-07 00:26:59 +00:00
Tugsbayasgalan Manlaibaatar	c68c38c673	Support getattr for tensor subclasses in pre-dispatch export via patching tensor.getattr (#143946 ) Previous discussion: https://github.com/pytorch/pytorch/pull/143671#issuecomment-2560112499 and https://github.com/pytorch/pytorch/pull/143671 Differential Revision: [D67693609](https://our.internmc.facebook.com/intern/diff/D67693609) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143946 Approved by: https://github.com/bdhirsh	2025-01-06 23:55:50 +00:00
Sun, Jiayi	a8e97d9d4d	fix torch.acos and torch.asin for torch.complex datatypes on CPU (#134838 ) Fix https://github.com/pytorch/pytorch/issues/134487, https://github.com/pytorch/pytorch/issues/138327. These two issues are caused by the lack of special handling of the case where the real number/imag number is 0/Inf/NaN in the vectorized implementation of `asin`. For correctness, I temporarily fallback the implementation of `asin `to scalar implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134838 Approved by: https://github.com/mingfeima, https://github.com/Skylion007	2025-01-06 06:17:39 +00:00
Jack Morris	9f94710e48	Update core.py to fix typo (#144201 ) dype -> dtype Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/144201 Approved by: https://github.com/Skylion007	2025-01-05 18:20:52 +00:00

1 2 3 4 5 ...

5511 Commits