pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Shunting Zhang	901b02cf16	[Inductor] fix alignement assumption for fallback (#150777 ) Inductor right now only works properly for fallback kernels producing aligned output. When Inductor create layout for fallback kernel output, Inductor does not add the tensor offset to the layout [link](`2a1e2b88ed/torch/_inductor/ir.py (L6935-L6941)`). Thus unaligned output will be treated as aligned. Adding the offset to the layout directly does not work since that change the index expression in the generated kernel and we may 'double' applying the offset. Triton already considers the offset when passing in the data_ptr. To solve this issue, we track the unaligned buffer names instead. This potentially can fix the internal issues we are debugging here: https://fb.workplace.com/groups/1075192433118967/permalink/1618308128807392/ Differential Revision: [D72600784](https://our.internmc.facebook.com/intern/diff/D72600784) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150777 Approved by: https://github.com/eellison, https://github.com/jansel	2025-04-08 18:49:44 +00:00
Yanan Cao (PyTorch)	c36d9b0d8d	[Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/torch/ao (#150826 ) Differential Revision: D72615631 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150826 Approved by: https://github.com/ydwu4	2025-04-08 18:49:22 +00:00
Basil Wong	aafc4b6188	Do not depend on numpy during the import (#150816 ) Summary: Related issue: https://github.com/pytorch/pytorch/issues/149681 We can follow up with a different implementation that does not use numpy(potentially with Torch primitives). Test Plan: pending: contbuild & OSS CI Differential Revision: D72609835 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150816 Approved by: https://github.com/jerryzh168, https://github.com/cyyever, https://github.com/albanD	2025-04-08 18:12:53 +00:00
Guilherme Leobas	e6bd133866	add batching rule for `torch.Tensor.scatter_add_` (#150543 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150543 Approved by: https://github.com/zou3519	2025-04-08 18:00:10 +00:00
William Wen	97759614c2	[dynamo] reconstruct functions decorated in the compiled region properly (#150645 ) We were previously unable to reconstruct functions that were decorated in the compiled region. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150645 Approved by: https://github.com/jansel	2025-04-08 17:32:46 +00:00
PyTorch MergeBot	4926bd6004	Revert "Fix the Problems About Defining Static Variable in Inline Function (#147095 )" This reverts commit `3da14d38bd`. Reverted https://github.com/pytorch/pytorch/pull/147095 on behalf of https://github.com/atalman due to breaks internally ([comment](https://github.com/pytorch/pytorch/pull/147095#issuecomment-2787129770))	2025-04-08 17:10:36 +00:00
FFFrog	3e0038ae85	Fix torch.matmul related out dtype check (#148174 ) ---- - torch.matmul -> CompositeImplicitAutograd -> dot_out (when left_dim == 1 & right_dim == 1) -> mv_out (when left_dim == 2 & right_dim == 1) -> mm_out (when left_dim == 1 & right_dim == 2) -> ... - torch.dot - torch.vdot - torch.mm - torch.mv ISSUE related: https://github.com/pytorch/pytorch/issues/138399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148174 Approved by: https://github.com/jansel	2025-04-08 17:00:28 +00:00
Animesh Jain	173f126068	[invoke_subgraph] Preserve node meta (#150782 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150782 Approved by: https://github.com/bdhirsh ghstack dependencies: #150666	2025-04-08 16:57:39 +00:00
Yuanhao Ji	1239260a0e	[Accelerator][Chore] Use existing `acc` when raising an error (#150829 ) As the title said, `acc` already exists so we just use it instead of calling `current_accelerator()` again. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150829 Approved by: https://github.com/guangyey, https://github.com/Skylion007	2025-04-08 16:05:06 +00:00
ZhiweiYan-96	52d172eafd	Facilitate at::_weight_int4pack_mm_with_scale_and_zeros related registration (#147962 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147962 Approved by: https://github.com/jerryzh168, https://github.com/guangyey, https://github.com/EikanWang ghstack dependencies: #137566 Co-authored-by: xiaolil1 <xiaoli.liu@intel.com>	2025-04-08 15:36:07 +00:00
FFFrog	05365e380d	Remove torch functions that do not support device arguments from _device_constructor (#150290 ) As the title stated In Addition: - I have checked all the functions in _device_constructor and found ``torch.vander`` also don`t support device arguments - Remove the duplicated function such as torch.ones and torch.asarray Related issue:https://github.com/pytorch/pytorch/issues/150284 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150290 Approved by: https://github.com/albanD	2025-04-08 15:13:55 +00:00
FFFrog	a402c2f203	Remove redundant code in cuda/__init__.py (#150529 ) As the title stated. Follow: https://github.com/pytorch/pytorch/pull/147078 Fix issue: https://github.com/pytorch/pytorch/issues/150519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150529 Approved by: https://github.com/eqy	2025-04-08 15:03:21 +00:00
Guilherme Leobas	f3b2fb6c66	Allow trace through unittest (#146500 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146500 Approved by: https://github.com/anijain2305	2025-04-08 14:55:17 +00:00
Luca Wehrstedt	1791b4150b	Clarify behavior of TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK (#150682 ) I still don't really understand the original purpose of that env var, but it appears that its usage is completely disconnected from MemPools and from `ncclMemAlloc`/`Free`. In fact, when that env var is set, we invoke `ncclCommRegister` for _all_ NCCL communicators for _all_ the memory segments managed by the allocator (both the global ones, allocated with `cudaMalloc`, and the ones in private MemPools), and we do that both for the segments that already exist when the PG is initialized and for all segments that will be allocated later. I'm reworking the code a bit, by using a few helper functions, whose name should make this behavior clearer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150682 Approved by: https://github.com/kwen2501 ghstack dependencies: #150681	2025-04-08 13:00:59 +00:00
Luca Wehrstedt	3649e2e7bd	Safer bookkeeping of NCCL communicators (#150681 ) This consists mainly in two changes: - ensure we can reliably obtain the device from a `NCCLComm` object (there was one constructor which didn't set the device) - use a RAII pattern for acquiring the lock to the global dictionary of `NCCLComms` (which ensures the lock is released in case of exceptions) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150681 Approved by: https://github.com/kwen2501	2025-04-08 11:12:37 +00:00
FFFrog	3da14d38bd	Fix the Problems About Defining Static Variable in Inline Function (#147095 ) Refer to https://github.com/pytorch/pytorch/issues/125465 for more informations - Remove unused header files - Move the inline function that defines the static variable to .cc Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095 Approved by: https://github.com/cyyever, https://github.com/albanD	2025-04-08 10:23:02 +00:00
FFFrog	881d99495d	Add more check for torch.ormqr (#150759 ) As the title statd. Please refer to https://github.com/pytorch/pytorch/issues/150674 for more info. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150759 Approved by: https://github.com/lezcano	2025-04-08 08:26:05 +00:00
FFFrog	f8aa6404ac	Refactor: add initialization of math.lcm into torch_c_binding_in_graph_functions (#150766 ) As the title stated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150766 Approved by: https://github.com/aorenste, https://github.com/jansel	2025-04-08 04:12:26 +00:00
zeshengzong	c9c0f8eae3	Add plot for `torch.nn.Threshold` and `torch.nn.GLU` (#150171 ) Fixes #150170 ## Changes - Add plot for `torch.nn.Threshold` and `torch.nn.GLU` - Add example output make them easier get result by users ## Test Result ![image](https://github.com/user-attachments/assets/f6c5bc46-f9b7-4db7-9797-e08d8423d1b3) ![image](https://github.com/user-attachments/assets/ad4e6c84-7b29-44f1-b7bd-9c81e4a92ef8) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150171 Approved by: https://github.com/albanD	2025-04-08 03:55:37 +00:00
zeshengzong	7e11089fe5	Optimize dataloader Self typing (#146816 ) Optimize `dataloader.py` method return type with Self typing Pull Request resolved: https://github.com/pytorch/pytorch/pull/146816 Approved by: https://github.com/albanD	2025-04-08 03:52:23 +00:00
Akash Verma	e9e5682a4a	[ROCm] Build Pytorch extensions with amdclang++ (#150451 ) Here are the following modifications made to cpp_extension.py- 1) Changed compiler flag to use --version. 2) Added a feature to convert alpha-numeric string to numeric string for the version string returned by compiler. This was the source of error as the parser was failing on parsing alpha-numeric version string. Build with following pytorch extensions- Apex, TorchVision, TorchAudio & DeepSpeed. Unit tested with following pytorch extensions- Apex, TorchVision. (cherry picked from commit c873aeac35851a7d5000eb7f24561d3f56c2ffbd) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/150451 Approved by: https://github.com/jeffdaily	2025-04-07 23:31:29 +00:00
Hexin Wang	91173ff89a	Fixing NCCL abort hang issue when a ProcessGroupNCCL manages multiple ncclComms (#150690 ) Detail of the issue: If PyTorch issues send/recv to each 2 rank comm, and these comms are managed by a single ProcessGroupNCCL instance, then comms need to abort either in sequence or in group. I.e. the following sequential abort will cause hang in NCCL. recv(..., comm0, stream); send(..., comm1, stream); abort(comm1); abort(comm0); Fixes #119797 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150690 Approved by: https://github.com/kwen2501	2025-04-07 23:20:49 +00:00
Animesh Jain	6ea5514e04	[invoke_subgraph] Lazy backward (#150666 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150666 Approved by: https://github.com/zou3519, https://github.com/bdhirsh	2025-04-07 22:44:43 +00:00
Ankita George	78fe079c97	Support having no metadata file for HuggingFaceStorageReader (#150701 ) Summary: If there is only one safetensors file, we don't need users to have a metadata file and we can just construct it from the keys of that file. This is a use-case for some HuggingFace models, so adding support for it Test Plan: ensure existing tests pass tested e2e in a notebook Differential Revision: D72472490 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150701 Approved by: https://github.com/joecummings	2025-04-07 22:10:39 +00:00
Max Ren	eba05e2d3e	[AO] Refactor convert and add QuantAffinePlaceholderObserver (#150644 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150644 Approved by: https://github.com/jerryzh168 ghstack dependencies: #150642, #150643	2025-04-07 20:52:45 +00:00
Max Ren	5653fb3525	[AO] Add Moving Average Affine Observer (#150643 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150643 Approved by: https://github.com/jerryzh168 ghstack dependencies: #150642	2025-04-07 20:52:45 +00:00
Max Ren	ed0dea3e24	[AO] update port_metadata_pass to support quant_affine ops (#150642 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150642 Approved by: https://github.com/jerryzh168	2025-04-07 20:52:44 +00:00
PyTorch MergeBot	bf1132c196	Revert "Generalize poison fork logic for each device backend (#144664 )" This reverts commit `d86c14156d`. Reverted https://github.com/pytorch/pytorch/pull/144664 on behalf of https://github.com/atalman due to failing periodic test: python test/test_cpp_extensions_mtia_backend.py TestCppExtensionMTIABackend.test_device_context ([comment](https://github.com/pytorch/pytorch/pull/144664#issuecomment-2784506104))	2025-04-07 20:09:53 +00:00
Pian Pawakapan	f8b53f4a75	[export] raise when Dim.DYNAMIC 0/1 specializes (#150716 ) Previously we didn't catch this, mark_dynamic() just doesn't allocate a symbol for it Differential Revision: D72486930 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150716 Approved by: https://github.com/angelayi	2025-04-07 18:58:42 +00:00
Sam Larsen	2a1e2b88ed	[logging] Add pgo remote get/put timings to dynamo_compile (#150322 ) Test Plan: https://fburl.com/scuba/dynamo_compile/sandbox/xf950tw8 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150322 Approved by: https://github.com/ppanchalia	2025-04-07 18:08:26 +00:00
Saurabh Mishra	7d2411d30e	[DCP][OSS] Introduce barrier util in the DistWrapper for rank local checkpointing (#150748 ) Summary: Introduce barrier util in the DistWrapper for rank local checkpointing. This barrier will be used at the end of the rank local checkpointing to ensure all ranks synchronize. Test Plan: UTs Differential Revision: D72541431 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150748 Approved by: https://github.com/MeetVadakkanchery	2025-04-07 17:33:07 +00:00
Isuru Fernando	957faaadca	Avoid overflow in vector_norm for scalar input (#144073 ) Fixes https://github.com/pytorch/pytorch/issues/143960 where torch.dist gave different results from eager due to vector_norm overflowing and eager mode avoids the overflow for single element reductions by not computing the power and then the root. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144073 Approved by: https://github.com/eellison, https://github.com/laithsakka	2025-04-07 17:10:10 +00:00
fduwjj	06e9deabb6	[c10d][fr] Improve FR dump robustness with all watchdog broadcast wait and more frequent store check (#150652 ) When debugging FR missing dump and missing dump logs, I have couple initial findings: 1. On the same rank, if a second watchdog timeout triggers on a different PG(or subPG), that watchdog thread will immediately throw exception instead of sleeping. We want to fix that by still making the watchdog thread to wait for 1 min. 2. The FR dump takes about 900ms to 1200ms so, we are not checking the store frequently enough. But instead of changing the frequency from 1sec to 300ms, we finally decided to just let all ranks just sleep for 1 min universally rather than using a promise. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150652 Approved by: https://github.com/kwen2501	2025-04-07 16:33:27 +00:00
shiyang-weng	0ad2c5d7e2	Add RECORD_FUNCTION for AOTI (#150150 ) Only add RECORD_FUNCTION for shim_fn now. Next step need to add RECORD_FUNCTION for all the aoti_torch_* functions. Fixes https://github.com/pytorch/pytorch/issues/148650 Some code gen by aoti ```c++ AtenTensorHandle buf1_handle; AtenTensorHandle buf2_handle; AtenTensorHandle buf3_handle; AtenTensorHandle buf4_handle; {RECORD_FUNCTION("aoti_torch_cpu__embedding_bag", c10::ArrayRef<c10::IValue>());AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu__embedding_bag(L__self___sparse_arch_embedding_bag_collection_embedding_bags_t_cat_0_weight, arg80_1, arg81_1, 0, 0L, 0, nullptr, 1, -1L, &buf1_handle, &buf2_handle, &buf3_handle, &buf4_handle));} RAIIAtenTensorHandle buf1(buf1_handle); RAIIAtenTensorHandle buf2(buf2_handle); RAIIAtenTensorHandle buf3(buf3_handle); RAIIAtenTensorHandle buf4(buf4_handle); arg80_1.reset(); arg81_1.reset(); ``` On trace ``` { "name": "aoti_torch_cpu__embedding_bag", "ph": "X", "ts": 68874.450000, "dur": 361.291000, "tid": 2, "pid": "CPU Functions", "args": {} }, ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/150150 Approved by: https://github.com/desertfire, https://github.com/EikanWang	2025-04-07 15:12:29 +00:00
Benjamin Glass	f813d64f54	cpp_wrapper: Fix even more tests (#147225 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147225 Approved by: https://github.com/desertfire ghstack dependencies: #150671, #150672	2025-04-07 14:20:06 +00:00
Benjamin Glass	5e3c8214b5	cpp_wrapper: Re-enable code disabled for forward compatibility (#150671 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150671 Approved by: https://github.com/desertfire	2025-04-07 14:20:06 +00:00
Shivam Raikundalia	99c9a31386	[submodule] [Snapshot/Profiler] Memory Snapshot On Demand (#150559 ) Summary: Profiler side of memory snapshot. 1. Add API to actually do snapshot when client interface is called 2. Add ifdefs to builds so that kineto hooks snapshot correctly. Design Philosophy: There is one interesting part of this implementation and it is during export. For export we are callign the python impl of the export rather than CPP even though we are already in CPP. This is because it is better to simply have one path of export rather than 2. Personally, I want there to be parity between auto-trace and on-demand so it if we can limit the side paths then we will have an easier time maintaining this relationship Test Plan: {F1976563426} Reviewed By: sanrise Differential Revision: D70733247 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150559 Approved by: https://github.com/sanrise	2025-04-07 13:04:38 +00:00
Zain Huda	e209625334	[torchrec] update local_shards_wrapper to latest version (#150469 ) Summary: Adding new ops, support for empty shards, and fixed initializations for downstream checkpointing. Test Plan: buck2 run 'fbcode//mode/dev-nosan' fbcode//torchrec/distributed/tests:test_shards_wrapper Differential Revision: D72271275 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150469 Approved by: https://github.com/XilunWu	2025-04-07 13:00:52 +00:00
Zhengxu Chen	24aadb40fb	[precompile] Serialization for GlobalStateGuard (#150636 ) Summary: To preserve global state guards we need to make the C++ type serialzable. Using json because it's easier to do and we don't have a lot of data in global state. Test Plan: test_dynamo -k test_global_state_guard_serialization Differential Revision: D72410611 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150636 Approved by: https://github.com/williamwen42	2025-04-07 03:10:03 +00:00
eellison	b6929aef08	Fix conv2d strided prologue (#150697 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150697 Approved by: https://github.com/drisspg	2025-04-07 02:26:58 +00:00
Yu, Guangye	d86c14156d	Generalize poison fork logic for each device backend (#144664 ) # Motivation Generalize the posion_fork code to make it reusable across different devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664 Approved by: https://github.com/EikanWang, https://github.com/albanD	2025-04-07 02:06:21 +00:00
Han, Chao1	d98575806b	Generalize compile collective to avoid cuda-bias (#150405 ) Fixes https://github.com/intel/torch-xpu-ops/issues/1527 Let the combination of `compile` and `collective` to support more devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150405 Approved by: https://github.com/guangyey, https://github.com/jansel Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-04-07 01:54:20 +00:00
Paul Ganssle	47b494ef69	Add type hints to `_tensor_docs.add_docstr_all` (#150715 ) There is some sort of bug in `pytype` where if this function doesn't have type hints, `pytype` will spend 10 minutes inferring the types. Not that this matters much for a project not using `pytype`, but it led me to realize that this function could easily be type hinted and is not, so here is a PR adding some type hints. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150715 Approved by: https://github.com/Skylion007	2025-04-06 22:25:34 +00:00
Randolf Scholz	6c38b9be73	[typing] Add type hints to `__init__` methods in `torch.distributions`. (#144197 ) Fixes #144196 Extends #144106 and #144110 ## Open Problems: - [ ] Annotating with `numbers.Number` is a bad idea, should consider using `float`, `SupportsFloat` or some `Procotol`. https://github.com/pytorch/pytorch/pull/144197#discussion_r1903324769 # Notes - `beta.py`: needed to add `type: ignore` since `broadcast_all` is untyped. - `categorical.py`: converted `else` branches of mutually exclusive arguments to `if` branch[^2]. - ~~`dirichlet.py`: replaced `axis` with `dim` arguments.~~ #144402 - `gemoetric.py`: converted `else` branches of mutually exclusive arguments to `if` branch[^2]. - ~~`independent.py`: fixed bug in `Independent.__init__` where `tuple[int, ...]` could be passed to `Distribution.__init__` instead of `torch.Size`.~~ EDIT: turns out the bug is related to typing of `torch.Size`. #144218 - `independent.py`: made `Independent` a generic class of its base distribution. - `multivariate_normal.py`: converted `else` branches of mutually exclusive arguments to `if` branch[^2]. - `relaxed_bernoulli.py`: added class-level type hint for `base_dist`. - `relaxed_categorical.py`: added class-level type hint for `base_dist`. - ~~`transforms.py`: Added missing argument to docstring of `ReshapeTransform`~~ #144401 - ~~`transforms.py`: Fixed bug in `AffineTransform.sign` (could return `Tensor` instead of `int`).~~ #144400 - `transforms.py`: Added `type: ignore` comments to `AffineTransform.log_abs_det_jacobian`[^1]; replaced `torch.abs(scale)` with `scale.abs()`. - `transforms.py`: Added `type: ignore` comments to `AffineTransform.__eq__`[^1]. - `transforms.py`: Fixed type hint on `CumulativeDistributionTransform.domain`. Note that this is still an LSP violation, because `Transform.domain` is defined as `Constraint`, but `Distribution.domain` is defined as `Optional[Constraint]`. - skipped: `constraints.py`, `constraints_registry.py`, `kl.py`, `utils.py`, `exp_family.py`, `__init__.py`. ## Remark `TransformedDistribution`: `__init__` uses the check `if reinterpreted_batch_ndims > 0:`, which can lead to the creation of `Independent` distributions with only 1 component. This results in awkward code like `base_dist.base_dist` in `LogisticNormal`. ```python import torch from torch.distributions import * b1 = Normal(torch.tensor([0.0]), torch.tensor([1.0])) b2 = MultivariateNormal(torch.tensor([0.0]), torch.eye(1)) t = StickBreakingTransform() d1 = TransformedDistribution(b1, t) d2 = TransformedDistribution(b2, t) print(d1.base_dist) # Independent with 1 dimension print(d2.base_dist) # MultivariateNormal ``` One could consider changing this to `if reinterpreted_batch_ndims > 1:`. [^1]: Usage of `isinstance(value, numbers.Real)` leads to problems with static typing, as the `numbers` module is not supported by `mypy` (see <https://github.com/python/mypy/issues/3186>). This results in us having to add type-ignore comments in several places [^2]: Otherwise, we would have to add a bunch of `type: ignore` comments to make `mypy` happy, as it isn't able to perform the type narrowing. Ideally, such code should be replaced with structural pattern matching once support for Python 3.9 is dropped. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144197 Approved by: https://github.com/malfet Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-04-06 17:50:35 +00:00
Isalia20	49f6cce736	[MPS] grad scaler (#150255 ) Fixes #142397 Basic implementation is done. What's left: - [x] Different dtype/device tensors in the TensorList - [x] fast path for grouping the foreach kernel - [x] Tests Regarding tests, I found some tests in `test/test_torch.py` for GradScaler but I couldn't figure out what is the best way to enable the test for MPS device. By removing `@onlyNativeDeviceTypes`, one enables the tests for MPS but also enables tests for all other devices which are not included in the native device types. If I put: `instantiate_device_type_tests(TestTorchDeviceType, globals(), allow_mps=True)` This enables lots of tests in that class for MPS which were not(?) being tested before? This part needs some clarification Pull Request resolved: https://github.com/pytorch/pytorch/pull/150255 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-04-06 17:06:55 +00:00
Natalia Gimelshein	55e62ff74a	bf16 grouped gemm (#150374 ) Enabled bf16 grouped gemm with an API similar to _scaled_group_gemm, except without scale and fast accum arguments. All transpose variants are enabled, unlike scaled gemm. Ideally we'd factor out a lot more code from scaled gemm, currently there's a lot of repetition between scaled and non-scaled versions. I factored out only a helper kernel that prepares arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150374 Approved by: https://github.com/drisspg	2025-04-06 04:53:24 +00:00
PyTorch MergeBot	caf8d9bc17	Revert "Fix conv2d strided prologue (#150697 )" This reverts commit `2e4ae2ab41`. Reverted https://github.com/pytorch/pytorch/pull/150697 on behalf of https://github.com/ngimel due to breaks rocm build ([comment](https://github.com/pytorch/pytorch/pull/150697#issuecomment-2781218658))	2025-04-06 04:50:15 +00:00
Klint Qinami	2d98a1caf5	[MTIA] Map names to operand indices when folding submodules (#150692 ) When replacing placeholders with getattrs during constant folding, we can have an argument and parameter name mismatch. In fact, there is no guarantee that the parameter name is equivalent to the argument name used in the module call. Differential Revision: D72415970 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150692 Approved by: https://github.com/jfix71	2025-04-06 03:11:14 +00:00
Nikita Shulga	c830c12a87	[MPSInductor] Fix tiled reduction logic (#150737 ) In case of tiles, index must include both reduction dimentions Pull Request resolved: https://github.com/pytorch/pytorch/pull/150737 Approved by: https://github.com/dcci	2025-04-06 00:20:41 +00:00
Mu-Chu Lee	60a45eb862	[AOTInductor] Introduce MaybeOwningAtenTensorHandle for ConstantMap (#150275 ) Summary: We used RAIIAtenTensorHandle for ConstantMap, where RAIIAtenTensorHandle is a unique_ptr, indicating that all memory handling is by the AOTInductor internally. In this PR, we introduce ConstantAtenTensorHandle which replaces RAIIATenTensorHandle. This class holds a raw AtenTensorHandle, and also owns a RAIIAtenTensorHandle if user decides to delegate memory management to AOTInductor. This is a prerequisite for user managed buffer, this PR, however only introduces this class and make sure it works with existing AOTInductor and has the default behavior identical as using RAIIAtenTensorHandle. Test Plan: Existing tests. No change should be introduced within this PR. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/150275 Approved by: https://github.com/chenyang78, https://github.com/desertfire	2025-04-05 06:00:35 +00:00

1 2 3 4 5 ...

47312 Commits