pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
maajidkhann	84c21d2147	Enable SVE ACLE implementation for tanH Aten op for FP32 dType. (#143741 ) In deep learning models, the tanh (hyperbolic tangent) function is a widely used activation function, primarily in feedforward networks, recurrent neural networks (RNNs), and various other architectures. Also, the tanh (hyperbolic tangent) function is commonly used in Physics-Informed Neural Networks (PINNs). PINNs are a class of machine learning models designed to solve partial differential equations (PDEs) by incorporating the governing physics directly into the loss function, along with data-driven terms. In PINNs, activation functions like tanh are used in the neural network architecture to enable the model to learn complex mappings between inputs (such as spatial and temporal coordinates) and outputs (such as field variables). Operator: tanh() Current Implementation in OSS in ATen Backend: SVE Flow: Uses SVE sleef when available else std implementation. With this PR : SVE Flow: Uses SVE ACLE implementation. (Faster Implementation) Here are the performance improvements. Single core perf numbers: ![image](https://github.com/user-attachments/assets/c2f4bcb6-11bc-4af1-b5eb-278a4cc4a69d) Metric: CPU time avg time per iteration (In ms) As you can see with both gcc and clang compilers, we see a significant performance gain with SVE ACLE implementation over current OSS Implementation (Sleef) and also Neon. 
Hardware: m7g.8xlarge (Graviton 3 Instance) Script used in benchmarking: ```python import os #os.environ["ATEN_CPU_CAPABILITY"] = "default" os.environ["ATEN_CPU_CAPABILITY"] = "sve256" import torch import torch.nn as nn #Set the random seed for reproducibility torch.manual_seed(1) #Create a tensor of shape (8521, 50) x = torch.randn(8521, 50) for i in range(10): output = x.tanh() #Perform the tanh operation 1000 times and profile the performance print("### CPU tanh") with torch.autograd.profiler.profile(record_shapes=True) as prof: for i in range(1000): output = x.tanh() #Print the profiling results sorted by self CPU time print(prof.key_averages().table(sort_by="self_cpu_time_total")) #Optionally print the final output (if needed, uncomment the following line) print(output) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143741 Approved by: https://github.com/malfet	2025-04-01 11:54:58 +00:00
yucai-intel	bf4814eb6a	[Intel GPU] Allow XPU backend in Quantize operators (#150288 ) This modification is to support torch.quantize_per_channel() on XPU, otherwise it will cause a segmentation fault. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150288 Approved by: https://github.com/jerryzh168, https://github.com/guangyey	2025-04-01 11:27:26 +00:00
Xuehai Pan	a10b765bf1	[pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257 ) Changes in this PR: 1. Add `is_structseq` and `is_structseq_class` functions to determine whether an object or a class is a PyStructSequence. 2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types, like `namedtuple` for Named Tuple types. 3. Change `is_namedtuple` to accept subclasses of namedtuple as namedtuples. Before this PR, only namedtuple classes directly created by `collections.namedtuple` or `typing.NamedTuple` were considered namedtuple classes, while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of a namedtuple class. Resolves #75982. New tests are included in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257 Approved by: https://github.com/zou3519	2025-04-01 10:40:43 +00:00
Prajesh Praveen Anchalia	48e9ffc873	Unify on dynamo_compile as the overall wait counter (#150293 ) Summary: dynamo_compile has, for the most part, been accounting for compile time except autotuning. all_compilation_types had earlier been injected on fx_codegen_and_compile, which was incorrect. Add autotuning to dynamo and deprecate the all_compilation_types counter. Differential Revision: D72145447 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150293 Approved by: https://github.com/masnesral, https://github.com/jamesjwu	2025-04-01 08:55:51 +00:00
FFFrog	36f2d0aaba	Add "xpu" to __all__ for torch/version.py (#149695 ) As the title stated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149695 Approved by: https://github.com/desertfire, https://github.com/guangyey	2025-04-01 08:44:51 +00:00
Natalia Gimelshein	1700599266	Add one_shot_all_reduce_copy to allow non-symm-mem allocated tensors to be reduced (#150129 ) Per title, we want to be able to use it even if inputs are not registered. Separate copy would add latency, and one-shot is all about the lowest possible latency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150129 Approved by: https://github.com/xw285cornell	2025-04-01 05:36:43 +00:00
Natalia Gimelshein	414b9ae016	enable out variant of 2-shot reduction (#150153 ) Per title, this version uses symm mem input both as input source and as a work buffer, so input is modified after the end (similar to what fbgemm car reduction does). It is intended to be wrapped in an op that would first copy the real inputs to symm mem buffers that wouldn't be exposed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150153 Approved by: https://github.com/xw285cornell	2025-04-01 05:36:04 +00:00
Tugsbayasgalan Manlaibaatar	7e7e5698cc	Suppress more warnings (#149833 ) Differential Revision: [D71702307](https://our.internmc.facebook.com/intern/diff/D71702307) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149833 Approved by: https://github.com/malfet, https://github.com/Skylion007	2025-04-01 05:33:04 +00:00
William Wen	790d459f85	[dynamo] add error message for unsupported LOAD_BUILD_CLASS (#150323 ) Improved error message for https://github.com/pytorch/pytorch/issues/128942 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150323 Approved by: https://github.com/jansel, https://github.com/zou3519	2025-04-01 05:03:50 +00:00
Stonepia	ce52674b76	[Doc] Update CMAKE_PREFIX_PATH for XPU windows README (#148863 ) We found that `pip install cmake` and `conda install cmake` have different behavior. The reason is that the pip-installed one doesn't find the corresponding libs under the conda env. So we need to set `CMAKE_PREFIX_PATH` for alignment. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148863 Approved by: https://github.com/CuiYifeng, https://github.com/malfet Co-authored-by: Cui, Yifeng <yifeng.cui@intel.com>	2025-04-01 04:43:11 +00:00
Phillip Liu	31634b8c6a	[fr] Added protection against missing stack frames in fr cont. (#150133 ) Summary: Previously we had D70358287, which didn't fully resolve the issue. Test Plan: # FR `buck2 run @//mode/opt //caffe2/fb/flight_recorder:fr_trace -- --mast_job_id f710320638-TrainingApplication --mast_job_version 0 --mast_job_attempt 0 --bucket tlcm_log_blob --world_size 128 --dump_file_name_offset 0 --allow-incomplete-ranks` Confirm no error # FR analyzer `buck2 run @//mode/opt //investigations/dr_patternson/analyzers/ai_observability:ai_observability-all-analyzers-cli -- flight_recorder_analyzer --mast_job_name f710320638-TrainingApplication --mast_job_version 0 --mast_job_attempt 0` Confirm no error Differential Revision: D71998980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150133 Approved by: https://github.com/fduwjj	2025-04-01 03:07:59 +00:00
Nikita Shulga	827b730f4e	[CI] Skip test_copy_large_tensor on M2-15 runners (#150377 ) They have more than 12 GB of memory, but running this test may cause OOM in CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150377 Approved by: https://github.com/atalman	2025-04-01 02:33:43 +00:00
Nikita Shulga	6470b373c1	`torch.backends.mkldnn.flags()` CM should not warn (#150358 ) By returning `None` rather than `False` from `THPModule_allowTF32OneDNN` when USE_XPU is not defined. Added regression test. Fixes https://github.com/pytorch/pytorch/issues/149829 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150358 Approved by: https://github.com/atalman	2025-04-01 01:33:40 +00:00
Sun, Jiayi	5cb5675f13	[Inductor] optimize the heuristics of parallel reduction (#149614 ) Fix https://github.com/pytorch/pytorch/issues/148639. Summary: Optimize the heuristics of parallel reduction: When the number of steps of the first inner loop beyond the maximum parallel depth is much larger than the number of steps of all outer loops within the maximum parallel depth, change the starting depth of parallelism to the first inner loop and recalculate the maximum parallel depth. I ran the Inductor benchmark with this PR on CPU. A timm model poolformer_m36 BF16 has about 25% performance improvement, and no performance regression is seen. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149614 Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel	2025-04-01 01:31:00 +00:00
Zhang, Jianyi	0f12951fc2	[Intel gpu] always set deterministic for xpu accuracy test (#149028 ) On Intel Max 1550, models like Super_SloMo can actually pass the accuracy test after deterministic mode is set, because we do not use atomics in upsampling bilinear backward in some cases when running on XPU. Furthermore, I guess the only reason not to set deterministic on these models is just to avoid errors. We should use warn_only = True. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149028 Approved by: https://github.com/guangyey, https://github.com/desertfire Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-04-01 01:00:11 +00:00
Nikita Shulga	7ab8532cf1	[BE] Get rid of cross-compile and x86 build options for Mac (#150362 ) As both cross-compilation and x86 builds have been removed a while back. Remove stale TODO about building with OpenMP support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150362 Approved by: https://github.com/atalman, https://github.com/clee2000	2025-04-01 00:45:24 +00:00
Joshua Hamilton	4ce0b959ff	Add a warning when a tensor with requires_grad=True is converted to a scalar (#143261 ) Fixes #143071 Operations performed on tensors with `requires_grad=True` such as ```python import torch x = torch.tensor(2.0, requires_grad=True) y = x ** 3 ``` and ```python x = torch.tensor(2.0, requires_grad=True) y = torch.pow(x,3) ``` are valid operations. While an operation using `numpy` like ```python import numpy as np x = torch.tensor(2.0, requires_grad=True) y = np.pow(x,3) # > RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead. ``` leads to an error. However, an operation that uses `math` like ```python import math x = torch.tensor(2.0, requires_grad=True) y = math.pow(x,3) ``` does not cause an error, and `y` is no longer a tensor with a gradient! This represents a [footgun](https://en.wiktionary.org/wiki/footgun#Noun) for some users, like myself when training small, custom, non-neural network models. To prevent future undesired behavior, I added a warning when converting tensors with `requires_grad=True` to scalars. Now, when using `math.pow` on a `tensor`, we get a single warning with: ```python x = torch.tensor(2.0, requires_grad=True) y = math.pow(x,3) # > UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior. # Consider using tensor.detach() first. ``` Please let me know if you have any questions 👍 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143261 Approved by: https://github.com/malfet Co-authored-by: albanD <desmaison.alban@gmail.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-04-01 00:42:46 +00:00
Jack Taylor	49b7d0d84d	[ROCm] Enable more inductor UTs (#149513 ) Primarily enable inductor fp8 tests, also enable other inductor tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/149513 Approved by: https://github.com/jeffdaily	2025-04-01 00:30:36 +00:00
Nikita Shulga	c75dac5f5c	Fix typo (#150363 ) Fixes https://github.com/pytorch/pytorch/issues/150339 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150363 Approved by: https://github.com/atalman, https://github.com/kwen2501	2025-03-31 23:58:37 +00:00
Davide Italiano	b48505a8a1	[MPS] Add support for hermite_polynomial_h. (#150279 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150279 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-03-31 23:30:19 +00:00
Mu-Chu Lee	a2070e2fd5	[AOTInductor] Free tensors in test (#150274 ) Summary: This PR frees tensors that were new-ed within the test itself to prevent memory leaks. Test Plan: Fixing tests itself. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150274 Approved by: https://github.com/chenyang78	2025-03-31 23:28:13 +00:00
Shiyan Deng	982a7f7db0	[cachinghostallocator] remove the check on cudaHostRegister path (#150070 ) Summary: In the cudaHostAlloc path, the flag we used is `cudaHostAllocDefault` [0] which doesn't really have this strict enforcement (devicePtr retrieved from `cudaHostGetDevicePointer()` points to the same addr as the hostPtr) according to the guide [1]. This diff removes the check so that the host register path works for ROCm. [0]`6aca002d82/aten/src/ATen/cuda/CachingHostAllocator.cpp (L97)` [1] https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1gb65da58f444e7230d3322b6126bb4902 Test Plan: test_pinned_memory_with_cudaregister tests Differential Revision: D71932562 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150070 Approved by: https://github.com/jeffdaily	2025-03-31 23:23:05 +00:00
PaulZhang12	981048854d	Merge Triton ScaledMM as epilogue to MM template (#150045 ) Previously, scaled_mm's (FP8 matmul) Triton lowering for inductor was in a separate template. This PR consolidates that lowering into the mm template, with an added epilogue to deal with multiplying the scales. This paves the way for future scaled variants of BMM, Grouped GEMM in inductor. Currently, there is still a separate template for TMA+persistent version of scaled_mm. The current mm lowering has a separate template for TMA + Persistent version. Will hopefully consolidate the extra scaled_mm TMA+persistent template when the consolidation for the mm template is done. TODO: Consolidate TMA+Persistent logic into 1 template and remove separate scaled_mm TMA template Pull Request resolved: https://github.com/pytorch/pytorch/pull/150045 Approved by: https://github.com/drisspg	2025-03-31 23:20:14 +00:00
Nikita Shulga	91666eef60	Update gloo submodule (#150320 ) That updates its CMake minimum version(via https://github.com/facebookincubator/gloo/pull/424 ) and removes cmake-4.0.0 workarounds for gloo Pull Request resolved: https://github.com/pytorch/pytorch/pull/150320 Approved by: https://github.com/atalman	2025-03-31 22:40:27 +00:00
PyTorch MergeBot	1526ff955e	Revert "Add a warning when a tensor with requires_grad=True is converted to a scalar (#143261 )" This reverts commit `515b45e569`. Reverted https://github.com/pytorch/pytorch/pull/143261 on behalf of https://github.com/clee2000 due to failing internal tests D72135661 ([comment](https://github.com/pytorch/pytorch/pull/143261#issuecomment-2767531682))	2025-03-31 22:19:08 +00:00
Faa Diallo	423e4a4568	[ROCm] cmake 4 workaround for hiprtc (#150324 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150324 Approved by: https://github.com/jeffdaily, https://github.com/atalman, https://github.com/malfet	2025-03-31 21:55:53 +00:00
Ethan Wee	4e2997db73	[ROCm][CI] Increase wheel build timeout from 210 to 240 (#150221 ) Fixes #150046. Increasing the timeout from 210 to 240. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150221 Approved by: https://github.com/jeffdaily	2025-03-31 21:46:09 +00:00
Pian Pawakapan	925fd4aa2e	[export] min/max ranges for dim hints (#149590 ) Differential Revision: D71522032 Adds min/max ranges to Dim.AUTO/DYNAMIC/STATIC, so users can do `Dim.AUTO(min=2, max=2048)`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149590 Approved by: https://github.com/tugsbayasgalan	2025-03-31 21:32:20 +00:00
Eli Uriegas	dfcd98e684	cd: Fix naming for windows arm64 libtorch builds (#150310 ) Apparently the magical incantation to name these correctly lies in the build_variant variable otherwise it silently does nothing. Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/150310 Approved by: https://github.com/atalman	2025-03-31 20:12:03 +00:00
Matthew Haddock	80b7f6b704	Adjust TestInductorOpInfo to depend on backend, not device (#146911 ) As is the case with many inductor tests, this test adapts test criteria based on device type, where it should be adjusting for the backend registered for that device. In this particular case, using the upstream triton CPU backend would lead to failures, as reference_in_float would be true as this is required for the C++/OpenMP backend which does not have float16 support. However, most triton backends do, and as such should be tested in float16. Similarly, a triton backend with a device not described as a GPU would get skipped from testing entirely. A more generic solution would be ideal, but this would require a lot of work across many tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146911 Approved by: https://github.com/masnesral	2025-03-31 18:24:16 +00:00
Aleksei Nikiforov	ab342d3793	Make PyTorch buildable by CMake-4.x on s390x (#150294 ) This is a continuation of https://github.com/pytorch/pytorch/pull/150203 that fixes nightly build on s390x. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150294 Approved by: https://github.com/malfet	2025-03-31 18:10:02 +00:00
angelayi	5e34758cef	[invoke_subgraph] Support unbacked (#149298 ) Differential Revision: [D71420641](https://our.internmc.facebook.com/intern/diff/D71420641) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149298 Approved by: https://github.com/zou3519	2025-03-31 17:25:09 +00:00
Pian Pawakapan	284b766898	[dynamic shapes] C++ bindings for guard_or_false/true (#150148 ) C++ version. Would like to add it in one place to prove it works, but couldn't find one that doesn't expose a chain of data-dependent changes... so just gonna put up the base implementation Pull Request resolved: https://github.com/pytorch/pytorch/pull/150148 Approved by: https://github.com/laithsakka, https://github.com/jingsh	2025-03-31 17:04:25 +00:00
Prachi Gupta	47cdad2995	[ROCm] Enable several fsdp related UTs (#149369 ) Enabling 26 UTs for ROCm in the following files: - distributed._shard.sharded_optim.test_sharded_optim - 2 UTs - distributed._shard.sharded_tensor.ops.test_binary_cmp - 4 UTs - distributed._shard.sharded_tensor.ops.test_init - 3 UTs - distributed._shard.sharded_tensor.ops.test_embedding - 2 UTs - distributed._shard.sharded_tensor.ops.test_embedding_bag - 2 UTs - distributed._composable.test_replicate_with_compiler - 4 UTs - distributed._composable.fsdp.test_fully_shard_grad_scaler - 1 UTs - distributed.tensor.test_attention - 4 UTs - distributed.tensor.test_matrix_ops - 1 UTs - distributed.tensor.test_tensor_ops - 1 UTs - distributed.fsdp.test_fsdp_grad_acc - 2 UTs Pull Request resolved: https://github.com/pytorch/pytorch/pull/149369 Approved by: https://github.com/jeffdaily	2025-03-31 16:15:57 +00:00
PyTorch MergeBot	7c858066ae	Revert "Enable TMA persistent GEMM Template by default (#149427 )" This reverts commit `b8ef642f04`. Reverted https://github.com/pytorch/pytorch/pull/149427 on behalf of https://github.com/clee2000 due to failing tests internally D72116141 ([comment](https://github.com/pytorch/pytorch/pull/149427#issuecomment-2766672200))	2025-03-31 15:58:34 +00:00
PyTorch MergeBot	57fa99c5c3	Revert "enable out variant of 2-shot reduction (#150153 )" This reverts commit `cdeb32d2d1`. Reverted https://github.com/pytorch/pytorch/pull/150153 on behalf of https://github.com/clee2000 due to failing internal builds D72083877 ([comment](https://github.com/pytorch/pytorch/pull/150153#issuecomment-2766633712))	2025-03-31 15:43:24 +00:00
PyTorch MergeBot	e57fa18b40	Revert "Add one_shot_all_reduce_copy to allow non-symm-mem allocated tensors to be reduced (#150129 )" This reverts commit `8a872261dc`. Reverted https://github.com/pytorch/pytorch/pull/150129 on behalf of https://github.com/clee2000 due to breaking internal builds D72080428 ([comment](https://github.com/pytorch/pytorch/pull/150129#issuecomment-2766619006))	2025-03-31 15:37:54 +00:00
Wang, Chuanqi	f74d5d576a	Update torch-xpu-ops commit pin to 3ee2bd2 (#150300 ) Update the torch-xpu-ops commit to [3ee2bd2f13e1ed17a685986ff667a58bed5f2aa5](`3ee2bd2f13`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150300 Approved by: https://github.com/EikanWang	2025-03-31 13:36:11 +00:00
Yichen Yan	bbb9b2476b	Unify use of `enableCollectiveHashDebug_` and trivial updates (#142865 ) Use `enableCollectiveHashDebug_` instead of checking env ad-hoc when `TORCH_DISTRIBUTED_DEBUG = DETAIL`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142865 Approved by: https://github.com/fegin, https://github.com/kwen2501	2025-03-31 12:23:30 +00:00
Ethan Wee	c158eac0de	[ROCm] use correct workspace for hipblaslt, silence warning (#150227 ) Follow up to #145130. That PR caused a warning on ROCm the first time hipblaslt was called for any workload, always. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150227 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-03-31 09:49:43 +00:00
LifengWang	51f0403f46	Update the baseline for max_autotune ci workflow (#149107 ) Since the issue https://github.com/pytorch/pytorch/issues/148535 is fixed in PR https://github.com/pytorch/pytorch/pull/148923, update the baseline for max_autotune ci workflow. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149107 Approved by: https://github.com/chuanqi129, https://github.com/leslie-fang-intel, https://github.com/desertfire	2025-03-31 09:45:44 +00:00
Kavya Govindarajan	4aded85e79	Fix space typo in warning message (#143473 ) Warning shows up like this (no space between willbe): ``` /home/xxx/.local/lib/python3.11/site-packages/torch/distributed/fsdp/_state_dict_utils.py:827: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143473 Approved by: https://github.com/mikaylagawarecki, https://github.com/kwen2501	2025-03-31 07:38:02 +00:00
Matthew Hoffman	c976321541	Use variadic length tuple for `torch.masked.DimOrDims` (#149870 ) `tuple[int]` means only a tuple of length 1, which is not what was intended. ```python loss = torch.masked.mean(loss, mask=mask, dim=(-1, -2)) # Argument of type "tuple[Literal[-1], Literal[-2]]" cannot be assigned to parameter "dim" of type "DimOrDims" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149870 Approved by: https://github.com/Skylion007	2025-03-31 07:06:58 +00:00
Vlad K	f1b74037b1	Fix bug when Inductor include path contains spaces (#148271 ) This PR fixes a bug with how include directories with spaces are handled on Windows. I ran into an edge case with torch.compile() - it will error out with an exception on Windows. In particular, it will try to execute the following: `cl /I C:/Program Files/Python311/Include ...`, where `C:/Program` will be treated as separate from `Files/Python311/Include`. I looked into using something like `shlex.quote` or `pathlib.Path`, but I didn't find those options to be suitable (shlex is POSIX shell only, pathlib.Path does not escape spaces). There is another place in the function that also deals with escaping spaces. My fix follows the same style. `0ff2e6a85a/torch/_inductor/cpp_builder.py (L1464)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148271 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-03-31 06:46:05 +00:00
Youseok Yang	b99e0c5412	Fix mtia_extension.cpp setDevice() to correctly set current_device (#149398 ) We referred to this code and found that there was a minor bug. Fix for future reference for others. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149398 Approved by: https://github.com/janeyx99	2025-03-31 06:07:22 +00:00
Yuanhao Ji	4f14224dc8	[Inductor] Fix `torch.polygamma()` when n == 1 (#147453 ) Fixes #147450 Be consistent with cpu kernel: `77dbd28535/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp (L433-L444)` Got this in the case: ``` Eager: tensor([1.2914e+15]), dtype: torch.float32 Compile: tensor([1.2914e+15]), dtype: torch.float32 Expected: tensor([6.5808e+32], dtype=torch.float64), dtype: torch.float64 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/147453 Approved by: https://github.com/eellison	2025-03-31 05:27:46 +00:00
fduwjj	9456738edf	[c10d][fr] Allow multiple writer registration with warnings (#150232 ) The life span of writer is actually the whole program which is sub-optimal but it is a practical compromise so that the registration of writer can happen outside PG creation. So we decide to allow multiple writer registrations with warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150232 Approved by: https://github.com/d4l3k, https://github.com/kwen2501	2025-03-31 04:43:43 +00:00
redwrasse	ad54b3aae2	test 0-dim squeeze in basic.TestSqueeze (#147928 ) Replace TODO with 0-dim squeeze, checks scalar is unchanged in `basic.TestSqueeze` Pull Request resolved: https://github.com/pytorch/pytorch/pull/147928 Approved by: https://github.com/janeyx99	2025-03-31 04:35:16 +00:00
Luca Arnaboldi	c3bb174bb2	SubsetRandomSampler - changed iteration over tensor to iteration over list (#149126 ) Digging further into the problem at https://github.com/UKPLab/sentence-transformers/pull/3261, it boils down to this expensive loop over a torch tensor. Looping over a list, like in RandomSampler, solves the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149126 Approved by: https://github.com/divyanshk, https://github.com/cyyever	2025-03-31 04:33:35 +00:00
dscamiss	59abb8c7a2	Fix documentation build errors caused by unsupported section titles (#150205 ) Fixes #150134 Build with `make html` looks OK now: ```shell reading sources... [100%] torch.compiler_get_started .. xpu looking for now-outdated files... none found pickling environment... done checking consistency... done preparing documents... done writing output... [ 80%] generated/torch.nn.Softsign .. generated/torch.nn.modules.module.register_module_full_backward_writing output... [ 86%] generated/torch.nn.modules.module.register_module_module_registration_hook .. generated/torch.rwriting output... [100%] generated/torch.xpu.get_rng_state .. xpu generating indices... genindex done highlighting module code... [100%] typing writing additional pages... search done copying images... [100%] _static/img/torch_cuda_memory/allocator_state_history.png copying static files... done copying extra files... done dumping search index in English (code: en)... done dumping object inventory... done build succeeded. The HTML pages are in build/html. ``` New rendering looks like this: ![image](https://github.com/user-attachments/assets/af7e23a5-9dfd-4cb6-9333-a9e8cfe47ea0) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150205 Approved by: https://github.com/albanD	2025-03-31 04:27:44 +00:00
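
The `torch.masked.DimOrDims` entry above (#149870) rests on a typing detail: `tuple[int]` describes a tuple of exactly one `int`, while `tuple[int, ...]` describes a tuple of any length. The following is a minimal sketch of that difference using only the standard library (Python 3.9+). The helper `tuple_matches` is hypothetical, written for illustration only; it is not part of PyTorch and is not how a type checker is implemented.

```python
from typing import get_args


def tuple_matches(value, annotation):
    """Hypothetical helper: runtime approximation of how a type checker
    treats `tuple[...]` annotations. Not part of PyTorch."""
    if not isinstance(value, tuple):
        return False
    args = get_args(annotation)
    # tuple[T, ...] is the variadic form: any length, every element a T.
    if len(args) == 2 and args[1] is Ellipsis:
        return all(isinstance(v, args[0]) for v in value)
    # tuple[T1, T2, ...] without Ellipsis fixes both length and element types.
    return len(value) == len(args) and all(
        isinstance(v, t) for v, t in zip(value, args)
    )


dims = (-1, -2)
fixed_ok = tuple_matches(dims, tuple[int])         # fixed length 1: rejected
variadic_ok = tuple_matches(dims, tuple[int, ...])  # any length: accepted
print(fixed_ok, variadic_ok)
```

Under these assumptions, `dim=(-1, -2)` is rejected by the one-element form and accepted by the variadic form, which matches the type-checker complaint quoted in that commit message.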

86102 Commits