pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Saurabh Mishra	381d0cb239	[DCP] Avoid in-place update and deepcopy during dedupe (#149320) Summary: Avoid in-place update and deepcopy during dedupe. Deepcopy becomes prohibitively expensive for models with a huge number of FQNs. This also manifested in the Ads 2K experiment. Results from the TextRay model in Mitra: control job with the deepcopy regression: first save ~24.8s, global step latency ~7-8s; test job with the new fix to avoid deepcopy: first save ~21s, global step latency ~2s. Test Plan: ``` buck test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/distributed/checkpoint:test_planner ``` https://www.internalfb.com/intern/testinfra/testrun/3940649945104822 Differential Revision: D71245218 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149320 Approved by: https://github.com/MeetVadakkanchery	2025-03-18 16:08:40 +00:00
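Below is a minimal sketch of deduplicating write items without `copy.deepcopy` or in-place mutation. It makes two assumptions. First, it uses a hypothetical, simplified plan type (a list of `(fqn, size_in_bytes)` tuples) instead of DCP's real `SavePlan`. Second, it assigns each duplicated FQN greedily to the least-loaded plan. The inputs are only read. Each output plan is a fresh list that references the original, immutable items.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical, simplified plan: a list of (fqn, size_in_bytes) write items.
# This is NOT torch.distributed.checkpoint's SavePlan/WriteItem API.
Plan = List[Tuple[str, int]]


def dedupe_plans(plans: Sequence[Plan]) -> List[Plan]:
    """Keep each FQN in exactly one plan without mutating or deep-copying inputs."""
    # Record which plans want to write each FQN (read-only pass over inputs).
    owners: Dict[str, List[int]] = defaultdict(list)
    sizes: Dict[str, int] = {}
    for idx, plan in enumerate(plans):
        for fqn, size in plan:
            owners[fqn].append(idx)
            sizes[fqn] = size

    # Assumed policy: give each FQN to the currently least-loaded candidate plan.
    load = [0] * len(plans)
    chosen: Dict[str, int] = {}
    for fqn, candidates in owners.items():
        winner = min(candidates, key=lambda i: load[i])
        chosen[fqn] = winner
        load[winner] += sizes[fqn]

    # Build fresh lists that share the original (immutable) tuples:
    # no copy.deepcopy and no in-place removal from the input plans.
    return [
        [item for item in plan if chosen[item[0]] == idx]
        for idx, plan in enumerate(plans)
    ]
```

The tuples are shared rather than copied, so the work per item is a few dictionary lookups instead of a recursive copy of every plan.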
Nikita Shulga	c41196a4d0	[EZ][Docker] Remove `install_db.sh` (#149360) It is a vestige of the caffe2 days and has been a no-op since https://github.com/pytorch/pytorch/pull/125092 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149360 Approved by: https://github.com/atalman, https://github.com/cyyever, https://github.com/seemethere, https://github.com/Skylion007	2025-03-18 16:07:47 +00:00
Justin Chu	fdacf3c920	[ONNX] Update types in VerificationInfo (#149377) `torch.types.Number` was rendered as-is in the documentation, which can be confusing. We write the original types instead to reduce confusion for users. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149377 Approved by: https://github.com/titaiwangms	2025-03-18 15:37:39 +00:00
PyTorch MergeBot	405025778d	Revert "[AOTI] Update test runner to use the new APIs (#147105)" This reverts commit `9a78513c3c`. Reverted https://github.com/pytorch/pytorch/pull/147105 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/147105#issuecomment-2733656413))	2025-03-18 15:25:40 +00:00
PyTorch MergeBot	5ba437fb45	Revert "[AOTI] Forward fix unit test failures (#149401)" This reverts commit `ec9e11145e`. Reverted https://github.com/pytorch/pytorch/pull/149401 on behalf of https://github.com/desertfire due to reverting the original PR instead ([comment](https://github.com/pytorch/pytorch/pull/149401#issuecomment-2733633516))	2025-03-18 15:18:48 +00:00
Pat Vignola	213eea216a	[MTIA] Add _mtia_maybeExchangeDevice to MTIA module (#149340) Summary: The FlexAttention path uses `_maybe_exchange_device`, so it will eventually be needed for MTIA as well. Test Plan: `buck2 test fbcode//mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api -- test_maybe_exchange_device` Reviewed By: chaos5958 Differential Revision: D70072063 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149340 Approved by: https://github.com/chaos5958	2025-03-18 15:15:12 +00:00
Bin Bao	ec9e11145e	[AOTI] Forward fix unit test failures (#149401) Summary: There is a land conflict between https://github.com/pytorch/pytorch/pull/149161 and https://github.com/pytorch/pytorch/pull/147105. We just need to update the APIs used in two new unit tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149401 Approved by: https://github.com/ZainRizvi	2025-03-18 15:02:01 +00:00
atalman	6e2b2660b9	Make numpy check optional (#149356) We may want to skip numpy smoke tests, hence making the check optional. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149356 Approved by: https://github.com/ZainRizvi	2025-03-18 15:00:01 +00:00
Andrey Talman	bc88f6faa1	Use TorchVersion for triton version check (#149136) Followup after https://github.com/pytorch/pytorch/pull/149092#issuecomment-2721990321 to use TorchVersion for triton version parsing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149136 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-03-18 13:48:46 +00:00
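The commit does not say what went wrong with the previous parsing. The stdlib-only sketch below illustrates why a version-aware comparison (which `TorchVersion` is meant to provide) is safer than comparing raw strings. The helper `parse_version` is hypothetical. It is not `TorchVersion`, and it deliberately ignores pre-release and local-version ordering.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: keep only the leading numeric release components.

    For example, "3.3.0+gitabc123" -> (3, 3, 0); any suffix is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# Plain string comparison is lexicographic and gets multi-digit parts wrong.
print("3.10.0" >= "3.9.0")  # False
print(parse_version("3.10.0") >= parse_version("3.9.0"))  # True
```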
Jithun Nair	b06b5c3e27	[ROCm] Use alternate mirror for drm repo (#149380) Fixes an issue with building ROCm manywheel and libtorch images, e.g. https://github.com/pytorch/pytorch/actions/runs/13887711267/job/38854659005#step:4:8328 ``` #53 2.832 Cloning into 'drm'... #53 2.849 fatal: unable to access 'https://gitlab.freedesktop.org/mesa/drm.git/': The requested URL returned error: 503 #53 2.851 ./install_rocm_drm.sh: line 29: pushd: drm: No such file or directory ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149380 Approved by: https://github.com/jeffdaily	2025-03-18 13:33:25 +00:00
Laith Sakka	6055a4f612	refresh benchmark results. (#149347) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149347 Approved by: https://github.com/jamesjwu	2025-03-18 08:53:49 +00:00
Francisco Massa	9b92828d4b	Add batch dim sharding rule to sdpa (#149253) This is a trivial rule that isn't needed in most cases, but if we want to treat the input data as actually `Shard(0)` (instead of `Replicate()`, as is currently assumed), then we need this rule. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149253 Approved by: https://github.com/XilunWu	2025-03-18 07:54:02 +00:00
Davide Italiano	9cd52da45c	[MPS/inductor] Add support for `modified_bessel_i1`. (#149379) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149379 Approved by: https://github.com/malfet	2025-03-18 06:02:33 +00:00
Fadi Arafeh	6c2db8fab0	Enable qint8 and quint8 add for AArch64 using ACL directly (#148653) This enables qint8 and quint8 add for AArch64 through Arm Compute Library (ACL) directly. The relative performance improvement is ~15x with OMP_NUM_THREADS=1 and ~5.4x with OMP_NUM_THREADS=32. Co-authored-by: David Svantesson <david.svantesson-yeung@arm.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/148653 Approved by: https://github.com/malfet ghstack dependencies: #148585	2025-03-18 05:38:39 +00:00
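For reference, here is a plain-Python sketch of the arithmetic a quantized add has to reproduce:

1. Dequantize both affine-quantized inputs.
2. Add them in floating point.
3. Requantize with the output scale and zero point, saturating to the integer range (`[-128, 127]` for qint8, `[0, 255]` for quint8).

This is an assumed reference model for illustration only. The ACL kernel enabled by the commit is vectorized, and its internal rounding may differ.

```python
def quantize(x: float, scale: float, zero_point: int, qmin: int, qmax: int) -> int:
    """Affine quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    q = round(x / scale) + zero_point  # Python's round() rounds half to even
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale


def quantized_add(qa, sa, za, qb, sb, zb, s_out, z_out, qmin=-128, qmax=127):
    """Reference (scalar) quantized add: dequantize, add in float, requantize."""
    real = dequantize(qa, sa, za) + dequantize(qb, sb, zb)
    return quantize(real, s_out, z_out, qmin, qmax)


print(quantized_add(127, 1.0, 0, 127, 1.0, 0, 1.0, 0))  # 127 (saturated qint8)
```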
Nikita Shulga	2e0c98ff05	[MPS] Add `bicubic2d_aa` (#149378) It is currently the most frequently requested op in https://github.com/pytorch/pytorch/issues/141287. Mostly done by refactoring `upsample_bilinear2d_aa` to accept a Functor as one of the template arguments, which closely follows ideas from `eec43cfbc0/src/libImaging/Resample.c` as well as `bb42e4d137/aten/src/ATen/native/cuda/UpSampleBilinear2d.cu (L472-L478)`. Populate unit tests by copying upsample_bilinear_2d_aa and reusing it as upsample_bicubic2d_aa. At that point, the only differences between upsample_bilinear2d_aa and upsample_bicubic2d_aa are the convolution kernel function and size: for bilinear it's 3x3, for bicubic it's 5x5. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149378 Approved by: https://github.com/dcci	2025-03-18 05:35:41 +00:00
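The key idea of the refactor is a single antialiased resampling routine parameterized by the filter (a Functor template argument in the Metal code). The sketch below models that in Python by passing the filter as a callable. Its coefficient computation follows Pillow's `Resample.c` as we understand it. The cubic coefficient `a = -0.5` is an assumption, and the names `triangle`, `cubic`, and `aa_weights` are hypothetical.

```python
from typing import Callable, List, Tuple

Filter = Callable[[float], float]


def triangle(x: float) -> float:
    """Bilinear (triangle) filter with support 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def cubic(x: float, a: float = -0.5) -> float:
    """Keys cubic filter with support 2 (a = -0.5 is assumed here)."""
    x = abs(x)
    if x < 1.0:
        return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0
    if x < 2.0:
        return (((x - 5.0) * x + 8.0) * x - 4.0) * a
    return 0.0


def aa_weights(
    out_index: int, in_size: int, out_size: int, filt: Filter, support: float
) -> Tuple[int, List[float]]:
    """Return (first input index, normalized weights) for one output pixel."""
    scale = in_size / out_size
    # When downsampling, stretch the filter so it spans `scale` input pixels.
    filter_scale = max(scale, 1.0)
    radius = support * filter_scale
    center = (out_index + 0.5) * scale
    xmin = max(int(center - radius + 0.5), 0)
    xmax = min(int(center + radius + 0.5), in_size)
    weights = [filt((j - center + 0.5) / filter_scale) for j in range(xmin, xmax)]
    total = sum(weights)
    return xmin, [w / total for w in weights]


print(aa_weights(2, 8, 4, triangle, 1.0))  # (3, [0.125, 0.375, 0.375, 0.125])
```

Going from bilinear to bicubic weights only requires swapping `triangle` (support 1) for `cubic` (support 2). This mirrors the commit's observation that the two ops differ only in the kernel function and its size.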
Tristan Rice	dea7157160	nccl: upgrade to 2.26.2 to avoid hang on ncclCommAbort (#149351) Fixes #149153. YAML generated from: ``` python .github/scripts/generate_ci_workflows.py ``` Test plan: Repro in https://gist.github.com/d4l3k/16a19b475952bc40ddd7f2febcc297b7 ``` rm -rf third_party/nccl python setup.py develop ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149351 Approved by: https://github.com/kwen2501, https://github.com/atalman, https://github.com/malfet	2025-03-18 05:23:18 +00:00
Rachel Guo	b8f91bcb14	[pt2_provenance_tracking] add support for cpp kernel (#149185) Summary: As titled. Add inductor cpp kernel to post-grad graph node mapping & UT. Context: Raised as a feature request for the AOTI CPU case. https://fb.workplace.com/groups/1028545332188949/permalink/1169020841474730/ Differential Revision: D71181284 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149185 Approved by: https://github.com/jingsh	2025-03-18 04:43:07 +00:00
Shangdi Yu	7869196482	Fix torchbind schema str generation (#149239) Summary: Fix Torchbind HOP schema generation when there's no input. Test Plan: ``` buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchbind -- -r schema ``` Differential Revision: D71231164 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149239 Approved by: https://github.com/zou3519	2025-03-18 04:29:56 +00:00
Wei-Sheng Chin	bca75fe97a	[MAIA] [Autocast] Enable autocast on MAIA device (#148511) Fixes #148510. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148511 Approved by: https://github.com/albanD	2025-03-18 03:46:22 +00:00
Davide Italiano	c43e35d6f7	[MPS] Implement support for `modified_bessel_i1` in eager. (#149368) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149368 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-03-18 03:29:10 +00:00
Mu-Chu Lee	bb42e4d137	[AOTInductor] Add function to free buffer (#149161) Summary: We add a function that allows users to free unused buffers. Test Plan: Testing correctness: python test/inductor/test_aot_inductor.py -k free_inactive Testing memory consumption: LD_LIBRARY_PATH=/data/users/$USER/pytorch/build/lib /home/$USER/local/pytorch/build/bin/test_aoti_inference Pull Request resolved: https://github.com/pytorch/pytorch/pull/149161 Approved by: https://github.com/chenyang78, https://github.com/desertfire ghstack dependencies: #149249	2025-03-18 02:43:14 +00:00
Jane Xu	cccdf860e2	[BE] Add STABLE_LIBRARY test for multiple returns (#149230) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149230 Approved by: https://github.com/albanD, https://github.com/zou3519 ghstack dependencies: #149052	2025-03-18 02:40:54 +00:00
Jane Xu	988827cdfb	Use schema as source of truth + support ones_like/empty_like (#149052) This change does two important things: (a) Instead of relying on the IValue type as the source of truth, we use the schema as the source of truth. This matters because IValue types are overloaded and can be converted incorrectly when they are ambiguous. For example, a MemoryFormat looks like an int and gets converted to an int64_t instead of a MemoryFormat! (b) This PR extends support to many more types, covering far more schemas, e.g., Optional, Device, dtype, etc. The main win from this PR is the ability for aoti_torch_call_dispatcher to call TensorFactory ops like ones_like/empty_like! Pull Request resolved: https://github.com/pytorch/pytorch/pull/149052 Approved by: https://github.com/albanD	2025-03-18 02:40:54 +00:00
Justin Chu	ebabd0efdd	[ONNX] Expose verification utilities (#148603) Expose verification utilities in the public documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148603 Approved by: https://github.com/titaiwangms	2025-03-18 02:10:34 +00:00
Sun, Jiayi	c36ac16da1	[Inductor] optimize welford reduction (#145061) Fixes https://github.com/pytorch/pytorch/issues/141541. Fixes https://github.com/pytorch/pytorch/issues/142839. Fixes https://github.com/pytorch/pytorch/issues/143182. Summary: To fix the insufficient accuracy of the Welford reduction, we follow the eager implementation and combine the Welford algorithm with cascade summation to improve numerical stability. Specifically: 1. Use the Welford algorithm to compute mean and variance. 2. Use cascade summation when computing the sum over the input for both mean and variance. I ran the Inductor benchmarks with this PR on CPU; no performance gains or regressions were seen. Example: Take https://github.com/pytorch/pytorch/issues/141541 as an example: ``` import torch import torch.nn as nn torch.manual_seed(0) class Model(nn.Module): def __init__(self): super().__init__() self.gn = nn.GroupNorm(num_groups=32, num_channels=32) def forward(self, x): return self.gn(x) model = Model().eval() c_model = torch.compile(model) x = torch.randn(1, 32, 128, 128, 128) with torch.no_grad(): output = model(x) c_output = c_model(x) print(torch.max(torch.abs(output - c_output))) print(torch.allclose(output, c_output, 1.3e-6, 1e-5)) ``` Logs: - Before ``` tensor(7.0095e-05) False ``` - After ``` tensor(9.5367e-07) True ``` - On CUDA ``` tensor(1.4305e-06, device='cuda:0', grad_fn=<MaxBackward1>) True ``` Generated code: - Before ``` cpp_fused_native_group_norm_0 = async_compile.cpp_pybinding(['const float*', 'const float*', 'const float*', 'float*', 'float*', 'float*'], ''' #include "/tmp/torchinductor_jiayisun/pi/cpicxudqmdsjh5cm4klbtbrvy2cxwr7whxl3md2zzdjdf3orvfdf.h" extern "C" void kernel(const float* in_ptr0, const float* in_ptr1, const float* in_ptr2, float* out_ptr0, float* out_ptr1, float* out_ptr2) { { #pragma GCC ivdep for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(32L); x0+=static_cast<int64_t>(1L)) { { Welford<float> tmp_acc0 = Welford<float>(); Welford<at::vec::Vectorized<float>> tmp_acc0_vec = Welford<at::vec::Vectorized<float>>(); Welford<at::vec::Vectorized<float>> masked_tmp_acc0_vec = Welford<at::vec::Vectorized<float>>(); static WeightRecp<at::vec::Vectorized<float>> wrecps0(static_cast<int64_t>(131072L)); for(int64_t x1=static_cast<int64_t>(0L); x1<static_cast<int64_t>(2097152L); x1+=static_cast<int64_t>(16L)) { { if(C10_LIKELY(x1 >= static_cast<int64_t>(0) && x1 < static_cast<int64_t>(2097152L))) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<int64_t>(x1 + 2097152L*x0), static_cast<int64_t>(16)); tmp_acc0_vec = welford_combine(tmp_acc0_vec, tmp0, &wrecps0); } } } tmp_acc0 = welford_combine(tmp_acc0, welford_vec_reduce_all(masked_tmp_acc0_vec)); tmp_acc0 = welford_combine(tmp_acc0, welford_vec_reduce_all(tmp_acc0_vec)); out_ptr0[static_cast<int64_t>(x0)] = static_cast<float>(tmp_acc0.mean); out_ptr1[static_cast<int64_t>(x0)] = static_cast<float>(tmp_acc0.m2); } } } { #pragma GCC ivdep for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(32L); x0+=static_cast<int64_t>(1L)) { for(int64_t x1=static_cast<int64_t>(0L); x1<static_cast<int64_t>(2097152L); x1+=static_cast<int64_t>(16L)) { { if(C10_LIKELY(x1 >= static_cast<int64_t>(0) && x1 < static_cast<int64_t>(2097152L))) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<int64_t>(x1 + 2097152L*x0), static_cast<int64_t>(16)); auto tmp1 = out_ptr0[static_cast<int64_t>(x0)]; auto tmp4 = out_ptr1[static_cast<int64_t>(x0)]; auto tmp12 = in_ptr1[static_cast<int64_t>(x0)]; auto tmp15 = in_ptr2[static_cast<int64_t>(x0)]; auto tmp2 = at::vec::Vectorized<float>(tmp1); auto tmp3 = tmp0 - tmp2; auto tmp5 = static_cast<float>(2097152.0); auto tmp6 = tmp4 / tmp5; auto tmp7 = static_cast<float>(1e-05); auto tmp8 = decltype(tmp6)(tmp6 + tmp7); auto tmp9 = 1 / std::sqrt(tmp8); auto tmp10 = at::vec::Vectorized<float>(tmp9); auto tmp11 = tmp3 * tmp10; auto tmp13 = at::vec::Vectorized<float>(tmp12); auto tmp14 = tmp11 * tmp13; auto tmp16 = at::vec::Vectorized<float>(tmp15); auto tmp17 = tmp14 + tmp16; tmp17.store(out_ptr2 + static_cast<int64_t>(x1 + 2097152L*x0)); } } } } } } ''') ``` - After ``` cpp_fused_native_group_norm_0 = async_compile.cpp_pybinding(['const float*', 'const float*', 'const float*', 'float*', 'float*', 'float*'], ''' #include "/tmp/torchinductor_jiayisun/ln/clnlak27xpvmq3klpqyj6xzyq2thf4ecrezve5ddy4f4xaz4sb7w.h" extern "C" void kernel(const float* in_ptr0, const float* in_ptr1, const float* in_ptr2, float* out_ptr0, float* out_ptr1, float* out_ptr2) { { #pragma GCC ivdep for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(32L); x0+=static_cast<int64_t>(1L)) { { Welford<float> tmp_acc0 = Welford<float>(); Welford<at::vec::Vectorized<float>> tmp_acc0_vec = Welford<at::vec::Vectorized<float>>(); Welford<at::vec::Vectorized<float>> masked_tmp_acc0_vec = Welford<at::vec::Vectorized<float>>(); WelfordHelper<at::vec::Vectorized<float>> welford_helper0(static_cast<int64_t>(131072L)); static WelfordHelper<at::vec::Vectorized<float>> masked_welford_helper0(static_cast<int64_t>(0L)); for(int64_t x1=static_cast<int64_t>(0L); x1<static_cast<int64_t>(2097152L); x1+=static_cast<int64_t>(16L)) { { if(C10_LIKELY(x1 >= static_cast<int64_t>(0) && x1 < static_cast<int64_t>(2097152L))) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<int64_t>(x1 + 2097152L*x0), static_cast<int64_t>(16)); tmp_acc0_vec = welford_combine(tmp_acc0_vec, tmp0, &welford_helper0); } } } tmp_acc0_vec = welford_combine(tmp_acc0_vec, &welford_helper0); masked_tmp_acc0_vec = welford_combine(masked_tmp_acc0_vec, &masked_welford_helper0); tmp_acc0 = welford_combine(tmp_acc0, welford_vec_reduce_all(masked_tmp_acc0_vec)); tmp_acc0 = welford_combine(tmp_acc0, welford_vec_reduce_all(tmp_acc0_vec)); out_ptr0[static_cast<int64_t>(x0)] = static_cast<float>(tmp_acc0.mean); out_ptr1[static_cast<int64_t>(x0)] = static_cast<float>(tmp_acc0.m2); } } } { #pragma GCC ivdep for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(32L); x0+=static_cast<int64_t>(1L)) { for(int64_t x1=static_cast<int64_t>(0L); x1<static_cast<int64_t>(2097152L); x1+=static_cast<int64_t>(16L)) { { if(C10_LIKELY(x1 >= static_cast<int64_t>(0) && x1 < static_cast<int64_t>(2097152L))) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<int64_t>(x1 + 2097152L*x0), static_cast<int64_t>(16)); auto tmp1 = out_ptr0[static_cast<int64_t>(x0)]; auto tmp4 = out_ptr1[static_cast<int64_t>(x0)]; auto tmp12 = in_ptr1[static_cast<int64_t>(x0)]; auto tmp15 = in_ptr2[static_cast<int64_t>(x0)]; auto tmp2 = at::vec::Vectorized<float>(tmp1); auto tmp3 = tmp0 - tmp2; auto tmp5 = static_cast<float>(2097152.0); auto tmp6 = tmp4 / tmp5; auto tmp7 = static_cast<float>(1e-05); auto tmp8 = decltype(tmp6)(tmp6 + tmp7); auto tmp9 = 1 / std::sqrt(tmp8); auto tmp10 = at::vec::Vectorized<float>(tmp9); auto tmp11 = tmp3 * tmp10; auto tmp13 = at::vec::Vectorized<float>(tmp12); auto tmp14 = tmp11 * tmp13; auto tmp16 = at::vec::Vectorized<float>(tmp15); auto tmp17 = tmp14 + tmp16; tmp17.store(out_ptr2 + static_cast<int64_t>(x1 + 2097152L*x0)); } } } } } } ''') ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145061 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jansel	2025-03-18 02:05:35 +00:00
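The plain-Python sketch below shows one way to combine the Summary's two steps. It runs Welford updates within fixed-size chunks, then merges the per-chunk states pairwise (a cascade) using the standard parallel-merge formula of Chan et al. The chunk size and the names `WelfordState` and `cascade_welford` are hypothetical. Inductor's generated `WelfordHelper` code chunks and vectorizes differently.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class WelfordState:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean


def welford_update(s: WelfordState, x: float) -> WelfordState:
    """Classic single-element Welford step."""
    count = s.count + 1
    delta = x - s.mean
    mean = s.mean + delta / count
    return WelfordState(count, mean, s.m2 + delta * (x - mean))


def welford_combine(a: WelfordState, b: WelfordState) -> WelfordState:
    """Merge two partial states (Chan et al. parallel formula)."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return WelfordState(count, mean, m2)


def cascade_welford(xs: Sequence[float], chunk: int = 4) -> WelfordState:
    """Welford inside each chunk, then combine chunk states as a binary tree."""
    states: List[WelfordState] = []
    for start in range(0, len(xs), chunk):
        s = WelfordState()
        for x in xs[start:start + chunk]:
            s = welford_update(s, x)
        states.append(s)
    if not states:
        return WelfordState()
    # Pairwise merging keeps the chain of accumulated merges logarithmic in
    # the number of chunks instead of linear.
    while len(states) > 1:
        merged = [welford_combine(states[i], states[i + 1])
                  for i in range(0, len(states) - 1, 2)]
        if len(states) % 2:
            merged.append(states[-1])
        states = merged
    return states[0]


s = cascade_welford([float(i) for i in range(1, 11)])
print(s.count, s.mean, s.m2 / s.count)  # 10 5.5 8.25 (population variance)
```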
cyy	1096443467	Use torch_compile_options for c10 libraries (#147821) c10, c10_cuda, c10_hip and c10_xpu are given the additional compile options from torch_compile_options, which are more restrictive and can help reveal potential bugs in the code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147821 Approved by: https://github.com/guangyey, https://github.com/malfet	2025-03-18 01:54:23 +00:00
Su, Tong	60523540f1	Force build to conform to the C++ standard on Windows by adding the /permissive- flag (#149035) Fixes #147366. 1. Add `/permissive-` to the `torch_compile_options` so that the build conforms to the C++ standard. 2. Fix the error when trying to assign a string literal to a non-const pointer. The `/permissive-` flag is documented at https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170. From the above [doc](https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170#remarks): > By default, the /permissive- option is set in new projects created by Visual Studio 2017 version 15.5 and later versions. > The /permissive- option is implicitly set by the /std:c++latest option starting in Visual Studio 2019 version 16.8, and in version 16.11 by the /std:c++20 option. Thus, it is reasonable to add this flag to the existing project. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149035 Approved by: https://github.com/guangyey, https://github.com/malfet	2025-03-18 01:51:46 +00:00
Xia, Weiwen	c1dd75e4dc	Add AOTI shim for _weight_int4pack_mm_cpu_tensor (#149031) Summary: The previous implementation of the shim did not align with the design, and it was removed by https://github.com/pytorch/pytorch/pull/148907. This PR adds it back in the MKLDNN backend files and re-enables the CPP wrapper UT. Test plan: ``` pytest -s test/inductor/test_cpu_cpp_wrapper.py -k test_woq_int4 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149031 Approved by: https://github.com/leslie-fang-intel, https://github.com/EikanWang, https://github.com/desertfire	2025-03-18 01:33:13 +00:00
cyy	425c6d8eba	Replace c10::is_pod with std::is_trivial (#149286) These remaining c10::is_pod calls can be replaced without compromising the semantics. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149286 Approved by: https://github.com/zou3519	2025-03-18 01:33:01 +00:00
Animesh Jain	f9a787224c	[dynamo][guards][serialization] Don't use ID_MATCH guard for bool and None (#149228) Doing this removes the need to collect the `id` and therefore facilitates serialization. It also makes recompilation messages more readable: previously, the recompile message would just show the `id`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149228 Approved by: https://github.com/jansel	2025-03-18 01:25:37 +00:00
Davide Italiano	186cc7327c	[MPS/BE] Remove decorator that skipped test on macOS 12. (#149365) macOS 12 is no longer really supported. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149365 Approved by: https://github.com/malfet	2025-03-18 00:58:08 +00:00
Aaron Gokaslan	a0ac63cbd9	[BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149257 Approved by: https://github.com/jansel	2025-03-18 00:46:07 +00:00
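Ruff's PERF403 (`manual-dict-comprehension`) targets dictionaries that are built by assigning keys inside a `for` loop, and it suggests a dict comprehension instead. Below is a minimal before/after pair of the kind of loop the rule is designed to flag. The function names are hypothetical.

```python
def even_squares_loop(keys):
    # Pattern PERF403 targets: building a dict by item assignment in a for-loop.
    result = {}
    for k in keys:
        if k % 2 == 0:
            result[k] = k * k
    return result


def even_squares_comprehension(keys):
    # Equivalent dict comprehension: same keys, values, and insertion order.
    return {k: k * k for k in keys if k % 2 == 0}


print(even_squares_comprehension(range(6)))  # {0: 0, 2: 4, 4: 16}
```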
Davide Italiano	811f587d86	[MPS/BE] @parametrize generation of pointwise_ops. (#149363) Makes this less error-prone and reduces duplication. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149363 Approved by: https://github.com/malfet	2025-03-18 00:37:43 +00:00
Bin Bao	9a78513c3c	[AOTI] Update test runner to use the new APIs (#147105) Summary: Switch to the newer aoti_compile_and_package APIs. Some tests still use the legacy APIs; these will be migrated in a follow-up internal test refactoring. Differential Revision: [D69609685](https://our.internmc.facebook.com/intern/diff/D69609685) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147105 Approved by: https://github.com/jingsh	2025-03-18 00:27:09 +00:00
PyTorch MergeBot	b52a8bef01	Revert "[dynamo][guards][serialization] Don't use ID_MATCH guard for bool and None (#149228)" This reverts commit `5905bbe745`. Reverted https://github.com/pytorch/pytorch/pull/149228 on behalf of https://github.com/malfet due to I wonder if this will fix the pr-time-benchmark regressions ([comment](https://github.com/pytorch/pytorch/pull/149228#issuecomment-2731237949))	2025-03-18 00:10:50 +00:00
Nikita Shulga	46226a90c8	[EZ][BE] Remove cross-compilation options from mac-build.yml (#149237) It has long been gone. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149237 Approved by: https://github.com/seemethere, https://github.com/atalman	2025-03-17 23:50:31 +00:00
Eli Uriegas	523bffd388	cd: Add no-cache for test binaries (#149218) This is to avoid issues like https://github.com/pytorch/vision/actions/runs/13861462856/job/38795684317#step:13:212 ``` ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them. unknown package: Expected sha256 8e34a6f02ac5a63763251953063a19ba9df855ac2c8a13ef409dfef708e2ba26 Got 341156cc5067488565c1e103be6e95105b0fc0d87d8ac24ff8891f63fd33216f ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149218 Approved by: https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/malfet	2025-03-17 23:26:20 +00:00
Mayank Mishra	37c914ca0c	fix simple-spec crash (#147723) Found an issue while running `python torchgen/fuse/gen_patterns.py`. Exact error: ```shell Traceback (most recent call last): File "/Users/mayankmishra/Desktop/non-IBM/pytorch/torchgen/fuse/gen_patterns.py", line 19, in <module> joint_graph.lazy_init() File "/Users/mayankmishra/miniconda3/envs/ai/lib/python3.10/site-packages/torch/_inductor/pattern_matcher.py", line 2096, in lazy_init result = fn() File "/Users/mayankmishra/miniconda3/envs/ai/lib/python3.10/site-packages/torch/_inductor/fx_passes/joint_graph.py", line 53, in lazy_init _pad_mm_init() File "/Users/mayankmishra/miniconda3/envs/ai/lib/python3.10/site-packages/torch/_inductor/fx_passes/pad_mm.py", line 905, in _pad_mm_init gen_register_replacement( File "/Users/mayankmishra/miniconda3/envs/ai/lib/python3.10/site-packages/torch/_inductor/pattern_matcher.py", line 1584, in gen_register_replacement pat = _serialize_pattern( File "/Users/mayankmishra/miniconda3/envs/ai/lib/python3.10/site-packages/torch/_inductor/pattern_matcher.py", line 1539, in _serialize_pattern file_template = get_file_template() File "/Users/mayankmishra/miniconda3/envs/ai/lib/python3.10/site-packages/torch/_inductor/pattern_matcher.py", line 1513, in get_file_template if isinstance(attr, type) and issubclass(attr, (PatternExpr, _TargetExpr)): File "/Users/mayankmishra/miniconda3/envs/ai/lib/python3.10/abc.py", line 123, in __subclasscheck__ return _abc_subclasscheck(cls, subclass) TypeError: issubclass() arg 1 must be a class ``` This PR fixes this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147723 Approved by: https://github.com/aorenste Co-authored-by: Aaron Orenstein <aorenste@meta.com>	2025-03-17 23:25:48 +00:00
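The traceback ends in `abc`'s `__subclasscheck__`, which rejects an object that had already passed the `isinstance(attr, type)` guard. The commit does not show its fix, so the sketch below is only one hedged way to harden such a check. A plausible trigger, which the commit does not confirm, is that on Python 3.10 and earlier parameterized generics such as `list[int]` report `isinstance(x, type) == True` yet are not real classes. `safe_issubclass`, `Base`, and `Child` are hypothetical names.

```python
import abc
import inspect
from typing import Any, Tuple


def safe_issubclass(obj: Any, bases: Tuple[type, ...]) -> bool:
    """Like issubclass(), but returns False for objects that are not real classes."""
    if not inspect.isclass(obj):
        return False
    try:
        return issubclass(obj, bases)
    except TypeError:
        # e.g. ABCMeta.__subclasscheck__ rejecting a class-like generic alias.
        return False


class Base(abc.ABC):  # hypothetical stand-in for PatternExpr
    pass


class Child(Base):
    pass


print(safe_issubclass(Child, (Base,)), safe_issubclass(list[int], (Base,)))  # True False
```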
Tony-Y	78715a181f	Convert Tensor lr to 0-dim as needed for the optimizer to normally work (#145674) Fixes #145461 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145674 Approved by: https://github.com/janeyx99 Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>	2025-03-17 23:07:05 +00:00
Mu-Chu Lee	1157367c78	[AOTInductor] [BE] Add macro for loading symbols in aoti runner (#149249) Summary: Add a macro for loading symbols in the AOTI runner. Test Plan: Existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/149249 Approved by: https://github.com/chenyang78	2025-03-17 23:02:01 +00:00
PyTorch MergeBot	24cfeec2c7	Revert "[BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257)" This reverts commit `bfee141666`. Reverted https://github.com/pytorch/pytorch/pull/149257 on behalf of https://github.com/malfet due to Let's see if it helps restore compiler benchmark sanity, see `8bc7bd94a5/1` ([comment](https://github.com/pytorch/pytorch/pull/149257#issuecomment-2731133812))	2025-03-17 22:57:00 +00:00
PyTorch MergeBot	afa1eda901	Revert "[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590)" This reverts commit `ef6296e7f2`. Reverted https://github.com/pytorch/pytorch/pull/148590 on behalf of https://github.com/izaitsevfb due to reverted internally, see D71292427 ([comment](https://github.com/pytorch/pytorch/pull/148590#issuecomment-2731114626))	2025-03-17 22:43:15 +00:00
Yanan Cao (PyTorch)	a16ada41b9	Fix outdated docstring of torch.export.export regarding strict flag (#149077) Summary: Fix the outdated docstring of torch.export.export regarding the strict flag. Test Plan: None, doc-only change Differential Revision: D71068215 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149077 Approved by: https://github.com/zhxchen17	2025-03-17 22:29:20 +00:00
Sheng Qin	d25617255c	Fix AOTI update_constant_buffer issue. (#149243) Summary: In D69553929 we changed the logic of constant & buffer update in AOTI. However, this is incompatible with the current Sigmoid runtime, which passes in buffers differently, resulting in errors like ``` I0310 17:29:24.456960 3679102 AOTIDelegateExecutor.cpp:89] AOTIDelegateExecutor processing weights * Aborted at 1741652964 (Unix time, try 'date -d 1741652964') * * Signal 11 (SIGSEGV) (0x30) received by PID 3679102 (pthread TID 0x7f9933e49000) (linux TID 3679102) (code: address not mapped to object), stack trace: * @ 00000000000040b9 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*) ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:453 @ 0000000000006c45 folly::fibers::(anonymous namespace)::sigsegvSignalHandler(int, siginfo_t*, void*) ./fbcode/folly/fibers/GuardPageAllocator.cpp:237 @ 000000000004455f (unknown) /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:8 -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c @ 00000000001e8164 torch::aot_inductor::AOTInductorModelContainer::update_constant_buffer(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, AtenTensorOpaque*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, AtenTensorOpaque*> > > const&, bool, bool) ``` Test Plan: 1) Generate lowered merge net ``` CUDA_VISIBLE_DEVICES=0 ../buck-out/v2/gen/fbcode/b5b13003c82cbdec/caffe2/torch/fb/model_transform/fx2trt/packaging/__generate_merge_net_file__/generate_merge_net_file.par --action=generate --input-file=/home/shengqin/models/aoti_sigmoid_test/cmf_interformer_with_custom_triton_kernels_691990503_0_input --output-file=/home/shengqin/models/aoti_sigmoid_test/cmf_interformer_with_custom_triton_kernels_691990503_0_output.aoti_sigmoid --lower-backend=aot_inductor --use_sigmoid=true --aot_inductor_config="{'max_autotune': True, 'comprehensive_padding': False}" --add_passes=use_matmul_lce_replace_normal_LCE,use_triton_dot_compress,use_matmul_fuse_lce_replace_first_LCE,use_contiguous_linear_reduction_replace_linear_reduction --disable_acc_tracer=false ``` 2) Load net predictor ``` CUDA_VISIBLE_DEVICES=1 ../buck-out/v2/gen/fbcode/103717df3cc2b97a/caffe2/torch/fb/model_transform/fx2trt/packaging/__load_net_predictor__/load_net_predictor --loadMode=AccuracyAB --inputNetFile=/home/shengqin/models/aoti_sigmoid_test/cmf_interformer_with_custom_triton_kernels_691990503_0_output.aoti_ts --otherNetFile=/home/shengqin/models/aoti_sigmoid_test/cmf_interformer_with_custom_triton_kernels_691990503_0_output.aoti_sigmoid --moduleName=merge --benchmarkEnableProfiling=false --predictor_hardware_type=1 --disableStaticRuntime=true ``` Reviewed By: hl475 Differential Revision: D71236710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149243 Approved by: https://github.com/hl475, https://github.com/jingsh	2025-03-17 22:10:57 +00:00
Isuru Fernando	a3c6e3139a	allow extra args for parameterization of tests in inductor (#149154) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149154 Approved by: https://github.com/amjames, https://github.com/eellison	2025-03-17 22:05:06 +00:00
Davide Italiano	e4f6e4ac84	[MPS] Add inductor support for `modified_bessel_i0`. (#149342) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149342 Approved by: https://github.com/malfet	2025-03-17 21:45:51 +00:00
Carlo Bertolli	8bc7bd94a5	[ROCm] Input vectorization in elementwise kernels for tensors with heterogeneous types (#147527) This patch demonstrates its use for input tensors of types (float, bfloat16) when the functor type is float(float, float). Pull Request resolved: https://github.com/pytorch/pytorch/pull/147527 Approved by: https://github.com/jeffdaily Co-authored-by: Hashem Hashemi <hashem.hashemi@amd.com>	2025-03-17 20:51:36 +00:00
Benjamin Glass	e8dd58b8cf	cpp_wrapper: Precompile device-specific header files (#146928) This saves us about a second per compilation, which is _massive_ for the OpInfo tests. Total OpInfo test runtime is down about 2x from this change alone. Relands #144002, with changes needed by fbcode internals. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146928 Approved by: https://github.com/desertfire	2025-03-17 20:40:15 +00:00
Sampsa	5e9f792479	[ROCm] Unskip flex attention UTs after triton 3.3 bump (#148327) Enable `test_flex_attention.py::TestLearnableBiases` unit tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148327 Approved by: https://github.com/jeffdaily	2025-03-17 20:15:14 +00:00
Shunting Zhang	6c7d8419e3	fix two accuracy regressions (#149172) There are 2 accuracy regressions in the 3/12 nightly perf run. I cannot reproduce them locally, so there is no effective way to bisect. Raise the tolerance to make them pass the accuracy check. - error log for HF MegatronBertForQuestionAnswering https://gist.github.com/shunting314/25322b66e15e98feed32e0d9a1e43316 - error log for TIMM gluon_inception_v3 https://gist.github.com/shunting314/df64ce22327df27a7057bbbd19ef5164 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149172 Approved by: https://github.com/jansel, https://github.com/eellison	2025-03-17 19:34:00 +00:00

85569 Commits