pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
fduwjj	5b4c864672	[c10d] Enable CudaEventCache by default and add multi device support (#140975) We added `CudaEventCache` in https://github.com/pytorch/pytorch/pull/133727. It reuses CudaEvents so that we don't destroy them, since destroying CudaEvents caused hangs in the past. We have already run a number of tests, including on TorchTitan and internal workloads, and so far no errors or crashes have been found, so we decided to roll it out to all OSS users. Internal workloads are not affected by this PR because of internal gating. We also observed some multi-device use cases in OSS, so we are bringing back the multi-device support originally proposed in https://github.com/pytorch/pytorch/pull/122732/files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140975 Approved by: https://github.com/eqy, https://github.com/kwen2501	2024-11-26 18:42:45 +00:00
cyy	263d8f7a94	[8/N] Don't skip ASAN on some tests (#140081) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140081 Approved by: https://github.com/ezyang	2024-11-09 01:00:13 +00:00
Bin Bao	1868fc63d8	[AOTI] Update C++ runner API to take a const vector (#139955) Summary: Tighten the AOTIModelContainerRunner::run interface to take a const vector of at::Tensor, which 1) makes it clear that the runner will not modify the input tensor vector, and 2) allows the runner to take a temporary vector of tensors as input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139955 Approved by: https://github.com/chenyang78	2024-11-08 16:59:10 +00:00
Ke Wen	e474f0de82	[PGNCCL] Slimming watchdog loop (#139834) - Refactored traceback code into `work.printTraceback()`. cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @shuqiangzhang - Refactored desync debug code into `class DesyncDebugger`. - Moved occurrences of `futureWorkResult_->markCompleted` into `checkAndSetException` and `checkTimeout`, respectively. cc @shuqiangzhang - Modularized dump signal broadcast code into `ProcessGroupNCCL::broadcastDumpSignal`. cc @fduwjj @c-p-i-o Pull Request resolved: https://github.com/pytorch/pytorch/pull/139834 Approved by: https://github.com/shuqiangzhang	2024-11-07 17:22:44 +00:00
PyTorch MergeBot	7e02386303	Revert "[2/N] Replace c10::sv with std::sv (#139456)" This reverts commit `028c5d3426`. Reverted https://github.com/pytorch/pytorch/pull/139456 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks internally. @ezyang can you please help get this landed? See D65546398 for more details ([comment](https://github.com/pytorch/pytorch/pull/139456#issuecomment-2462768891))	2024-11-07 17:00:59 +00:00
cyy	028c5d3426	[2/N] Replace c10::sv with std::sv (#139456) Follows #139453 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139456 Approved by: https://github.com/ezyang	2024-11-06 01:50:38 +00:00
cyyever	46d0b635b9	[CMake] Remove pthread linking (#134436) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134436 Approved by: https://github.com/r-barnes	2024-10-29 23:14:40 +00:00
Richard Barnes	068f7e7a78	torch::optional -> std::optional (#138987) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138987 Approved by: https://github.com/Skylion007	2024-10-28 19:09:46 +00:00
Richard Barnes	42994234a6	std::value/std::type -> std::_v/std::_t (#138746) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138746 Approved by: https://github.com/cyyever, https://github.com/malfet	2024-10-26 20:59:24 +00:00
Mwiza Kunda	22d2e2d9a0	Set RUNPATH so installed tests can find the required shared libraries (#136627) This change fixes the RUNPATH of installed C++ tests so that the linker can find the shared libraries they depend on. For example, currently: ```bash venv/lib/python3.10/site-packages/torch $ ./bin/test_lazy ./bin/test_lazy: error while loading shared libraries: libtorch.so: cannot open shared object file: No such file or directory ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136627 Approved by: https://github.com/malfet	2024-10-25 09:38:08 +00:00
Angela Yi	51f6b946ae	[torchbind] Add generic __deepcopy__ method (#137613) Summary: Added a generic `__deepcopy__` method which will use the torchbind object's existing `__getattr__` and `__setattr__` to copy the torchbind object. This will later be used in [D64124825](https://www.internalfb.com/diff/D64124825). Differential Revision: D64124826 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137613 Approved by: https://github.com/ydwu4, https://github.com/zou3519	2024-10-24 22:14:55 +00:00
FFFrog	af0bc75460	Remove deprecated alias macro (1/3) (#137556) Detailed Descriptions: - Remove AT_ERROR Macro Pull Request resolved: https://github.com/pytorch/pytorch/pull/137556 Approved by: https://github.com/ezyang	2024-10-21 17:32:32 +00:00
Richard Barnes	fddabc6e0b	C10_UNUSED to [[maybe_unused]] (#6357) (#138364) Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364 Approved by: https://github.com/Skylion007, https://github.com/eqy	2024-10-19 13:17:43 +00:00
Edward Yang	b14269dcfb	Make Context to be Device-agnostic Step by Step (1/N) (#136519) (#138155) Summary: - make init to be device-agnostic and move it to AcceleratorHooksInterface - refactoring context related to device initialization Original pull request: https://github.com/pytorch/pytorch/pull/136519 Test Plan: contbuild & OSS CI, see `4a8e49389c` Reviewed By: malfet Differential Revision: D64471142 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138155 Approved by: https://github.com/malfet, https://github.com/bobrenjc93	2024-10-17 20:58:56 +00:00
Shivam Raikundalia	dfb5ac05cc	[Record Function] Add Kwargs only USER_SCOPE Macro (#138020) Summary: Add a macro so that users can easily add a USER annotation with kwargs only. Test Plan: Will use D63801503 to test this E2E. Also added a unit test that makes sure the kwargs get recorded correctly. Differential Revision: D64420328 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138020 Approved by: https://github.com/davidberard98, https://github.com/aaronenyeshi	2024-10-17 18:48:49 +00:00
fduwjj	7e704c2073	[c10d] Add unit test for CUDAEventCache to ensure caching is working (#138059) We created a simple test to validate that the cache is working and is actually being used. When I reverted the fix in https://github.com/pytorch/pytorch/pull/138040, the test failed as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138059 Approved by: https://github.com/kwen2501 ghstack dependencies: #138040, #138048	2024-10-16 17:34:57 +00:00
Shuqiang Zhang	f4158558aa	[c10d] disable watchdog thread in blockingWait mode (#138001) Summary: Blocking wait mode is not widely used but is probably useful for debugging. In blockingWait mode, we don't need the watchdog thread to check for timeouts or NCCL errors, because the main thread throws an exception when an error happens. It is then obvious to the user which work failed, and it is the user's responsibility to handle the exception. Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/138001 Approved by: https://github.com/fduwjj, https://github.com/c-p-i-o ghstack dependencies: #137799	2024-10-16 07:42:22 +00:00
PyTorch MergeBot	d4d687ffb2	Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)" This reverts commit `4a8e49389c`. Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302))	2024-10-15 17:19:16 +00:00
Richard Barnes	b7f798caa4	Use C10_UNUSED instead of (void)X (#137239) Summary: Auto-generated with ``` buck run //scripts/rbarnes/regex_multiline_replacer:regex_multiline_replacer -- --find '^(\sfor\s$)(const.\n)\s\(void$[A-Za-z]+;\s//\sSuppress.\s\n(.)' --replace '\1C10_UNUSED \2\3' `find caffe2/ -regex ".\.$cpp\\|h$"` ``` Differential Revision: D33432600 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137239 Approved by: https://github.com/Skylion007	2024-10-15 14:32:59 +00:00
Catherine Lee	8ac06467d4	Forward fix test (#137910) Summary: Add back a deleted file to fix a test. It was removed in https://github.com/pytorch/pytorch/pull/137404. Test Plan: `buck2 build --flagfile fbcode//mode/opt fbcode//caffe2/test/cpp/c10d:ProcessGroupGlooAsyncTest` succeeded Differential Revision: D64341074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137910 Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/kit1980	2024-10-14 22:07:29 +00:00
FFFrog	4a8e49389c	Make Context to be Device-agnostic Step by Step (1/N) (#136519) - make init to be device-agnostic and move it to AcceleratorHooksInterface - refactoring context related to device initialization Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519 Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey	2024-10-13 12:38:02 +00:00
Nichols A. Romero	bd63ec4f45	[ROCm] LoadHIP CMake cleanup (#137112) Should help mitigate issues reported here: https://github.com/pytorch/pytorch/issues/128313. While working on https://github.com/pytorch/pytorch/pull/136700, we realized that some of the ROCm CMake can be streamlined. This PR does not fix any bugs or provide any new functionality. Strictly clean-up. The remaining `${ROCM_ROCTX_LIB}` will be removed when we transition to the rocprofiler-sdk (to be done in a separate PR). Pull Request resolved: https://github.com/pytorch/pytorch/pull/137112 Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily	2024-10-13 00:06:41 +00:00
PyTorch MergeBot	079f909263	Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)" This reverts commit `be0b75256a`. Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/jovianjaison due to this PR causing errors internally ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2405781093))	2024-10-10 18:32:17 +00:00
cyy	94e12f97dc	[Distributed] [16/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#137404) Follows #137072 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137404 Approved by: https://github.com/Skylion007	2024-10-10 18:05:34 +00:00
sanshang	249152475d	fix sequence number for group (#134578) Summary: Fix the sequence number in the execution trace dump so that collective/p2p ops can be matched with their waits during execution trace replay. `ProcessGroupNCCL` has two sequence number counters, `seqCollective_` and `seqP2P_`. `b18ba9419e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp (L1188-L1191)` However, `WorkNCCL` only has one sequence number member, `seq_`. `b18ba9419e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp (L387)` We need to match collectives and p2p ops with their waits separately. `29b5a462dc` Depends on: https://github.com/pytorch/pytorch/pull/135132 Test Plan: buck2 run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_execution_trace_integration_test Pull Request resolved: https://github.com/pytorch/pytorch/pull/134578 Approved by: https://github.com/kwen2501, https://github.com/c-p-i-o	2024-10-10 04:24:06 +00:00
Shuqiang Zhang	47a515d260	[c10d] simplify barrier implementation and further decouple CPU/GPU synchronization (#137516) Summary: Barrier is essentially intended to block the CPU thread (rather than GPU streams). Previously we used two stream synchronizations (1. the current stream blocked by the NCCL stream end event, 2. the CPU thread blocked on the current stream). This is unnecessary because wait() already contains the CPU thread blocking logic. Also, adding a barrier-specific code block to the general GPU synchronize() API is intrusive and confusing. This PR cleans this up. Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/137516 Approved by: https://github.com/fduwjj, https://github.com/kwen2501	2024-10-09 23:55:28 +00:00
FFFrog	be0b75256a	Make Context to be Device-agnostic Step by Step (1/N) (#136519) - make init to be device-agnostic and move it to AcceleratorHooksInterface - refactoring context related to device initialization Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519 Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey	2024-10-09 02:13:36 +00:00
cyy	6327a71880	[Environment Variable][2/N] Use thread-safe setenv wrapper (#124485) This follows #119449 to make setenv thread-safe. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124485 Approved by: https://github.com/eqy	2024-10-04 07:30:51 +00:00
abhishek-fujitsu	63d6908da0	fix build error with gcc 12+ (#137092) Fixes #127920 This commit addresses a build failure that occurs with GCC 12 and above because of the -Werror=nonnull flag. The error manifests in the test_api target. Issue: When building with GCC 12+, the following error occurs: ``` error: argument 1 null where non-null expected [-Werror=nonnull] 431 | __builtin_memmove(__result, __first, sizeof(_Tp) * _Num); | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` This change ensures that: 1. The flag is only added for GCC 12 or higher 2. The flag is only added if it's supported by the compiler 3. The flag is added specifically to the test_api target, not globally By disabling this specific error, we allow the build to proceed while maintaining other compiler warnings. Test Plan: - Verified successful build with GCC 12 and above - Ensured no regression in builds with earlier GCC versions and other compilers Pull Request resolved: https://github.com/pytorch/pytorch/pull/137092 Approved by: https://github.com/malfet	2024-10-02 00:37:15 +00:00
fduwjj	911a43f930	[TCPStore] Remove deprecated constructor (#136004) While looking at the TCPStore code again, I found it confusing that we still keep the deprecated C++ constructor for TCPStore even though we no longer expose it in Python via pybind. I checked both internal and external code: all C++ use cases (aside from the unit test fixed in this PR) have already moved to using options. So let's remove this legacy constructor to avoid confusion. Differential Revision: [D62653634](https://our.internmc.facebook.com/intern/diff/D62653634) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136004 Approved by: https://github.com/Skylion007, https://github.com/XilunWu	2024-09-14 04:25:47 +00:00
PyTorch MergeBot	564d00f364	Revert "Fix clang-tidy warnings in Caffe2 code (#134935)" This reverts commit `7cfd23636c`. Reverted https://github.com/pytorch/pytorch/pull/134935 on behalf of https://github.com/izaitsevfb due to breaks internal builds, caffe2 is still used internally ([comment](https://github.com/pytorch/pytorch/pull/134935#issuecomment-2349368152))	2024-09-13 16:42:37 +00:00
cyy	7cfd23636c	Fix clang-tidy warnings in Caffe2 code (#134935) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134935 Approved by: https://github.com/ezyang	2024-09-12 03:27:09 +00:00
PyTorch MergeBot	c044deb9ce	Revert "c10d/logging: add C10D_LOCK_GUARD (#134131)" This reverts commit `f33bcbe5fd`. Reverted https://github.com/pytorch/pytorch/pull/134131 on behalf of https://github.com/kit1980 due to See D61985186 ([comment](https://github.com/pytorch/pytorch/pull/134131#issuecomment-2327556381))	2024-09-03 22:35:14 +00:00
Bin Bao	310eb6d8c6	[AOTI] Fix test_aoti_inference CPU build issue (#134675) Summary: Fixes https://github.com/pytorch/pytorch/issues/130311. We need to guard CUDA-only code in test_aoti_inference with macros so that it won't fail on CPU-only platforms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134675 Approved by: https://github.com/atalman, https://github.com/chunyuan-w	2024-08-28 17:42:19 +00:00
Tristan Rice	f33bcbe5fd	c10d/logging: add C10D_LOCK_GUARD (#134131) This adds logs if we can't acquire a lock in NCCLUtils or ProcessGroupNCCL within 30s. This is motivated by some deadlocks we're seeing, where it's unclear whether the problem is in NCCL or on the PyTorch side. This required replacing most `std::mutex` with `std::timed_mutex` and `std::condition_variable_any` as appropriate. Test plan: existing CI for regressions; will add unit tests for `C10D_LOCK_GUARD` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134131 Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj	2024-08-28 01:40:42 +00:00
PyTorch MergeBot	1c4780e69a	Revert "c10d/logging: add C10D_LOCK_GUARD (#134131)" This reverts commit `4c28a0eb0b`. Reverted https://github.com/pytorch/pytorch/pull/134131 on behalf of https://github.com/ZainRizvi due to Sorry but this causes formatting errors internally which make it fail to build. See D61759282 ([comment](https://github.com/pytorch/pytorch/pull/134131#issuecomment-2310455878))	2024-08-26 15:19:27 +00:00
Sheng Fu	519342962d	Pass process group info into NcclWork (#134269) Summary: Pass process group info into NcclWork Test Plan: buck2 run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_execution_trace_integration_test Differential Revision: D61677160 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134269 Approved by: https://github.com/wconstab	2024-08-24 01:04:43 +00:00
Tristan Rice	4c28a0eb0b	c10d/logging: add C10D_LOCK_GUARD (#134131) This adds logs if we can't acquire a lock in NCCLUtils or ProcessGroupNCCL within 30s. This is motivated by some deadlocks we're seeing, where it's unclear whether the problem is in NCCL or on the PyTorch side. This required replacing most `std::mutex` with `std::timed_mutex` and `std::condition_variable_any` as appropriate. Test plan: existing CI for regressions; will add unit tests for `C10D_LOCK_GUARD` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134131 Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj	2024-08-24 00:27:39 +00:00
Xuehai Pan	758a0a88a2	[BE][Easy] enable `ruff` rule `PIE790`: unnecessary `pass` statement (#133200) This PR removes unnecessary `pass` statements. This is semantically safe because the bytecode for the Python code does not change. Note that if a function has a docstring, an otherwise empty function does not need a `pass` statement as a placeholder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200 Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980	2024-08-15 15:50:19 +00:00
cyy	c2eeda5da0	[structural binding][12/N] Replace std::tie with structural binding (#131031) Follows #130830 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131031 Approved by: https://github.com/ezyang	2024-08-14 00:51:34 +00:00
Chirag Pandya	40767e8468	[BE] rename testHelperPrefix test (#132916) Summary: Re-enable testHelperPrefix test that was erroneously disabled in CI. Fixes #50701 Test Plan: Test passes locally: ``` ❯ ./TCPStoreTest --gtest_filter=TCPStoreTest.testHelperPrefix Running main() from /data/users/cpio/pytorch/third_party/googletest/googletest/src/gtest_main.cc Note: Google Test filter = TCPStoreTest.testHelperPrefix [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from TCPStoreTest [ RUN ] TCPStoreTest.testHelperPrefix [W807 12:01:31.531576727 socket.cpp:462] [c10d] waitForInput: poll for socket SocketImpl(fd=6, addr=[localhost]:37984, remote=[localhost]:37171) returned 0, likely a timeout [W807 12:01:31.531663710 socket.cpp:487] [c10d] waitForInput: socket SocketImpl(fd=6, addr=[localhost]:37984, remote=[localhost]:37171) timed out after 100ms [ OK ] TCPStoreTest.testHelperPrefix (314 ms) [----------] 1 test from TCPStoreTest (314 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (314 ms total) [ PASSED ] 1 test. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/132916 Approved by: https://github.com/Skylion007	2024-08-08 20:54:52 +00:00
Prachi Gupta	c326533999	[ROCm][Inductor] Enable AOT Inductor CPP UTs for ROCm (#131521) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131521 Approved by: https://github.com/jataylo, https://github.com/pruthvistony, https://github.com/malfet	2024-08-08 19:49:56 +00:00
Shangdi Yu	21906ddaba	[AOTI] Fix complex64 not defined (#132810) Partially fixes #122980 - change the C++ type mapping for complex64 to std::complex<float> - add `aoti_torch_item_complex64` and `aoti_torch_scalar_to_tensor_complex64`. - add `expensiveCopyToTensor()` to convert the `ArrayRefTensor<T>` type to the `AtenTensorHandle` type. - to fully fix #122980, we still need ArrayRef and MiniArrayRef to account for the number of elements in the underlying storage. See more details in https://github.com/pytorch/pytorch/pull/132347 (#132347 broke some internal tests, so we need more work before landing it). Pull Request resolved: https://github.com/pytorch/pytorch/pull/132810 Approved by: https://github.com/desertfire	2024-08-08 18:08:23 +00:00
Matthew Hoffman	258f47fc0b	Add `padding_side` to `pad_sequence` with `"left"` and `"right"` options (`"right"` as default) (#131884) Fixes #10536 Reattempt of #61467. Thank you so much to @mskoh52 for your excellent work! As I was trying to create a more efficient LLM data collator, I realized that `pad_sequence` only supports right padding, even though left padding is a very common format for LLMs, like Llama and Mistral. The proposed alternative implementation was to use multiple flips, which tends to be 1.5x-2x slower. Instead we can add a [`padding_side` parameter as there is for Hugging Face tokenizers](`9d6c0641c4/src/transformers/tokenization_utils_base.py (L1565)`), which requires only a very small change in the C++ code. Here are the benchmarks of the new implementation! `float32`: ![eaaa95ef-9384-45d2-be56-6898bc1d3514](https://github.com/user-attachments/assets/3b0eb309-e5a0-4a4d-97bb-4e3298783dbb) `bool`: ![892f32da-8d9a-492b-9507-18d3f0a41e8e](https://github.com/user-attachments/assets/6824ea15-7d4e-4b89-95f0-8546635f0c2e) Code: ```python from __future__ import annotations import random import time from typing import Literal import numpy as np import torch def pad_sequence_with_flips( sequences: list[torch.Tensor], batch_first: bool = False, padding_value: int | float | bool = 0.0, padding_side: Literal["left", "right"] | str = "left", ) -> torch.Tensor: if padding_side == 'right': padded_sequence = torch._C._nn.pad_sequence([t.flatten() for t in sequences], batch_first=batch_first, padding_value=padding_value) elif padding_side=='left': padded_sequence = torch._C._nn.pad_sequence([t.flatten().flip(0) for t in sequences], batch_first=batch_first, padding_value=padding_value) # pyright: ignore[reportArgumentType] padded_sequence = padded_sequence.flip(int(batch_first)) else: raise ValueError(f"padding_side should be either 'right' or 'left', but got {padding_side}") return padded_sequence sequence_lengths: list[int] = [] flip_left_pad_times: list[float] = [] flip_left_pad_times_std: list[float] = [] left_pad_times: list[float] = [] left_pad_times_std: list[float] = [] RUNS_PER_LOOP: int = 100 for i in range(1, 7): sequence_length = i * int(1e6) // 6 sequence_lengths.append(sequence_length) sequences = [torch.randint(0, 2, (random.randint(1, sequence_length),), dtype=torch.bool) for _ in range(64)] inner_left_pad_times: list[float] = [] inner_right_pad_times: list[float] = [] inner_flip_left_pad_times: list[float] = [] inner_flip_right_pad_times: list[float] = [] for _ in range(RUNS_PER_LOOP): start = time.perf_counter() torch._C._nn.pad_sequence(sequences, batch_first=True, padding_value=False, padding_side="left") end = time.perf_counter() inner_left_pad_times.append(end - start) start = time.perf_counter() pad_sequence_with_flips(sequences, batch_first=True, padding_value=False, padding_side="left") end = time.perf_counter() inner_flip_left_pad_times.append(end - start) left_pad_times.append(sum(inner_left_pad_times) / len(inner_left_pad_times)) left_pad_times_std.append(np.std(inner_left_pad_times)) flip_left_pad_times.append(sum(inner_flip_left_pad_times) / len(inner_flip_left_pad_times)) flip_left_pad_times_std.append(np.std(inner_flip_left_pad_times)) print(f"Sequence Length: {sequence_length}, Left Pad Time: {left_pad_times[-1]}, Left with Flips Pad Time: {flip_left_pad_times[-1]}") import matplotlib.pyplot as plt plt.plot(sequence_lengths, left_pad_times, label="new pad_sequence left") plt.scatter(sequence_lengths, left_pad_times) plt.errorbar(sequence_lengths, left_pad_times, yerr=left_pad_times_std, linestyle='None', marker='^') plt.plot(sequence_lengths, flip_left_pad_times, label="old pad_sequence left (2 flips)") plt.scatter(sequence_lengths, flip_left_pad_times) plt.errorbar(sequence_lengths, flip_left_pad_times, yerr=flip_left_pad_times_std, linestyle='None', marker='^') plt.xlabel("Sequence Length") plt.ylabel("Time (s)") plt.legend(loc="upper right") # Sequence Length: 166666, Left Pad Time: 0.06147645162009212, Left with Flips Pad Time: 0.09842291727001794 # Sequence Length: 333333, Left Pad Time: 0.08933195920990329, Left with Flips Pad Time: 0.15597836187991562 # Sequence Length: 500000, Left Pad Time: 0.08863158334006585, Left with Flips Pad Time: 0.15224887342999863 # Sequence Length: 666666, Left Pad Time: 0.10524682551997103, Left with Flips Pad Time: 0.18177212480995877 # Sequence Length: 833333, Left Pad Time: 0.11801802741003485, Left with Flips Pad Time: 0.20821274195001024 # Sequence Length: 1000000, Left Pad Time: 0.131894061660023, Left with Flips Pad Time: 0.23223503091008751 ``` Co-authored-by: mskoh52 <mskoh52@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/131884 Approved by: https://github.com/ezyang	2024-08-07 15:53:07 +00:00
Ivan Zaitsev	841cadd555	Fix discrepancies from 129973 (#132545) #129973 ([D59132793](https://www.internalfb.com/diff/D59132793)) was exported without its changes to `test/cpp/jit/CMakeLists.txt`; this PR remediates that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132545 Approved by: https://github.com/kit1980	2024-08-03 01:57:49 +00:00
Oguz Ulgen	221350e3a4	Add None return type to init -- tests (#132352) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352 Approved by: https://github.com/ezyang ghstack dependencies: #132335, #132351	2024-08-01 15:44:51 +00:00
Xuehai Pan	548c460bf1	[BE][Easy][7/19] enforce style for empty lines in import segments in `test/[a-c]/` and `test/[q-z]/` (#129758) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129758 Approved by: https://github.com/ezyang	2024-07-31 10:54:03 +00:00
cyy	89da94594e	[11/N] Fix clang-tidy warnings in jit (#132131) Follows #132122 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132131 Approved by: https://github.com/Skylion007	2024-07-31 03:45:52 +00:00
cyy	73d0f484b3	[structural binding][11/N] Replace std::tie with structural binding (#130830) Follows #130784 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130830 Approved by: https://github.com/janeyx99	2024-07-18 00:45:06 +00:00
cyy	168e41009b	[structural binding][10/N] Replace std::tie with structural binding (#130784) Follows #130404 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130784 Approved by: https://github.com/malfet	2024-07-16 10:28:14 +00:00
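
The CudaEventCache entries above (#140975, #138059) describe reusing CUDA events instead of destroying them, now with per-device caches. Below is a minimal Python sketch of that pooling idea. It makes several assumptions. `FakeEvent`, `EventCache`, and `get_cache` are hypothetical names. Plain objects stand in for real CUDA events. This is not the c10d implementation, which is C++.

```python
import threading


class FakeEvent:
    """Hypothetical stand-in for a CUDA event (no real device work happens here)."""

    def __init__(self, device: int, timing: bool) -> None:
        self.device = device
        self.timing = timing


class EventCache:
    """Hypothetical per-device pool: released events are handed out again instead of destroyed."""

    def __init__(self, device: int) -> None:
        self.device = device
        self._lock = threading.Lock()
        # Timing and non-timing events live in separate free lists.
        self._free: dict[bool, list[FakeEvent]] = {True: [], False: []}
        self.created = 0  # how many events were actually constructed

    def create(self, timing: bool = False) -> FakeEvent:
        with self._lock:
            if self._free[timing]:
                return self._free[timing].pop()  # reuse a cached event
            self.created += 1
        return FakeEvent(self.device, timing)

    def release(self, event: FakeEvent) -> None:
        # Return the event to the pool rather than destroying it.
        with self._lock:
            self._free[event.timing].append(event)


_caches: dict[int, EventCache] = {}
_caches_lock = threading.Lock()


def get_cache(device: int) -> EventCache:
    """Return the cache for `device`, creating it on first use (one cache per device)."""
    with _caches_lock:
        if device not in _caches:
            _caches[device] = EventCache(device)
        return _caches[device]
```

In this sketch, events go back to the pool through an explicit `release` call. A C++ version could return them automatically when the last owner lets go, but that detail is outside what the log entries state.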


2298 Commits
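
The C10D_LOCK_GUARD entries (#134131) describe logging when a lock in NCCLUtils or ProcessGroupNCCL cannot be acquired within 30s, using `std::timed_mutex`. Below is a rough Python analogue. It assumes that `threading.Lock.acquire(timeout=...)` plays the role of the timed mutex. `lock_guard` and the logger name are hypothetical and are not part of PyTorch.

```python
import contextlib
import logging
import threading
from collections.abc import Iterator

# Hypothetical logger name for this sketch only.
logger = logging.getLogger("lock_guard_sketch")


@contextlib.contextmanager
def lock_guard(lock: threading.Lock, name: str, warn_after_s: float = 30.0) -> Iterator[None]:
    """Hypothetical guard: acquire `lock`, warning if it is not obtained within `warn_after_s`."""
    if not lock.acquire(timeout=warn_after_s):
        # Possible deadlock: report it, then keep waiting as an ordinary lock guard would.
        logger.warning("lock %s not acquired after %.1fs, still waiting", name, warn_after_s)
        lock.acquire()
    try:
        yield
    finally:
        lock.release()
```

Usage mirrors a scoped lock: `with lock_guard(mu, "work_list"): ...`. The guard only adds a diagnostic. It never gives up on the lock, so behavior is unchanged apart from the log line.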