Commit Graph

2298 Commits

Author SHA1 Message Date
fduwjj
5b4c864672 [c10d] Enable CudaEventCache by default and add multi device support (#140975)
We added `CudaEventCache` in https://github.com/pytorch/pytorch/pull/133727 and this is a feature which tries to reuse CudaEvent so that we don't call destroy of CudaEvent which causes hang in the past. We had a bunch of tests and testing on TorchTitan and internal workload already. So far no errors or crash are found at the moment so we decide to roll out to all OSS users. For internal workload, this PR would not affect it because of some internal gating.

Also we observed some multi-device use cases in OSS, so that we want to bring back multi-device support originally proposed in https://github.com/pytorch/pytorch/pull/122732/files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140975
Approved by: https://github.com/eqy, https://github.com/kwen2501
2024-11-26 18:42:45 +00:00
cyy
263d8f7a94 [8/N] Don't skip ASAN on some tests (#140081)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140081
Approved by: https://github.com/ezyang
2024-11-09 01:00:13 +00:00
Bin Bao
1868fc63d8 [AOTI] Update C++ runner API to take a const vector (#139955)
Summary: Tighten the AOTIModelContainerRunner::run interface to take a const vector of at::Tensor, which 1) makes it clear that the runner will not modify the input tensor vector; 2) runner will be able to take a temp vector of tensors as the input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139955
Approved by: https://github.com/chenyang78
2024-11-08 16:59:10 +00:00
Ke Wen
e474f0de82 [PGNCCL] Slimming watchdog loop (#139834)
- Refactored traceback code into `work.printTraceback()`.  cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @shuqiangzhang
- Refactored desync debug code into `class DesyncDebugger`.
- Moved occurrences of `futureWorkResult_->markCompleted` into `checkAndSetException` and `checkTimeout`, respectively. cc @shuqiangzhang
- Modularized dump signal broadcast code into `ProcessGroupNCCL::broadcastDumpSignal`. cc @fduwjj @c-p-i-o

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139834
Approved by: https://github.com/shuqiangzhang
2024-11-07 17:22:44 +00:00
PyTorch MergeBot
7e02386303 Revert "[2/N] Replace c10::sv with std::sv (#139456)"
This reverts commit 028c5d3426.

Reverted https://github.com/pytorch/pytorch/pull/139456 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks internally. @ezyang can you please help get this landed? See D65546398 for more details ([comment](https://github.com/pytorch/pytorch/pull/139456#issuecomment-2462768891))
2024-11-07 17:00:59 +00:00
cyy
028c5d3426 [2/N] Replace c10::sv with std::sv (#139456)
Follows  #139453

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139456
Approved by: https://github.com/ezyang
2024-11-06 01:50:38 +00:00
cyyever
46d0b635b9 [CMake] Remove pthread linking (#134436)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134436
Approved by: https://github.com/r-barnes
2024-10-29 23:14:40 +00:00
Richard Barnes
068f7e7a78 torch::optional -> std::optional (#138987)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138987
Approved by: https://github.com/Skylion007
2024-10-28 19:09:46 +00:00
Richard Barnes
42994234a6 std::value/std::type -> std::_v/std::_t (#138746)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138746
Approved by: https://github.com/cyyever, https://github.com/malfet
2024-10-26 20:59:24 +00:00
Mwiza Kunda
22d2e2d9a0 Set RUNPATH so installed tests can find the required shared libraries (#136627)
This change fixes the RUNPATH of installed c++ tests so that the linker can find the shared libraries they depend on.

For example, currently:
```bash
venv/lib/python3.10/site-packages/torch $ ./bin/test_lazy
./bin/test_lazy: error while loading shared libraries: libtorch.so: cannot open shared object file: No such file or directory
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136627
Approved by: https://github.com/malfet
2024-10-25 09:38:08 +00:00
Angela Yi
51f6b946ae [torchbind] Add generic __deepcopy__ method (#137613)
Summary: Added a generic `__deepcopy__` method which will use the torchbind object's existing `__getattr__` and `__setattr__` to copy the torchbind object. This will later be used in [D64124825](https://www.internalfb.com/diff/D64124825)

Differential Revision: D64124826

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137613
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2024-10-24 22:14:55 +00:00
FFFrog
af0bc75460 Remove deprecated alias macro(1/3) (#137556)
**Detailed Descriptions:**
- Remove AT_ERROR Macro

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137556
Approved by: https://github.com/ezyang
2024-10-21 17:32:32 +00:00
Richard Barnes
fddabc6e0b C10_UNUSED to [[maybe_unused]] (#6357) (#138364)
Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-10-19 13:17:43 +00:00
Edward Yang
b14269dcfb Make Context to be Device-agnostic Step by Step (1/N) (#136519) (#138155)
Summary:
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Original pull request: https://github.com/pytorch/pytorch/pull/136519

Test Plan: contbuild & OSS CI, see 4a8e49389c

Reviewed By: malfet

Differential Revision: D64471142

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138155
Approved by: https://github.com/malfet, https://github.com/bobrenjc93
2024-10-17 20:58:56 +00:00
Shivam Raikundalia
dfb5ac05cc [Record Function] Add Kwargs only USER_SCOPE Macro (#138020)
Summary: Add a macro such that users can easily add a USER annotation with kwargs only

Test Plan: Will use D63801503 to test this E2E. Added unit test as well that makes sure that the kwargs get recorded correctly

Differential Revision: D64420328

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138020
Approved by: https://github.com/davidberard98, https://github.com/aaronenyeshi
2024-10-17 18:48:49 +00:00
fduwjj
7e704c2073 [c10d] Add unit test for CUDAEventCache to ensure caching is working (#138059)
We created a simple test to validate the cache is indeed working and when the cache is indeed used up. I revert the fix in (https://github.com/pytorch/pytorch/pull/138040) and the test indeed failed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138059
Approved by: https://github.com/kwen2501
ghstack dependencies: #138040, #138048
2024-10-16 17:34:57 +00:00
Shuqiang Zhang
f4158558aa [c10d] disable watchdog thread in blockingWait mode (#138001)
Summary:
Blocking wait mode is not widely used, probably useful in debugging.
in blockingWait mode, we don't need to enable the watchdog thread to
check the timeout or nccl error because the main thread would throw an
exception if error happens and it is obvious to user which work fails
and its user's responsibility to handle the exception.
Test Plan:
CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138001
Approved by: https://github.com/fduwjj, https://github.com/c-p-i-o
ghstack dependencies: #137799
2024-10-16 07:42:22 +00:00
PyTorch MergeBot
d4d687ffb2 Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)"
This reverts commit 4a8e49389c.

Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302))
2024-10-15 17:19:16 +00:00
Richard Barnes
b7f798caa4 Use C10_UNUSED instead of (void)X (#137239)
Summary:
Auto-generated with
```
buck run //scripts/rbarnes/regex_multiline_replacer:regex_multiline_replacer -- --find '^(\s*for\s*\()(const.*\n)\s*\(void\)[A-Za-z]+;\s*//\s*Suppress.*\s*\n(.*)'  --replace '\1C10_UNUSED \2\3' `find caffe2/ -regex ".*\.\(cpp\|h\)"`
```

Differential Revision: D33432600

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137239
Approved by: https://github.com/Skylion007
2024-10-15 14:32:59 +00:00
Catherine Lee
8ac06467d4 Forward fix test (#137910)
Summary: Add back in a deleted file to fix test

It was removed in https://github.com/pytorch/pytorch/pull/137404

Test Plan: `buck2 build --flagfile fbcode//mode/opt fbcode//caffe2/test/cpp/c10d:ProcessGroupGlooAsyncTest` succeeded

Differential Revision: D64341074

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137910
Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/kit1980
2024-10-14 22:07:29 +00:00
FFFrog
4a8e49389c Make Context to be Device-agnostic Step by Step (1/N) (#136519)
----

- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey
2024-10-13 12:38:02 +00:00
Nichols A. Romero
bd63ec4f45 [ROCm] LoadHIP CMake cleanup (#137112)
Should help mitigate issues reported here: https://github.com/pytorch/pytorch/issues/128313

While working on https://github.com/pytorch/pytorch/pull/136700, we realized that some of the ROCm CMake can be streamlined.

This PR does not fix any bugs or provide any new functionality. Strictly clean-up.

The remaining `${ROCM_ROCTX_LIB}` will be removed when we transition to the rocprofiler-sdk (to be done in a separate PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137112
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily
2024-10-13 00:06:41 +00:00
PyTorch MergeBot
079f909263 Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)"
This reverts commit be0b75256a.

Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/jovianjaison due to this pr is causing errors internally ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2405781093))
2024-10-10 18:32:17 +00:00
cyy
94e12f97dc [Distributed] [16/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#137404)
Follows #137072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137404
Approved by: https://github.com/Skylion007
2024-10-10 18:05:34 +00:00
sanshang
249152475d fix sequence number for group (#134578)
Summary:
Fix sequence number in execution trace dump for matching between collective/p2p op and wait in execution trace replay.

`ProcessGroupNCCL` has 2 sequence number counter, `seqCollective_` and `seqP2P_`.
b18ba9419e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp (L1188-L1191)
However, `WorkNCCL` only has one sequence number member `seq_`. b18ba9419e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp (L387)
We need to match collective and p2p with wait separately.
29b5a462dc

Depend on: https://github.com/pytorch/pytorch/pull/135132

Test Plan: buck2 run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_execution_trace_integration_test

Differential Revision:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134578
Approved by: https://github.com/kwen2501, https://github.com/c-p-i-o
2024-10-10 04:24:06 +00:00
Shuqiang Zhang
47a515d260 [c10d] simplify barrier implementation and further decouple CPU/GPU (#137516)
synchronization
Summary:
Barrier is  essentially intended to block CPU thread (instead of GPU
streams). Before we used 2 stream synchronizations (1. current stream
blocked by nccl stream end event, 2. CPU thread blocked on current
stream). This is unnecessary as we already have CPU thread blocking
logic in wait(). Also, adding barrier specific code block in the general
GPU synchronize() API is intrusive and confusing.

This PR cleans this.

Test Plan:
CI

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137516
Approved by: https://github.com/fduwjj, https://github.com/kwen2501
2024-10-09 23:55:28 +00:00
FFFrog
be0b75256a Make Context to be Device-agnostic Step by Step (1/N) (#136519)
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey
2024-10-09 02:13:36 +00:00
cyy
6327a71880 [Environment Variable][2/N] Use thread-safe setenv wrapper (#124485)
This follows #119449 to make setenv thread-safe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124485
Approved by: https://github.com/eqy
2024-10-04 07:30:51 +00:00
abhishek-fujitsu
63d6908da0 fix build error with gcc 12+ (#137092)
Fixes #127920

This commit addresses a build failure occurring with GCC 12 and above due to the -Werror=nonnull flag. The error manifests in the test_api target.

**Issue:**
When building with GCC 12+, the following error occurs:
```
error: argument 1 null where non-null expected [-Werror=nonnull]
  431 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This change ensures that:
1. The flag is only added for GCC 12 or higher
2. The flag is only added if it's supported by the compiler
3. The flag is added specifically to the test_api target, not globally

By disabling this specific error, we allow the build to proceed while maintaining other compiler warnings.

**Test Plan:**
- Verified successful build with GCC 12 and above
- Ensured no regression in builds with earlier GCC versions and other compilers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137092
Approved by: https://github.com/malfet
2024-10-02 00:37:15 +00:00
fduwjj
911a43f930 [TCPStore] Remove deprecated constructor (#136004)
While looking at TCPStore code again and found it confusing that we still keep the deprecated constructor for TCPStore in cpp while we don't expose it in python via pybind already. I checked both internal and external, all use cases in cpp (aside from unit test fixed in this PR) already moved to using option. So let's remove this legacy constructor to avoid confusion.

Differential Revision: [D62653634](https://our.internmc.facebook.com/intern/diff/D62653634)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136004
Approved by: https://github.com/Skylion007, https://github.com/XilunWu
2024-09-14 04:25:47 +00:00
PyTorch MergeBot
564d00f364 Revert "Fix clang-tidy warnings in Caffe2 code (#134935)"
This reverts commit 7cfd23636c.

Reverted https://github.com/pytorch/pytorch/pull/134935 on behalf of https://github.com/izaitsevfb due to breaks internal builds, caffe2 is still used internally ([comment](https://github.com/pytorch/pytorch/pull/134935#issuecomment-2349368152))
2024-09-13 16:42:37 +00:00
cyy
7cfd23636c Fix clang-tidy warnings in Caffe2 code (#134935)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134935
Approved by: https://github.com/ezyang
2024-09-12 03:27:09 +00:00
PyTorch MergeBot
c044deb9ce Revert "c10d/logging: add C10D_LOCK_GUARD (#134131)"
This reverts commit f33bcbe5fd.

Reverted https://github.com/pytorch/pytorch/pull/134131 on behalf of https://github.com/kit1980 due to See D61985186 ([comment](https://github.com/pytorch/pytorch/pull/134131#issuecomment-2327556381))
2024-09-03 22:35:14 +00:00
Bin Bao
310eb6d8c6 [AOTI] Fix test_aoti_inference CPU build issue (#134675)
Summary: Fixes https://github.com/pytorch/pytorch/issues/130311. We need to guard CUDA-only code in test_aoti_inference with macros so that it won't fail for CPU-only platform.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134675
Approved by: https://github.com/atalman, https://github.com/chunyuan-w
2024-08-28 17:42:19 +00:00
Tristan Rice
f33bcbe5fd c10d/logging: add C10D_LOCK_GUARD (#134131)
This adds logs if we can't acquire locks in NCCLUtils and ProcessGroupNCCL for 30s.

This is motivated by some deadlocks were seeing and it's unclear if it's in NCCL or on the PyTorch side of things.

This required replacing most `std::mutex` with `std::timed_mutex` and `std::condition_variable_any` as appropriate.

Test plan:

existing CI for regressions

will add unit tests on `C10D_LOCK_GUARD`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134131
Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj
2024-08-28 01:40:42 +00:00
PyTorch MergeBot
1c4780e69a Revert "c10d/logging: add C10D_LOCK_GUARD (#134131)"
This reverts commit 4c28a0eb0b.

Reverted https://github.com/pytorch/pytorch/pull/134131 on behalf of https://github.com/ZainRizvi due to Sorry but this causes formatting errors internally which make it fail to build. See D61759282 ([comment](https://github.com/pytorch/pytorch/pull/134131#issuecomment-2310455878))
2024-08-26 15:19:27 +00:00
Sheng Fu
519342962d Pass process group info into NcclWork (#134269)
Summary: Pass process group info into NcclWork

Test Plan: buck2 run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_execution_trace_integration_test

Differential Revision: D61677160

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134269
Approved by: https://github.com/wconstab
2024-08-24 01:04:43 +00:00
Tristan Rice
4c28a0eb0b c10d/logging: add C10D_LOCK_GUARD (#134131)
This adds logs if we can't acquire locks in NCCLUtils and ProcessGroupNCCL for 30s.

This is motivated by some deadlocks were seeing and it's unclear if it's in NCCL or on the PyTorch side of things.

This required replacing most `std::mutex` with `std::timed_mutex` and `std::condition_variable_any` as appropriate.

Test plan:

existing CI for regressions

will add unit tests on `C10D_LOCK_GUARD`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134131
Approved by: https://github.com/c-p-i-o, https://github.com/fduwjj
2024-08-24 00:27:39 +00:00
Xuehai Pan
758a0a88a2 [BE][Easy] enable ruff rule PIE790: unnecessary pass statement (#133200)
This PR removes unnecessary `pass` statement. This is semanticly safe because the bytecode for the Python code does not change.

Note that if there is a docstring in the function, a empty function does not need a `pass` statement as placeholder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200
Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980
2024-08-15 15:50:19 +00:00
cyy
c2eeda5da0 [structural binding][12/N] Replace std::tie with structural binding (#131031)
Follows #130830
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131031
Approved by: https://github.com/ezyang
2024-08-14 00:51:34 +00:00
Chirag Pandya
40767e8468 [BE] rename testHelperPrefix test (#132916)
Summary:
Re-enable testHelperPrefix test that was erroneously disabled in CI.
Fixes #50701

Test Plan:
Test passes locally:
```
❯ ./TCPStoreTest --gtest_filter=TCPStoreTest.testHelperPrefix
Running main() from
/data/users/cpio/pytorch/third_party/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = TCPStoreTest.testHelperPrefix
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TCPStoreTest
[ RUN      ] TCPStoreTest.testHelperPrefix
[W807 12:01:31.531576727 socket.cpp:462] [c10d] waitForInput: poll for
socket SocketImpl(fd=6, addr=[localhost]:37984,
remote=[localhost]:37171) returned 0, likely a timeout
[W807 12:01:31.531663710 socket.cpp:487] [c10d] waitForInput: socket
SocketImpl(fd=6, addr=[localhost]:37984, remote=[localhost]:37171) timed
out after 100ms
[       OK ] TCPStoreTest.testHelperPrefix (314 ms)
[----------] 1 test from TCPStoreTest (314 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (314 ms total)
[  PASSED  ] 1 test.
╭─ ~/local/pytorch/build/bin  main *1 +1 ···················· ✔
/home/cpio/local/a/pytorch-env   cpio@devgpu011 ─╮
╰─
```
Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132916
Approved by: https://github.com/Skylion007
2024-08-08 20:54:52 +00:00
Prachi Gupta
c326533999 [ROCm][Inductor] Enable AOT Inductor CPP UTs for ROCm (#131521)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131521
Approved by: https://github.com/jataylo, https://github.com/pruthvistony, https://github.com/malfet
2024-08-08 19:49:56 +00:00
Shangdi Yu
21906ddaba [AOTI] Fix complex64 not defined (#132810)
Partially fixes #122980

- change cpp type mapping for complex64 to std::complex<float>
- add `aoti_torch_item_complex64` and `aoti_torch_scalar_to_tensor_complex64`.
- add `expensiveCopyToTensor()` to convert `ArrayRefTensor<T>` type to `AtenTensorHandle` type.

- if we want to fully fix #122980, we still need to let ArrayRef and MiniArrayRef to consider underlying storage number of elements. See more details in https://github.com/pytorch/pytorch/pull/132347 (#132347  broke some internal tests, so we need more work before landing it).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132810
Approved by: https://github.com/desertfire
2024-08-08 18:08:23 +00:00
Matthew Hoffman
258f47fc0b Add padding_side to pad_sequence with "left" and "right" options ("right" as default) (#131884)
Fixes #10536

Reattempt of #61467. Thank you so much to @mskoh52 for your excellent work!

As I was trying to create a more efficient LLM data collator, I realized that `pad_sequence` only supports right padding, even though left padding is a very common format for LLMs, like Llama and Mistral.

The proposed alternative implementation was to use multiple flips, which tends to be 1.5x-2x slower. Instead we can add a [`padding_side` parameter as there is for for Hugging Face tokenizers](9d6c0641c4/src/transformers/tokenization_utils_base.py (L1565)), which requires only a very small change in the C++ code.

Here are the benchmarks of the new implementation!

`float32`:

![eaaa95ef-9384-45d2-be56-6898bc1d3514](https://github.com/user-attachments/assets/3b0eb309-e5a0-4a4d-97bb-4e3298783dbb)

`bool`:

![892f32da-8d9a-492b-9507-18d3f0a41e8e](https://github.com/user-attachments/assets/6824ea15-7d4e-4b89-95f0-8546635f0c2e)

Code:

```python
from __future__ import annotations

import random
import time
from typing import Literal

import numpy as np
import torch

def pad_sequence_with_flips(
    sequences: list[torch.Tensor],
    batch_first: bool = False,
    padding_value: int | float | bool = 0.0,
    padding_side: Literal["left", "right"] | str = "left",
) -> torch.Tensor:
    if padding_side == 'right':
        padded_sequence = torch._C._nn.pad_sequence([t.flatten() for t in sequences], batch_first=batch_first, padding_value=padding_value)
    elif padding_side=='left':
        padded_sequence = torch._C._nn.pad_sequence([t.flatten().flip(0) for t in sequences], batch_first=batch_first, padding_value=padding_value)  # pyright: ignore[reportArgumentType]
        padded_sequence = padded_sequence.flip(int(batch_first))
    else:
        raise ValueError(f"padding_side should be either 'right' or 'left', but got {padding_side}")

    return padded_sequence

sequence_lengths: list[int] = []

flip_left_pad_times: list[float] = []
flip_left_pad_times_std: list[float] = []

left_pad_times: list[float] = []
left_pad_times_std: list[float] = []

RUNS_PER_LOOP: int = 100

for i in range(1, 7):
    sequence_length = i * int(1e6) // 6
    sequence_lengths.append(sequence_length)

    sequences = [torch.randint(0, 2, (random.randint(1, sequence_length),), dtype=torch.bool) for _ in range(64)]

    inner_left_pad_times: list[float] = []
    inner_right_pad_times: list[float] = []

    inner_flip_left_pad_times: list[float] = []
    inner_flip_right_pad_times: list[float] = []

    for _ in range(RUNS_PER_LOOP):

        start = time.perf_counter()
        torch._C._nn.pad_sequence(sequences, batch_first=True, padding_value=False, padding_side="left")
        end = time.perf_counter()
        inner_left_pad_times.append(end - start)

        start = time.perf_counter()
        pad_sequence_with_flips(sequences, batch_first=True, padding_value=False, padding_side="left")
        end = time.perf_counter()
        inner_flip_left_pad_times.append(end - start)

    left_pad_times.append(sum(inner_left_pad_times) / len(inner_left_pad_times))
    left_pad_times_std.append(np.std(inner_left_pad_times))

    flip_left_pad_times.append(sum(inner_flip_left_pad_times) / len(inner_flip_left_pad_times))
    flip_left_pad_times_std.append(np.std(inner_flip_left_pad_times))

    print(f"Sequence Length: {sequence_length}, Left Pad Time: {left_pad_times[-1]}, Left with Flips Pad Time: {flip_left_pad_times[-1]}")

import matplotlib.pyplot as plt

plt.plot(sequence_lengths, left_pad_times, label="new pad_sequence left")
plt.scatter(sequence_lengths, left_pad_times)
plt.errorbar(sequence_lengths, left_pad_times, yerr=left_pad_times_std, linestyle='None', marker='^')

plt.plot(sequence_lengths, flip_left_pad_times, label="old pad_sequence left (2 flips)")
plt.scatter(sequence_lengths, flip_left_pad_times)
plt.errorbar(sequence_lengths, flip_left_pad_times, yerr=flip_left_pad_times_std, linestyle='None', marker='^')

plt.xlabel("Sequence Length")
plt.ylabel("Time (s)")
plt.legend(loc="upper right")

# Sequence Length: 166666, Left Pad Time: 0.06147645162009212, Left with Flips Pad Time: 0.09842291727001794
# Sequence Length: 333333, Left Pad Time: 0.08933195920990329, Left with Flips Pad Time: 0.15597836187991562
# Sequence Length: 500000, Left Pad Time: 0.08863158334006585, Left with Flips Pad Time: 0.15224887342999863
# Sequence Length: 666666, Left Pad Time: 0.10524682551997103, Left with Flips Pad Time: 0.18177212480995877
# Sequence Length: 833333, Left Pad Time: 0.11801802741003485, Left with Flips Pad Time: 0.20821274195001024
# Sequence Length: 1000000, Left Pad Time: 0.131894061660023, Left with Flips Pad Time: 0.23223503091008751
```

Co-authored-by: mskoh52 <mskoh52@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131884
Approved by: https://github.com/ezyang
2024-08-07 15:53:07 +00:00
Ivan Zaitsev
841cadd555 Fix discrepancies from 129973 (#132545)
#129973 ([D59132793](https://www.internalfb.com/diff/D59132793)) was exported missing changes in `test/cpp/jit/CMakeLists.txt` this PR remediates that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132545
Approved by: https://github.com/kit1980
2024-08-03 01:57:49 +00:00
Oguz Ulgen
221350e3a4 Add None return type to init -- tests (#132352)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352
Approved by: https://github.com/ezyang
ghstack dependencies: #132335, #132351
2024-08-01 15:44:51 +00:00
Xuehai Pan
548c460bf1 [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129758
Approved by: https://github.com/ezyang
2024-07-31 10:54:03 +00:00
cyy
89da94594e [11/N] Fix clang-tidy warnings in jit (#132131)
Follows #132122

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132131
Approved by: https://github.com/Skylion007
2024-07-31 03:45:52 +00:00
cyy
73d0f484b3 [structural binding][11/N] Replace std::tie with structural binding (#130830)
Follows  #130784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130830
Approved by: https://github.com/janeyx99
2024-07-18 00:45:06 +00:00
cyy
168e41009b [structural binding][10/N] Replace std::tie with structural binding (#130784)
Follows  #130404

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130784
Approved by: https://github.com/malfet
2024-07-16 10:28:14 +00:00