PyTorch Data Sampler benchmark (#156974)

## Motivation
Many PRs that optimize samplers (e.g. https://github.com/pytorch/pytorch/pull/147706, https://github.com/pytorch/pytorch/pull/137423) rely on an ad hoc script for benchmarking, and the script and its outputs are copied into each PR. We want to begin centralizing benchmarks for `torch.utils.data` components.

## What?
* This PR adds a new `data` sub-folder under `benchmarks`, intended to hold benchmarking scripts for `torch.utils.data` components such as the dataloader and samplers.
* Specifically, it includes a simple script for timing samplers, which until now has been copy-pasted into PRs that optimize samplers. Keeping it in a central location avoids that duplication and establishes a common standard.

## Output
```
Benchmark Results:
+--------------+-------------+----------------+-----------+-----------+
|   Batch Size | Drop Last   |   Original (s) |   New (s) | Speedup   |
+==============+=============+================+===========+===========+
|            4 | True        |         0.004  |    0.0088 | -119.62%  |
+--------------+-------------+----------------+-----------+-----------+
|            4 | False       |         0.0083 |    0.009  | -9.23%    |
+--------------+-------------+----------------+-----------+-----------+
|            8 | True        |         0.003  |    0.0074 | -147.64%  |
+--------------+-------------+----------------+-----------+-----------+
|            8 | False       |         0.0054 |    0.0075 | -38.72%   |
+--------------+-------------+----------------+-----------+-----------+
|           64 | True        |         0.0021 |    0.0056 | -161.92%  |
+--------------+-------------+----------------+-----------+-----------+
|           64 | False       |         0.0029 |    0.0055 | -92.50%   |
+--------------+-------------+----------------+-----------+-----------+
|          640 | True        |         0.002  |    0.0055 | -168.75%  |
+--------------+-------------+----------------+-----------+-----------+
|          640 | False       |         0.0024 |    0.0062 | -161.35%  |
+--------------+-------------+----------------+-----------+-----------+
|         6400 | True        |         0.0021 |    0.0055 | -160.13%  |
+--------------+-------------+----------------+-----------+-----------+
|         6400 | False       |         0.0021 |    0.0068 | -215.46%  |
+--------------+-------------+----------------+-----------+-----------+
|        64000 | True        |         0.0042 |    0.0065 | -55.29%   |
+--------------+-------------+----------------+-----------+-----------+
|        64000 | False       |         0.0029 |    0.0077 | -169.56%  |
+--------------+-------------+----------------+-----------+-----------+
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156974
Approved by: https://github.com/ramanishsingh
Author: Divyansh Khanna
Date: 2025-06-27 04:49:39 +00:00
Committed by: PyTorch MergeBot
Commit: e6d8ed02cb (parent: 195ef1bce8)
4 changed files with 212 additions and 1 deletion

benchmarks/README.md

@@ -31,3 +31,4 @@ Please refer to each subfolder to discover each benchmark suite. Links are provi
* [Overrides](overrides_benchmark/README.md)
* [Sparse](sparse/README.md)
* [Tensor expression](tensorexpr/HowToRun.md)
* [Data](data/README.md)

benchmarks/data/README.md (new file)

@@ -0,0 +1,62 @@
# PyTorch Data Benchmarks
This directory contains benchmarks for components of the `torch.utils.data` module, currently focusing on the performance of samplers.
## Dependencies
The benchmarks require the following dependencies:
```
numpy
tabulate
```
You can install them using pip:
```bash
pip install numpy tabulate
```
## Running the benchmarks
To run the BatchSampler benchmark:
```bash
python samplers_benchmark.py
```
## Sampler Benchmark
The `samplers_benchmark.py` script benchmarks the performance of PyTorch's `BatchSampler` against an alternative implementation, included as an example. It tests the following parameters (the core timing loop is sketched after the list):
- Batch sizes: 4, 8, 64, 640, 6400, 64000
- Drop last options: True, False
- Each configuration is run 10 times and averaged
- Results include speedup percentage calculations
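The core timing loop, paraphrased here from the script for a single configuration (`batch_size=64`, `drop_last=True`, and the script's `DATA_SIZE` of 99999), times one complete pass over the sampler with `time.perf_counter` and averages the runs:

```python
import time

import numpy as np
from torch.utils.data import BatchSampler, SequentialSampler

DATA_SIZE = 99999  # dataset size used by the script

times = []
for _ in range(10):  # AVG_TIMES runs per configuration
    start = time.perf_counter()
    # Iterating the sampler to exhaustion is the operation being timed.
    for _ in BatchSampler(SequentialSampler(range(DATA_SIZE)), batch_size=64, drop_last=True):
        pass
    times.append(time.perf_counter() - start)

avg_seconds = float(np.mean(times))
```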
### Output
The benchmark outputs a table with the following columns:
- Batch Size
- Drop Last
- Original (s): Time taken by the original implementation
- New (s): Time taken by the alternative implementation
- Speedup: Percentage improvement of the new implementation over the original
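The Speedup column is derived from the two averaged times, so negative values mean the new implementation is slower than the original:

```python
def speedup_pct(original_avg: float, new_avg: float) -> float:
    # Same calculation as in samplers_benchmark.py: the percentage by which the
    # new implementation is faster (negative if it is slower).
    return (original_avg - new_avg) / original_avg * 100
```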
Example output:
```
+------------+-----------+---------------+----------+---------+
| Batch Size | Drop Last | Original (s)  | New (s)  | Speedup |
+============+===========+===============+==========+=========+
|          4 | True      |        0.1234 |   0.1000 | 18.96%  |
+------------+-----------+---------------+----------+---------+
|          4 | False     |        0.1345 |   0.1100 | 18.22%  |
+------------+-----------+---------------+----------+---------+
...
```
### Extending the Benchmark
To benchmark a different implementation locally:
1. Modify the `NewBatchSampler` class in `samplers_benchmark.py` with your implementation, and replace `BatchSampler` with the corresponding PyTorch implementation you want to compare against.
   * Be sure to pass through all of the sampler's inputs, e.g. `replacement` for `RandomSampler` and its variants (see the sketch after this list).
2. Run the benchmark to compare its performance against the original.
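For illustration, a minimal sketch of step 1 for a hypothetical random-sampler comparison; `MyRandomSampler` is a placeholder name, not part of this PR:

```python
from torch.utils.data import RandomSampler, Sampler


class MyRandomSampler(Sampler[int]):
    """Placeholder for the implementation you want to benchmark."""

    def __init__(self, data_source, replacement=False, num_samples=None, generator=None):
        # Mirror RandomSampler's signature so both samplers receive the same inputs.
        self.data_source = data_source
        self.replacement = replacement
        self.num_samples = num_samples
        self.generator = generator

    def __iter__(self):
        # Placeholder logic; replace with the sampling strategy you want to measure.
        yield from range(len(self.data_source))


# In main() of samplers_benchmark.py, point the two variables at the samplers being
# compared and update the constructor calls in the timing loops to pass the matching
# arguments (data_source, replacement, ...) instead of sampler/batch_size/drop_last.
baselineSampler = RandomSampler
testSampler = MyRandomSampler
```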

benchmarks/data/samplers_benchmark.py (new file)

@@ -0,0 +1,143 @@
#!/usr/bin/env python3
import time
from collections.abc import Iterable, Iterator
from typing import Union
import numpy as np
from tabulate import tabulate
from torch.utils.data import BatchSampler, Sampler, SequentialSampler
class NewBatchSampler(Sampler[list[int]]):
"""Alternative implementation of BatchSampler for benchmarking purposes."""
def __init__(
self,
sampler: Union[Sampler[int], Iterable[int]],
batch_size: int,
drop_last: bool,
) -> None:
if (
not isinstance(batch_size, int)
or isinstance(batch_size, bool)
or batch_size <= 0
):
raise ValueError(
f"batch_size should be a positive integer value, but got batch_size={batch_size}"
)
if not isinstance(drop_last, bool):
raise ValueError(
f"drop_last should be a boolean value, but got drop_last={drop_last}"
)
self.sampler = sampler
self.batch_size = batch_size
self.drop_last = drop_last
def __iter__(self) -> Iterator[list[int]]:
if self.drop_last:
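            # Build each batch by pulling batch_size indices from one iterator; the
            # surrounding try/except ends iteration on StopIteration, so any final
            # partial batch is discarded (drop_last semantics).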
sampler_iter = iter(self.sampler)
while True:
try:
batch = [next(sampler_iter) for _ in range(self.batch_size)]
yield batch
except StopIteration:
break
else:
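            # Refill a preallocated list in place, yielding a full batch each time it
            # fills; leftover indices are yielded as a trimmed partial batch at the end.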
batch = [0] * self.batch_size
idx_in_batch = 0
for idx in self.sampler:
batch[idx_in_batch] = idx
idx_in_batch += 1
if idx_in_batch == self.batch_size:
yield batch
idx_in_batch = 0
batch = [0] * self.batch_size
if idx_in_batch > 0:
yield batch[:idx_in_batch]
def __len__(self) -> int:
# Can only be called if self.sampler has __len__ implemented
if self.drop_last:
return len(self.sampler) // self.batch_size # type: ignore[arg-type]
else:
return (len(self.sampler) + self.batch_size - 1) // self.batch_size # type: ignore[arg-type]
def main():
"""Run benchmark with specified parameters."""
DATA_SIZE = 99999
AVG_TIMES = 10
BATCH_SIZES = [4, 8, 64, 640, 6400, 64000]
DROP_LAST_OPTIONS = [True, False]
results = []
# Set up samplers here, ensure right args are passed in
baselineSampler = BatchSampler
testSampler = NewBatchSampler
for batch_size in BATCH_SIZES:
for drop_last in DROP_LAST_OPTIONS:
print(f"Benchmarking with batch_size={batch_size}, drop_last={drop_last}")
# Benchmark baselineSampler
original_times = []
for _ in range(AVG_TIMES):
start = time.perf_counter()
for _ in baselineSampler(
sampler=SequentialSampler(range(DATA_SIZE)),
batch_size=batch_size,
drop_last=drop_last,
):
pass
end = time.perf_counter()
original_times.append(end - start)
time.sleep(0.1)
original_avg = float(np.mean(original_times))
# Benchmark testSampler
new_times = []
for _ in range(AVG_TIMES):
start = time.perf_counter()
for _ in testSampler(
sampler=SequentialSampler(range(DATA_SIZE)),
batch_size=batch_size,
drop_last=drop_last,
):
pass
end = time.perf_counter()
new_times.append(end - start)
time.sleep(0.1) # Small delay to reduce system load
new_avg = float(np.mean(new_times))
# Calculate speedup
if original_avg > 0 and new_avg > 0:
speedup = (original_avg - new_avg) / original_avg * 100
speedup_str = f"{speedup:.2f}%"
else:
speedup_str = "N/A"
print(f"Speedup: {speedup_str}\n")
results.append(
[
batch_size,
drop_last,
f"{original_avg:.4f}",
f"{new_avg:.4f}",
speedup_str,
]
)
# Print results in a table
headers = ["Batch Size", "Drop Last", "Original (s)", "New (s)", "Speedup"]
print("\nBenchmark Results:")
print(tabulate(results, headers=headers, tablefmt="grid"))
if __name__ == "__main__":
main()

torch/utils/data/sampler.py

@@ -6,6 +6,12 @@ from typing import Generic, Optional, TypeVar, Union
import torch
# Note: For benchmarking changes to samplers, see:
# /benchmarks/data/samplers_benchmark.py
# This benchmark compares the performance of different sampler implementations
# and can be used to evaluate the impact of optimizations.
__all__ = [
"BatchSampler",
"RandomSampler",
@@ -324,7 +330,6 @@ class BatchSampler(Sampler[list[int]]):
self.drop_last = drop_last
def __iter__(self) -> Iterator[list[int]]:
# Implemented based on the benchmarking in https://github.com/pytorch/pytorch/pull/76951
sampler_iter = iter(self.sampler)
if self.drop_last:
# Create multiple references to the same iterator