# Instruction count microbenchmarks
## Quick start

### To run the benchmark:

```
# From pytorch root
cd benchmarks/instruction_counts
python main.py
```

Currently `main.py` contains a very simple threadpool (so that run time isn't
unbearably onerous) and simply prints the results. These components will be
upgraded in subsequent PRs.

### To define a new benchmark:

* `TimerArgs`: Low-level definition which maps directly to
  `torch.utils.benchmark.Timer`. (A sketch of that mapping follows this list.)
* `GroupedStmts`: Benchmark a snippet (Python, C++, or both). Can automatically
  generate TorchScript and autograd variants.
* `GroupedModules`: Like `GroupedStmts`, but takes `nn.Module`s.
* `GroupedVariants`: Benchmark-per-line, to define many related benchmarks in a
  single code block.
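For orientation, here is the kind of `torch.utils.benchmark.Timer` that a
`TimerArgs` instance maps onto. This is a minimal sketch, not part of the
suite: the snippet and setup are borrowed from the `GroupedStmts` example in
the Architecture section below, and `collect_callgrind` requires Valgrind to
be installed.

```
from torch.utils.benchmark import Timer

timer = Timer(
    stmt="y = x * w",
    setup="""
import torch
x = torch.ones((4, 4))
w = torch.ones((4, 4), requires_grad=True)
""",
    num_threads=1,
)

# Instruction counts (the metric this suite is built around) via Callgrind.
print(timer.collect_callgrind())
```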
## Architecture

### Benchmark definition.

One primary goal of this suite is to make it easy to define semantically
related clusters of benchmarks. The crux of this effort is the
`GroupedBenchmark` class, which is defined in `core/api.py`. It takes a
definition for a set of related benchmarks, and produces one or more concrete
cases. It's helpful to see an example to understand how the machinery works.
Consider the following benchmark:
```
# `GroupedStmts` is an alias of `GroupedBenchmark.init_from_stmts`
benchmark = GroupedStmts(
    py_stmt=r"y = x * w",
    cpp_stmt=r"auto y = x * w;",

    setup=GroupedSetup(
        py_setup="""
            x = torch.ones((4, 4))
            w = torch.ones((4, 4), requires_grad=True)
        """,
        cpp_setup="""
            auto x = torch::ones({4, 4});
            auto w = torch::ones({4, 4});
            w.set_requires_grad(true);
        """,
    ),

    signature="f(x, w) -> y",
    torchscript=True,
    autograd=True,
)
```
It is trivial to generate Timers for the eager forward mode case (ignoring
`num_threads` for now):

```
Timer(
    stmt=benchmark.py_fwd_stmt,
    setup=benchmark.setup.py_setup,
)

Timer(
    stmt=benchmark.cpp_fwd_stmt,
    setup=benchmark.setup.cpp_setup,
    language="cpp",
)
```
Moreover, because `signature` is provided we know that creation of `x` and `w`
is part of setup, and the overall computation uses `x` and `w` to produce `y`.
As a result, we can derive TorchScript'd and AutoGrad variants as well. We can
deduce that a TorchScript model will take the form:

```
@torch.jit.script
def f(x, w):
    # Paste `benchmark.py_fwd_stmt` into the function body.
    y = x * w
    return y  # Set by `-> y` in signature.
```
And because we will want to use this model in both Python and C++, we save it to
disk and load it as needed. At this point Timers for TorchScript become:

```
Timer(
    stmt="""
        y = jit_model(x, w)
    """,
    setup="""
        # benchmark.setup.py_setup
        # jit_model = torch.jit.load(...)
        # Warm up jit_model
    """,
)

Timer(
    stmt="""
        std::vector<torch::jit::IValue> ivalue_inputs({
            torch::jit::IValue(x),
            torch::jit::IValue(w)
        });
        auto y = jit_model.forward(ivalue_inputs);
    """,
    setup="""
        // benchmark.setup.cpp_setup
        // jit_model = torch::jit::load(...)
        // Warm up jit_model
    """,
    language="cpp",
)
```
While nothing above is particularly complex, there is non-trivial bookkeeping
(managing the model artifact, setting up IValues) which, if done manually,
would be rather bug-prone and hard to read.
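For instance, the model artifact handoff amounts to serializing the scripted
function once and loading it from each Timer's setup. A minimal sketch of that
round trip (the path is hypothetical; the suite manages real artifact
locations itself):

```
import torch

@torch.jit.script
def f(x, w):
    y = x * w
    return y

# Written once when the benchmarks are materialized...
f.save("/tmp/f.pt")

# ...then loaded by both the Python and C++ setup blocks.
jit_model = torch.jit.load("/tmp/f.pt")
```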
The story is similar for autograd: because we know the output variable (`y`)
and we make sure to assign it when calling TorchScript models, testing AutoGrad
is as simple as appending `y.backward()` (or `y.backward();` in C++) to the
stmt of the forward only variant. Of course this requires that `signature` be
provided, as there is nothing special about the name `y`.
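Concretely, the derived Python AutoGrad variant is just the forward Timer with
the backward call appended (a sketch following the pattern above):

```
Timer(
    stmt="""
        y = x * w
        y.backward()
    """,
    setup=benchmark.setup.py_setup,
)
```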
The logic for the manipulations above is split between `core/api.py` (for
generating `stmt` based on language, Eager/TorchScript, with or without
AutoGrad) and `core/expand.py` (for larger, more expansive generation). The
benchmarks themselves are defined in `definitions/standard.py`. The current set
is chosen to demonstrate the various model definition APIs, and will be
expanded when the benchmark runner infrastructure is better equipped to deal
with a larger run.
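Putting the files referenced above together, the layout is roughly:

```
benchmarks/instruction_counts/
├── main.py               # Entry point; simple threadpool, prints results.
├── core/
│   ├── api.py            # GroupedBenchmark, TimerArgs, stmt generation.
│   └── expand.py         # Materializes definitions into concrete TimerArgs.
├── definitions/
│   └── standard.py       # The benchmark definitions themselves.
└── worker/
    └── main.py           # Subprocess which executes a single TimerArgs.
```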
### Benchmark execution.

Once `expand.materialize` has flattened the abstract benchmark definitions into
`TimerArgs`, they can be sent to a worker (`worker/main.py`) subprocess for
execution. This worker has no concept of the larger benchmark suite; `TimerArgs`
maps one-to-one and directly onto the `torch.utils.benchmark.Timer` instance
that the worker instantiates.
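As a sketch of that worker-side mapping (field names are assumed to mirror
`Timer`'s constructor; see `core/api.py` for the real `TimerArgs` definition):

```
from torch.utils.benchmark import Timer

def run_timer_args(timer_args):
    # Assumed field names: TimerArgs is described above as a direct,
    # one-to-one mapping onto Timer's constructor arguments.
    timer = Timer(
        stmt=timer_args.stmt,
        setup=timer_args.setup,
        num_threads=timer_args.num_threads,
        language=timer_args.language,
    )
    return timer.collect_callgrind()
```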