PyTorch Design Philosophy
=========================

This document is designed to help contributors and module maintainers
understand the high-level design principles that have developed over
time in PyTorch. These are not meant to be hard-and-fast rules, but to
serve as a guide to help trade off different concerns and to resolve
disagreements that may come up while developing PyTorch. For more
information on contributing, module maintainership, and how to escalate a
disagreement to the Core Maintainers, please see `PyTorch
Governance <https://pytorch.org/docs/main/community/governance.html>`__.

Design Principles
-----------------

Principle 1: Usability over Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle may be surprising! As one Hacker News poster wrote:
*PyTorch is amazing! [...] Although I’m confused. How can a ML framework be
not obsessed with speed/performance?* See `Hacker News discussion on
PyTorch <https://news.ycombinator.com/item?id=28066093>`__.

Soumith’s blog post on `Growing the PyTorch
Community <https://soumith.ch/posts/2021/02/growing-opensource/?fbclid=IwAR1bvN_xZ8avGvu14ODJzS8Zp7jX1BOyfuGUf-zoRawpyL-s95Vjxf88W7s>`__
goes into this in some depth, but at a high-level:

- PyTorch’s primary goal is usability
- A secondary goal is to have *reasonable* performance

We believe the ability to maintain our flexibility to support
researchers who are building on top of our abstractions remains
critical. We can’t see what the future of workloads will be, but we
know we want them to be built first on PyTorch, and that requires
flexibility.

In more concrete terms, we operate in a *usability-first* manner and try
to avoid jumping to *restriction-first* regimes (for example, static shapes,
graph-mode only) without a clear-eyed view of the tradeoffs. Often there
is a temptation to impose strict user restrictions upfront because it
can simplify implementation, but this comes with risks:

- The performance may not be worth the user friction, either because
  the performance benefit is not compelling enough or it only applies to
  a relatively narrow set of subproblems.
- Even if the performance benefit is compelling, the restrictions can
  fragment the ecosystem into different sets of limitations that can
  quickly become incomprehensible to users.

We want users to be able to seamlessly move their PyTorch code to
different hardware and software platforms, to interoperate with
different libraries and frameworks, and to experience the full richness
of the PyTorch user experience, not a least common denominator subset.

Principle 2: Simple Over Easy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, we borrow from `The Zen of
Python <https://peps.python.org/pep-0020/>`__:

- *Explicit is better than implicit*
- *Simple is better than complex*

A more concise way of describing these two goals is `Simple Over
Easy <https://www.infoq.com/presentations/Simple-Made-Easy/>`__. Let’s start with an example because *simple* and *easy* are
often used interchangeably in everyday English. Consider how one may
model `devices <https://pytorch.org/docs/main/tensor_attributes.html#torch.device>`__
in PyTorch:

- **Simple / Explicit (to understand, debug):** every tensor is associated
  with a device. The user explicitly specifies tensor device movement.
  Operations that require cross-device movement result in an error.
- **Easy / Implicit (to use):** the user does not have to worry about
  devices; the system figures out the globally optimal device
  placement.

In this specific case, and as a general design philosophy, PyTorch
favors exposing simple and explicit building blocks rather than APIs
that are easy-to-use by practitioners. The simple version is immediately
understandable and debuggable by a new PyTorch user: you get a clear
error if you call an operator requiring cross-device movement at the
point in the program where the operator is actually invoked. The easy
solution may let a new user move faster initially, but debugging such a
system can be complex: How did the system make its determination? What
is the API for plugging into such a system, and how are objects
represented in its IR?

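To make the tradeoff concrete, here is a toy sketch of the simple/explicit policy. This is plain Python, not PyTorch internals; ``ToyTensor``, its ``device`` field, and the error message are invented purely for illustration. Every value carries a device, movement is explicit via ``.to()``, and mixed-device operations fail loudly at the call site:

```python
# Toy model of the "simple / explicit" device policy -- NOT PyTorch's
# implementation; ToyTensor and its device field are invented for
# illustration only.

class ToyTensor:
    def __init__(self, data, device="cpu"):
        self.data = list(data)
        self.device = device

    def to(self, device):
        # Device movement is an explicit, user-visible action.
        return ToyTensor(self.data, device)

    def __add__(self, other):
        # Mixed-device operations fail loudly at the call site
        # instead of triggering an implicit copy.
        if self.device != other.device:
            raise RuntimeError(
                f"expected both operands on the same device, "
                f"got {self.device} and {other.device}"
            )
        return ToyTensor(
            [a + b for a, b in zip(self.data, other.data)], self.device
        )

a = ToyTensor([1, 2, 3])                 # lives on "cpu"
b = ToyTensor([4, 5, 6], device="cuda")  # lives on "cuda"

try:
    a + b                                # error, pinpointed to this line
except RuntimeError as err:
    print(f"error: {err}")

c = a + b.to("cpu")                      # explicit movement, then it works
print(c.data)                            # [5, 7, 9]
```

The error surfaces exactly at the line where the mixed-device call happens, which is the debuggability property the explicit design buys.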
Some classic arguments in favor of this sort of design come from `A
Note on Distributed
Computing <https://dl.acm.org/doi/book/10.5555/974938>`__ (TLDR: do not
model resources with very different performance characteristics
uniformly; the details will leak) and the `End-to-End
Principle <http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf>`__
(TLDR: building smarts into the lower layers of the stack can prevent
building performant features at higher layers in the stack, and often
doesn’t work anyway). For example, we could build operator-level or
global device movement rules, but the precise choices aren’t obvious, and
building an extensible mechanism has unavoidable complexity and latency
costs.

A caveat here is that this does not mean that higher-level “easy” APIs
are not valuable; certainly there is value, for example, in having
higher levels of the stack support efficient tensor computations
across heterogeneous compute in a large cluster. Instead, what we mean
is that focusing on simple lower-level building blocks helps inform the
easy API while still maintaining a good experience when users need to
leave the beaten path. It also allows space for innovation and the
growth of more opinionated tools, at a rate we cannot support in the
PyTorch core library but ultimately benefit from, as evidenced by
our `rich ecosystem <https://pytorch.org/ecosystem/>`__. In other
words, not automating at the start allows us to potentially reach levels
of good automation faster.

Principle 3: Python First with Best In Class Language Interoperability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle began as **Python First**:

    PyTorch is not a Python binding into a monolithic C++ framework.
    It is built to be deeply integrated into Python. You can use it
    naturally like you would use `NumPy <https://www.numpy.org/>`__,
    `SciPy <https://www.scipy.org/>`__, `scikit-learn <https://scikit-learn.org/>`__,
    or other Python libraries. You can write your new neural network
    layers in Python itself, using your favorite libraries and use
    packages such as `Cython <https://cython.org/>`__ and
    `Numba <http://numba.pydata.org/>`__. Our goal is to not reinvent
    the wheel where appropriate.

One thing PyTorch has needed to deal with over the years is Python
overhead: we first rewrote the ``autograd`` engine in C++, then the majority
of operator definitions, then developed TorchScript and the C++
frontend.

Still, working in Python easily provides the best experience for our
users: it is flexible, familiar, and perhaps most importantly, has a
huge ecosystem of scientific computing libraries and extensions
available for use. This fact motivates a few of our most recent
contributions, which attempt to hit a Pareto optimal point close to the
Python usability end of the curve:

- `TorchDynamo <https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361>`__,
  a Python frame evaluation tool capable of speeding up existing
  eager-mode PyTorch programs with minimal user intervention.
- `torch_function <https://pytorch.org/docs/main/notes/extending.html#extending-torch>`__
  and `torch_dispatch <https://dev-discuss.pytorch.org/t/what-and-why-is-torch-dispatch/557>`__
  extension points, which have enabled Python-first functionality to be
  built on top of C++ internals, such as the `torch.fx
  tracer <https://pytorch.org/docs/stable/fx.html>`__
  and `functorch <https://github.com/pytorch/functorch>`__,
  respectively.

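The mechanics of such an extension point are easiest to see in a stripped-down, pure-Python analogy. The ``library_add`` function and ``__my_function__`` hook below are invented stand-ins that mimic the intercept-unwrap-delegate pattern of ``__torch_function__``; they are not PyTorch's real API:

```python
# Pure-Python analogy (NOT PyTorch's actual implementation) of a
# __torch_function__-style protocol: a library operator checks its
# arguments for a user-defined hook and, if found, hands control to it.

def library_add(a, b):
    """Stand-in for a library operator that honors an override hook."""
    for arg in (a, b):
        hook = getattr(type(arg), "__my_function__", None)
        if hook is not None:
            # Hand control to the user-defined type before doing any work.
            return hook(arg, library_add, (a, b))
    return a + b

class LoggingNumber:
    """A user type that plugs into the library by defining the hook."""
    def __init__(self, value):
        self.value = value

    def __my_function__(self, func, args):
        # Unwrap the arguments, delegate to the plain implementation,
        # and re-wrap the result -- the same pattern tools built on
        # these extension points rely on.
        raw = [x.value if isinstance(x, LoggingNumber) else x for x in args]
        print(f"intercepted {func.__name__}{tuple(raw)}")
        return LoggingNumber(raw[0] + raw[1])

print(library_add(1, 2))                   # plain path: prints 3
result = library_add(LoggingNumber(1), 2)  # intercepted by the hook
print(result.value)                        # 3
```

Because the interception happens in Python, user types can observe, log, or transform every operator call without touching the underlying C++ layer.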
These design principles are not hard-and-fast rules, but hard-won
choices, and they anchor how we built PyTorch to be the debuggable, hackable,
and flexible framework it is today. As we have more contributors and
maintainers, we look forward to applying these core principles with you
across our libraries and ecosystem. We are also open to evolving them as
we learn new things and the AI space evolves, as we know it will.