Document limitations of weights_only in SECURITY.md and torch.load doc (#165645)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165645
Approved by: https://github.com/albanD
Mikayla Gawarecki 2025-10-27 09:05:10 -07:00 committed by PyTorch MergeBot
parent 3f69b4d9b4
commit 6ecd6b23b6
3 changed files with 26 additions and 9 deletions


@ -31,9 +31,9 @@ Be careful when running untrusted models. This classification includes models cr
**Prefer to execute untrusted models within a secure, isolated environment such as a sandbox** (e.g., containers, virtual machines). This helps protect your system from potentially malicious code. You can find further details and instructions in [this page](https://developers.google.com/code-sandboxing).
**Be mindful of risky model formats**. Give preference to share and load weights with the appropriate format for your use case. [safetensors](https://huggingface.co/docs/safetensors/en/index) gives the most safety but is the most restricted in what it supports. [`torch.load`](https://pytorch.org/docs/stable/generated/torch.load.html#torch.load) with `weights_only=True` is also secure to our knowledge even though it offers significantly larger surface of attack. Loading un-trusted checkpoint with `weights_only=False` MUST never be done.
**Be mindful of risky model formats**. Give preference to share and load weights with the appropriate format for your use case. [safetensors](https://huggingface.co/docs/safetensors/en/index) gives the most safety but is the most restricted in what it supports. [`torch.load`](https://pytorch.org/docs/stable/generated/torch.load.html#torch.load) has a significantly larger surface of attack but is more flexible in what it can serialize. See the documentation for more details.
Even for more secure serialization formats, unexpected inputs to the downstream system can cause diverse security threats (e.g. denial of service, out of bound reads/writes) and thus we recommend extensive validation of any untrusted inputs.
Important Note: The trustworthiness of a model is not binary. You must always determine the proper level of caution depending on the specific model and how it matches your use case and risk tolerance.


@ -263,12 +263,31 @@ offers a comprehensive example of using these features to manipulate a checkpoin
Starting in version 2.6, ``torch.load`` will use ``weights_only=True`` if the ``pickle_module``
argument is not passed.
.. _weights-only-security:
weights_only security
^^^^^^^^^^^^^^^^^^^^^
As discussed in the documentation for :func:`torch.load`, ``weights_only=True`` restricts
the unpickler used in ``torch.load`` to only executing functions/building classes required for
``state_dicts`` of plain ``torch.Tensors`` as well as some other primitive types. Further,
unlike the default ``Unpickler`` provided by the ``pickle`` module, the ``weights_only`` Unpickler
is not allowed to dynamically import anything during unpickling.
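To make the idea of a restricted unpickler concrete, here is a minimal sketch using only the standard library. It is not PyTorch's implementation: the ``AllowlistUnpickler`` class, the ``_ALLOWED`` table and the ``restricted_loads`` helper are hypothetical names chosen for illustration, and the one-entry allowlist is an assumption. It only shows the general technique of refusing to resolve any global that is not explicitly allowlisted.

```python
import io
import os
import pickle
from collections import OrderedDict

# Illustrative allowlist (an assumption, NOT the list PyTorch ships).
_ALLOWED = {("collections", "OrderedDict"): OrderedDict}


class AllowlistUnpickler(pickle.Unpickler):
    """Resolve globals from a fixed table instead of importing them."""

    def find_class(self, module, name):
        try:
            return _ALLOWED[(module, name)]
        except KeyError:
            # Never fall back to the default lookup, which would import `module`.
            raise pickle.UnpicklingError(
                f"global '{module}.{name}' is not allowlisted"
            ) from None


def restricted_loads(data: bytes):
    """Unpickle `data` with the allowlist-only unpickler."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Exploit:
    """Stand-in for a malicious payload; os.getcwd is harmless on purpose."""

    def __reduce__(self):
        return (os.getcwd, ())


# An allowlisted container round-trips normally.
print(restricted_loads(pickle.dumps(OrderedDict(a=1))))

# The payload is rejected before the callable is ever resolved or invoked.
try:
    restricted_loads(pickle.dumps(Exploit()))
except pickle.UnpicklingError as err:
    print("rejected:", err)
```

With the default ``pickle.loads``, the same payload would import the module and call the function during unpickling, which is the behavior an allowlist is meant to rule out.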
``weights_only=True`` narrows the surface of remote code execution attacks but has the following limitations:
1. ``weights_only=True`` does not guard against denial of service attacks.
2. We try to prevent memory corruptions during ``torch.load(weights_only=True)`` but they might still be possible.
Note that even if memory corruption does not occur during ``torch.load`` itself, loading CAN create
unexpected objects for the downstream code that can also lead to memory corruption (e.g. a Tensor of
indices and values made to a sparse Tensor in user code might write/read out of bounds).
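The second limitation suggests validating loaded tensors before downstream code uses them as indices. The following is a minimal sketch of such a check for the sparse example above. ``checked_sparse_coo`` is a hypothetical helper, not a PyTorch API, and it assumes a non-hybrid COO layout where ``indices`` has shape ``(len(size), nnz)`` and ``values`` has shape ``(nnz,)``.

```python
import torch


def checked_sparse_coo(indices, values, size):
    """Hypothetical helper: validate untrusted tensors before building a sparse COO tensor."""
    if indices.dtype != torch.int64:
        raise ValueError("indices must have dtype int64")
    if indices.dim() != 2 or indices.shape[0] != len(size):
        raise ValueError("indices must have shape (len(size), nnz)")
    if values.dim() != 1 or values.shape[0] != indices.shape[1]:
        raise ValueError("values must have shape (nnz,)")
    if indices.numel() > 0:
        # One upper bound per sparse dimension, broadcast across the nnz columns.
        upper = torch.tensor(size, dtype=torch.int64, device=indices.device).unsqueeze(1)
        if bool((indices < 0).any()) or bool((indices >= upper).any()):
            raise ValueError("index out of bounds for the requested size")
    return torch.sparse_coo_tensor(indices, values, size)


# Well-formed input is accepted.
ok = checked_sparse_coo(torch.tensor([[0, 1], [1, 0]]), torch.tensor([1.0, 2.0]), (2, 2))

# An index that points outside the 2x2 shape is rejected up front.
try:
    checked_sparse_coo(torch.tensor([[0, 5], [1, 0]]), torch.tensor([1.0, 2.0]), (2, 2))
except ValueError as err:
    print("rejected:", err)
```

The same pattern (check dtype, shape and value ranges first) applies to any loaded tensor that is later used to index, gather or scatter.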
.. _weights-only-allowlist:
weights_only allowlist
^^^^^^^^^^^^^^^^^^^^^^
As mentioned above, saving a module's ``state_dict`` is a best practice when using ``torch.save``. If loading an old
checkpoint that contains an ``nn.Module``, we recommend ``weights_only=False``. When loading a checkpoint that contains
tensor subclasses, there will likely be functions/classes that need to be allowlisted, see below for further details.


@ -1304,6 +1304,11 @@ def load(
Loads an object saved with :func:`torch.save` from a file.
.. warning::
:func:`torch.load()` uses an unpickler under the hood. **Never load data from an untrusted source.**
See :ref:`weights-only-security` for more details.
:func:`torch.load` uses Python's unpickling facilities but treats storages,
which underlie tensors, specially. They are first deserialized on the
CPU and are then moved to the device they were saved from. If this fails
@ -1356,13 +1361,6 @@ def load(
:func:`pickle_module.load` and :func:`pickle_module.Unpickler`, e.g.,
:attr:`errors=...`.
.. warning::
:func:`torch.load()` unless `weights_only` parameter is set to `True`,
uses ``pickle`` module implicitly, which is known to be insecure.
It is possible to construct malicious pickle data which will execute arbitrary code
during unpickling. Never load data that could have come from an untrusted
source in an unsafe mode, or that could have been tampered with. **Only load data you trust**.
.. note::
When you call :func:`torch.load()` on a file which contains GPU tensors, those tensors
will be loaded to GPU by default. You can call ``torch.load(.., map_location='cpu')``