Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-06 00:20:18 +01:00
Document limitations of weights_only in SECURITY.md and torch.load doc (#165645)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165645
Approved by: https://github.com/albanD
This commit is contained in:
parent
3f69b4d9b4
commit
6ecd6b23b6
@@ -31,9 +31,9 @@ Be careful when running untrusted models. This classification includes models cr

 **Prefer to execute untrusted models within a secure, isolated environment such as a sandbox** (e.g., containers, virtual machines). This helps protect your system from potentially malicious code. You can find further details and instructions in [this page](https://developers.google.com/code-sandboxing).

-**Be mindful of risky model formats**. Give preference to sharing and loading weights in the format appropriate for your use case. [safetensors](https://huggingface.co/docs/safetensors/en/index) gives the most safety but is the most restricted in what it supports. [`torch.load`](https://pytorch.org/docs/stable/generated/torch.load.html#torch.load) with `weights_only=True` is also secure to our knowledge, even though it offers a significantly larger attack surface. Loading an untrusted checkpoint with `weights_only=False` MUST never be done.
+**Be mindful of risky model formats**. Give preference to sharing and loading weights in the format appropriate for your use case. [safetensors](https://huggingface.co/docs/safetensors/en/index) gives the most safety but is the most restricted in what it supports. [`torch.load`](https://pytorch.org/docs/stable/generated/torch.load.html#torch.load) has a significantly larger attack surface but is more flexible in what it can serialize. See the documentation for more details.

 Even for more secure serialization formats, unexpected inputs to the downstream system can cause diverse security threats (e.g. denial of service, out-of-bounds reads/writes), and thus we recommend extensive validation of any untrusted inputs.

 Important Note: The trustworthiness of a model is not binary. You must always determine the proper level of caution depending on the specific model and how it matches your use case and risk tolerance.
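To make the risk behind "MUST never be done" concrete, here is a minimal, stdlib-only sketch (not PyTorch code) of why unpickling untrusted data is dangerous: a pickle payload can name an arbitrary callable for the unpickler to invoke at load time via `__reduce__`. A harmless `print` stands in here for what a hostile file could make `os.system`:

```python
import pickle


class Payload:
    """A class whose pickle form runs a callable of the attacker's choice."""

    def __reduce__(self):
        # pickle records (callable, args) and CALLS it during loading;
        # a malicious file could name os.system here instead of print.
        return (print, ("payload executed during unpickling",))


blob = pickle.dumps(Payload())

# Loading runs the callable as a side effect; no Payload object comes back.
result = pickle.loads(blob)
print(result)  # None -- print's return value; the side effect already ran
```

This is exactly the class of attack that `weights_only=True` (and safetensors, which does not use pickle at all) is designed to rule out.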
@@ -263,12 +263,31 @@ offers a comprehensive example of using these features to manipulate a checkpoin

 Starting in version 2.6, ``torch.load`` will use ``weights_only=True`` if the ``pickle_module``
 argument is not passed.

+.. _weights-only-security:
+
+weights_only security
+^^^^^^^^^^^^^^^^^^^^^
+
+As discussed in the documentation for :func:`torch.load`, ``weights_only=True`` restricts
+the unpickler used in ``torch.load`` to only executing functions/building classes required for
+``state_dicts`` of plain ``torch.Tensors`` as well as some other primitive types. Further,
+unlike the default ``Unpickler`` provided by the ``pickle`` module, the ``weights_only`` unpickler
+is not allowed to dynamically import anything during unpickling.
+
+``weights_only=True`` narrows the surface of remote code execution attacks but has the following limitations:
+
+1. ``weights_only=True`` does not guard against denial-of-service attacks.
+2. We try to prevent memory corruption during ``torch.load(weights_only=True)``, but it might still be possible.
+
+Note that even if memory corruption does not occur during ``torch.load`` itself, loading CAN create
+unexpected objects for the downstream code that can also lead to memory corruption (e.g. a Tensor of
+indices and values made into a sparse Tensor in user code might write/read out of bounds).

 .. _weights-only-allowlist:

 weights_only allowlist
 ^^^^^^^^^^^^^^^^^^^^^^

 As mentioned above, saving a module's ``state_dict`` is a best practice when using ``torch.save``. If loading an old
 checkpoint that contains an ``nn.Module``, we recommend ``weights_only=False``. When loading a checkpoint that contains
 tensor subclasses, there will likely be functions/classes that need to be allowlisted; see below for further details.
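The restriction described in the new section can be illustrated with a stdlib-only sketch: a `pickle.Unpickler` subclass whose `find_class` only resolves globals on an explicit allowlist. This shows the idea, not the actual `weights_only` unpickler (which additionally never imports anything during unpickling):

```python
import io
import pickle
from collections import OrderedDict


class AllowlistUnpickler(pickle.Unpickler):
    """Resolve only allowlisted globals; everything else is rejected."""

    # Illustrative allowlist; the real weights_only list covers tensor
    # state_dict machinery and some primitive types.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowlisted")


# An allowlisted payload loads normally.
state = pickle.dumps(OrderedDict(weight=1.0))
loaded = AllowlistUnpickler(io.BytesIO(state)).load()

# A payload referencing any other global is rejected before it can run.
blocked = pickle.dumps(print)  # stands in for any non-allowlisted global
try:
    AllowlistUnpickler(io.BytesIO(blocked)).load()
    caught = False
except pickle.UnpicklingError:
    caught = True
```

Note that this still does not address the section's listed limitations: a payload made only of allowlisted types can still be enormous (denial of service) or carry hostile values for downstream code.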
@@ -1304,6 +1304,11 @@ def load(

     Loads an object saved with :func:`torch.save` from a file.

+    .. warning::
+        :func:`torch.load()` uses an unpickler under the hood. **Never load data from an untrusted source.**
+
+        See :ref:`weights-only-security` for more details.
+
     :func:`torch.load` uses Python's unpickling facilities but treats storages,
     which underlie tensors, specially. They are first deserialized on the
     CPU and are then moved to the device they were saved from. If this fails
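The security section referenced by the new warning notes that even a successfully loaded, data-only checkpoint can carry hostile values (e.g. out-of-range indices) that corrupt memory downstream. A minimal validation sketch, using a hypothetical helper and plain Python lists rather than tensors:

```python
def validate_sparse_payload(indices, values, size):
    """Reject out-of-range indices before they reach code that would
    read or write through them (hypothetical helper, not a torch API)."""
    if len(indices) != len(values):
        raise ValueError("indices/values length mismatch")
    bad = [i for i in indices if not (0 <= i < size)]
    if bad:
        raise ValueError(f"indices out of bounds for size {size}: {bad}")
    return indices, values


# Well-formed payload passes through unchanged.
ok = validate_sparse_payload([0, 2], [1.0, 3.0], size=4)

# An out-of-range index is rejected before any buffer is touched.
try:
    validate_sparse_payload([0, 99], [1.0, 3.0], size=4)
    rejected = False
except ValueError:
    rejected = True
```

The same pattern (length checks, range checks, shape checks) applies to any field of an untrusted checkpoint before handing it to lower-level code.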
@@ -1356,13 +1361,6 @@ def load(
     :func:`pickle_module.load` and :func:`pickle_module.Unpickler`, e.g.,
     :attr:`errors=...`.

-    .. warning::
-        :func:`torch.load()` unless `weights_only` parameter is set to `True`,
-        uses ``pickle`` module implicitly, which is known to be insecure.
-        It is possible to construct malicious pickle data which will execute arbitrary code
-        during unpickling. Never load data that could have come from an untrusted
-        source in an unsafe mode, or that could have been tampered with. **Only load data you trust**.
-
     .. note::
         When you call :func:`torch.load()` on a file which contains GPU tensors, those tensors
         will be loaded to GPU by default. You can call ``torch.load(.., map_location='cpu')``