Optimize transformer encoder/decoder init suggestion (#146882)
Fixes #72253. Adds a hint message advising users to manually initialize the layers after the module is created.

## Test Result

**Before** / **After** screenshots of the rendered documentation were attached to the PR (images not preserved here).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146882
Approved by: https://github.com/jbschlosser
This commit is contained in:
parent 1e92579126
commit 4b0cf9fc00
@@ -314,6 +314,10 @@ class TransformerEncoder(Module):
     Users can build the BERT(https://arxiv.org/abs/1810.04805) model with corresponding parameters.
 
+    .. warning::
+        All layers in the TransformerEncoder are initialized with the same parameters.
+        It is recommended to manually initialize the layers after creating the TransformerEncoder instance.
+
     Args:
         encoder_layer: an instance of the TransformerEncoderLayer() class (required).
         num_layers: the number of sub-encoder-layers in the encoder (required).
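The warning exists because `nn.TransformerEncoder` deep-copies the single `encoder_layer` it receives, so every stacked layer starts from identical weights. Below is a minimal sketch of the recommended follow-up step; the Xavier-uniform scheme and the `d_model`/`nhead` values are illustrative choices, not something this PR mandates:

```python
import torch.nn as nn

# Build an encoder from a single template layer; internally the layer is
# deep-copied num_layers times, so all six layers begin with identical weights.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Manually re-initialize after creation, as the new warning suggests.
# Xavier uniform on matrix-shaped parameters is one common choice here,
# not the only valid one.
for p in encoder.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)
```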
@@ -535,6 +539,10 @@ class TransformerDecoder(Module):
     for an in depth discussion of the performant building blocks PyTorch offers for building your own
     transformer layers.
 
+    .. warning::
+        All layers in the TransformerDecoder are initialized with the same parameters.
+        It is recommended to manually initialize the layers after creating the TransformerDecoder instance.
+
     Args:
         decoder_layer: an instance of the TransformerDecoderLayer() class (required).
         num_layers: the number of sub-decoder-layers in the decoder (required).
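For the decoder side, an alternative sketch: rather than picking an init scheme, re-run each submodule's own default initializer so every deep-copied layer draws fresh random weights. The fallback to the private `_reset_parameters` (used by `nn.MultiheadAttention`) is an assumption about current PyTorch internals and may change across versions:

```python
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)

# Re-run default initializers so the deep-copied layers no longer share weights.
# Note: nn.MultiheadAttention exposes a private _reset_parameters; relying on
# it is an assumption about current internals, not a stable API.
for module in decoder.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()
    elif hasattr(module, "_reset_parameters"):
        module._reset_parameters()
```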