mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-07 12:21:27 +01:00
Adding a note for Getting Started with PyTorch on Intel GPUs Pull Request resolved: https://github.com/pytorch/pytorch/pull/127872 Approved by: https://github.com/svekars
340 lines
10 KiB
ReStructuredText
PyTorch 2.4: Getting Started on Intel GPU
=========================================
Support for Intel GPUs is released alongside PyTorch v2.4.

In this release, Intel GPU support is available only by building PyTorch from source.
Hardware Prerequisites
----------------------

.. list-table::
   :header-rows: 1

   * - Supported Hardware
     - Intel® Data Center GPU Max Series
   * - Supported OS
     - Linux
With release 2.4, PyTorch for Intel GPUs supports the Intel® Data Center GPU Max Series and runs on Linux only.
Software Prerequisites
----------------------

As a prerequisite, install the driver and required packages by following the `PyTorch Installation Prerequisites for Intel GPUs <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html>`_.
Set up Environment
------------------

Before you begin, set up the environment by sourcing the ``setvars.sh`` script provided by the ``intel-for-pytorch-gpu-dev`` and ``intel-pti-dev`` packages.

.. code-block::

   source ${ONEAPI_ROOT}/setvars.sh

.. note::
   ``ONEAPI_ROOT`` is the folder where you installed the ``intel-for-pytorch-gpu-dev`` and ``intel-pti-dev`` packages. Typically, it is located at ``/opt/intel/oneapi/`` or ``~/intel/oneapi/``.
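Because ``setvars.sh`` works by exporting environment variables, one quick way to confirm the environment reached a child process is to inspect ``os.environ`` from Python. A minimal sketch (the fallback message is ours; ``ONEAPI_ROOT`` will be unset if ``setvars.sh`` has not been sourced in the current shell):

```python
import os

# After `source ${ONEAPI_ROOT}/setvars.sh`, the oneAPI variables should be
# visible to child processes such as this Python interpreter.
oneapi_root = os.environ.get("ONEAPI_ROOT", "(not set - source setvars.sh first)")
print(f"ONEAPI_ROOT={oneapi_root}")
```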
Build from source
-----------------

Now that all the required packages are installed and the environment is activated, use the following commands to install ``pytorch``, ``torchvision``, and ``torchaudio`` by building from source. For more details, refer to the official guides for `PyTorch from source <https://github.com/pytorch/pytorch?tab=readme-ov-file#intel-gpu-support>`_, `Vision from source <https://github.com/pytorch/vision/blob/main/CONTRIBUTING.md#development-installation>`_ and `Audio from source <https://pytorch.org/audio/main/build.linux.html>`_.
.. code-block::

   # Get the PyTorch source code
   git clone --recursive https://github.com/pytorch/pytorch
   cd pytorch
   git checkout main # or checkout the specific release version >= v2.4
   git submodule sync
   git submodule update --init --recursive

   # Get the required packages for compilation
   conda install cmake ninja
   pip install -r requirements.txt

   # PyTorch for Intel GPUs only supports the Linux platform for now.
   # Install the required packages for PyTorch compilation.
   conda install intel::mkl-static intel::mkl-include

   # (optional) If using torch.compile with inductor/triton, install the matching version of triton.
   # Run from the pytorch directory after cloning.
   # For Intel GPU support, explicitly `export USE_XPU=1` before running the command.
   USE_XPU=1 make triton

   # If you would like to compile PyTorch with the new C++ ABI enabled, first run this command:
   export _GLIBCXX_USE_CXX11_ABI=1

   # Build and install PyTorch from source
   export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
   python setup.py develop
   cd ..

   # (optional) If using torchvision.
   # Get the torchvision source code
   git clone https://github.com/pytorch/vision.git
   cd vision
   git checkout main # or specific version
   python setup.py develop
   cd ..

   # (optional) If using torchaudio.
   # Get the torchaudio source code
   git clone https://github.com/pytorch/audio.git
   cd audio
   pip install -r requirements.txt
   git checkout main # or specific version
   git submodule sync
   git submodule update --init --recursive
   python setup.py develop
   cd ..
Check availability for Intel GPU
--------------------------------

.. note::
   Make sure the environment is properly set up by following `Environment Set up <#set-up-environment>`_ before running the code.
To check if your Intel GPU is available, you would typically use the following code:

.. code-block::

   import torch
   torch.xpu.is_available()  # torch.xpu is the API for Intel GPU support
If the output is ``False``, ensure that you have an Intel GPU in your system and that you correctly followed the `PyTorch Installation Prerequisites for Intel GPUs <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html>`_. Then, check that the PyTorch compilation finished correctly.
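In scripts that must run both on machines with and without an Intel GPU, this availability check can drive a device fallback. A minimal sketch (the ``pick_device`` helper is ours, not a PyTorch API; it degrades to CPU when PyTorch or XPU support is absent):

```python
import importlib.util

def pick_device() -> str:
    # Prefer Intel GPU ("xpu") when a PyTorch build with XPU support is
    # importable and reports an available device; otherwise fall back to CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    return "cpu"

print(pick_device())
```

Tensors and modules can then be moved with ``.to(pick_device())`` so the same script works everywhere.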
Minimum Code Change
-------------------

If you are migrating code from ``cuda``, change references from ``cuda`` to ``xpu``. For example:

.. code-block::

   # CUDA CODE
   tensor = torch.tensor([1.0, 2.0]).to("cuda")

   # CODE for Intel GPU
   tensor = torch.tensor([1.0, 2.0]).to("xpu")
The following points outline the support and limitations for PyTorch with Intel GPU:

#. Both training and inference workflows are supported.
#. Both eager mode and ``torch.compile`` are supported.
#. Data types such as FP32, BF16, FP16, and Automatic Mixed Precision (AMP) are all supported.
#. Models that depend on third-party components will not be supported until PyTorch v2.5 or later.
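For a large codebase, the mechanical part of this ``cuda``-to-``xpu`` rename can be scripted. A naive sketch (the ``cuda_to_xpu`` helper is ours; it only rewrites quoted device strings, while real migrations may also need to touch ``torch.cuda.*`` API calls):

```python
import re

def cuda_to_xpu(source: str) -> str:
    # Rewrite quoted "cuda" / "cuda:N" device strings to "xpu",
    # preserving the quote style and any device index.
    return re.sub(
        r'(["\'])cuda(:\d+)?\1',
        lambda m: f"{m.group(1)}xpu{m.group(2) or ''}{m.group(1)}",
        source,
    )

print(cuda_to_xpu('tensor = torch.tensor([1.0, 2.0]).to("cuda")'))
```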
Examples
--------

This section contains usage examples for both inference and training workflows.

Inference Examples
^^^^^^^^^^^^^^^^^^

Here are a few inference workflow examples.
Inference with FP32
"""""""""""""""""""

.. code-block::

   import torch
   import torchvision.models as models

   model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
   model.eval()
   data = torch.rand(1, 3, 224, 224)

   ######## code changes #######
   model = model.to("xpu")
   data = data.to("xpu")
   ######## code changes #######

   with torch.no_grad():
       model(data)

   print("Execution finished")
Inference with AMP
""""""""""""""""""

.. code-block::

   import torch
   import torchvision.models as models

   model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
   model.eval()
   data = torch.rand(1, 3, 224, 224)

   #################### code changes #################
   model = model.to("xpu")
   data = data.to("xpu")
   #################### code changes #################

   with torch.no_grad():
       ############################# code changes #####################
       # set dtype=torch.bfloat16 for BF16
       with torch.autocast(device_type="xpu", dtype=torch.float16, enabled=True):
       ############################# code changes #####################
           model(data)

   print("Execution finished")
Inference with ``torch.compile``
""""""""""""""""""""""""""""""""

.. code-block::

   import torch
   import torchvision.models as models

   model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
   model.eval()
   data = torch.rand(1, 3, 224, 224)
   ITERS = 10

   ######## code changes #######
   model = model.to("xpu")
   data = data.to("xpu")
   ######## code changes #######

   model = torch.compile(model)
   for i in range(ITERS):
       with torch.no_grad():
           model(data)

   print("Execution finished")
Training Examples
^^^^^^^^^^^^^^^^^

Here are a few training workflow examples.

Train with FP32
"""""""""""""""
.. code-block::

   import torch
   import torchvision

   LR = 0.001
   DOWNLOAD = True
   DATA = "datasets/cifar10/"

   transform = torchvision.transforms.Compose(
       [
           torchvision.transforms.Resize((224, 224)),
           torchvision.transforms.ToTensor(),
           torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
       ]
   )
   train_dataset = torchvision.datasets.CIFAR10(
       root=DATA,
       train=True,
       transform=transform,
       download=DOWNLOAD,
   )
   train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128)

   model = torchvision.models.resnet50()
   criterion = torch.nn.CrossEntropyLoss()
   optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=0.9)
   model.train()
   ######################## code changes #######################
   model = model.to("xpu")
   criterion = criterion.to("xpu")
   ######################## code changes #######################

   for batch_idx, (data, target) in enumerate(train_loader):
       ########## code changes ##########
       data = data.to("xpu")
       target = target.to("xpu")
       ########## code changes ##########
       optimizer.zero_grad()
       output = model(data)
       loss = criterion(output, target)
       loss.backward()
       optimizer.step()
       print(batch_idx)
   torch.save(
       {
           "model_state_dict": model.state_dict(),
           "optimizer_state_dict": optimizer.state_dict(),
       },
       "checkpoint.pth",
   )

   print("Execution finished")
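The loop above ends by writing ``checkpoint.pth``; to resume training later, the saved state dicts can be loaded back. A minimal sketch (the ``load_checkpoint`` helper is ours, not part of the tutorial; it returns ``None`` when PyTorch or the checkpoint file is unavailable):

```python
import importlib.util
import os

def load_checkpoint(path="checkpoint.pth"):
    # Reload the dict saved with torch.save() in the training loop,
    # degrading gracefully when torch or the file is missing.
    if importlib.util.find_spec("torch") is None or not os.path.exists(path):
        return None
    import torch
    ckpt = torch.load(path, map_location="cpu")
    # Callers would then restore state via
    # model.load_state_dict(ckpt["model_state_dict"]) and
    # optimizer.load_state_dict(ckpt["optimizer_state_dict"]).
    return ckpt

print(load_checkpoint("checkpoint.pth") is not None)
```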
Train with AMP
""""""""""""""

.. code-block::

   import torch
   import torchvision

   LR = 0.001
   DOWNLOAD = True
   DATA = "datasets/cifar10/"

   use_amp = True

   transform = torchvision.transforms.Compose(
       [
           torchvision.transforms.Resize((224, 224)),
           torchvision.transforms.ToTensor(),
           torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
       ]
   )
   train_dataset = torchvision.datasets.CIFAR10(
       root=DATA,
       train=True,
       transform=transform,
       download=DOWNLOAD,
   )
   train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128)

   model = torchvision.models.resnet50()
   criterion = torch.nn.CrossEntropyLoss()
   optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=0.9)
   scaler = torch.amp.GradScaler(enabled=use_amp)

   model.train()
   ######################## code changes #######################
   model = model.to("xpu")
   criterion = criterion.to("xpu")
   ######################## code changes #######################

   for batch_idx, (data, target) in enumerate(train_loader):
       ########## code changes ##########
       data = data.to("xpu")
       target = target.to("xpu")
       ########## code changes ##########
       # set dtype=torch.bfloat16 for BF16
       with torch.autocast(device_type="xpu", dtype=torch.float16, enabled=use_amp):
           output = model(data)
           loss = criterion(output, target)
       scaler.scale(loss).backward()
       scaler.step(optimizer)
       scaler.update()
       optimizer.zero_grad()
       print(batch_idx)

   torch.save(
       {
           "model_state_dict": model.state_dict(),
           "optimizer_state_dict": optimizer.state_dict(),
       },
       "checkpoint.pth",
   )

   print("Execution finished")