pytorch/docs/source
Dmitry Rogozhkin d27ecf85db xpu: support sycl with torch.utils.cpp_extension APIs (#132945)
This patch adds support for building SYCL kernels via the `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline`, and (new) `SyclExtension` APIs. Files with the `.sycl` extension are treated as containing SYCL kernels and are compiled with `icpx` (Intel's DPC++ SYCL compiler). Files with other extensions (`.cpp`, `.cu`) are handled as before. The API supports building SYCL sources together with other file types into a single extension.
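A minimal usage sketch (not part of this patch; the file names are hypothetical) for JIT-building a mixed C++/SYCL extension with `load`:

```python
# Sketch: JIT-compile an extension that mixes C++ and SYCL sources.
# "my_ext.cpp" and "my_kernel.sycl" are placeholder files; the `.sycl`
# suffix is what routes a source through icpx instead of the C++ compiler.
from torch.utils.cpp_extension import load

my_sycl_ext = load(
    name="my_sycl_ext",
    sources=["my_ext.cpp", "my_kernel.sycl"],
    verbose=True,
)
```

For ahead-of-time builds, the new `SyclExtension` class is expected to be used from `setup.py` analogously to `CppExtension`/`CUDAExtension`; the exact constructor arguments should be checked against the `cpp_extension` documentation.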

Note that the `.sycl` file extension is a PyTorch convention for files containing SYCL code, which I propose to adopt. We followed up with the compiler team about introducing such a file extension in the compiler, but they are opposed to it. At the same time, discussion is ongoing around a SYCL file extension and adding SYCL language support to tools such as CMake, which may eventually introduce a file extension convention for SYCL as well. I hope we can further influence the CMake and compiler communities to adopt the `.sycl` file extension more broadly.

By default, SYCL kernels are compiled for all Intel GPU devices for which PyTorch's native ATen SYCL kernels are compiled (currently `pvc,xe-lpg`). This behavior can be overridden by setting the `TORCH_XPU_ARCH_LIST` environment variable to a comma-separated list of the desired target devices.
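For example, a hypothetical build restricted to a single device could set the variable before triggering the extension build:

```python
# Sketch: limit SYCL ahead-of-time compilation to one device ("pvc" here)
# by exporting TORCH_XPU_ARCH_LIST before the extension is built.
import os

os.environ["TORCH_XPU_ARCH_LIST"] = "pvc"

from torch.utils.cpp_extension import load

my_sycl_ext = load(name="my_sycl_ext", sources=["my_kernel.sycl"], verbose=True)
```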

Fixes: #132944

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132945
Approved by: https://github.com/albanD, https://github.com/guangyey, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-02-16 16:50:59 +00:00
_static Update OSS nested tensor docs to focus on NJT (#145402) 2025-01-25 04:08:19 +00:00
_templates Add an option for classic search (#142018) 2024-12-06 01:24:52 +00:00
community Add @nikitaved to torch.linalg CODEOWNERS/persons_of_interest (#141803) 2025-02-04 16:11:31 +00:00
elastic
notes Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)" 2025-02-13 18:04:26 +00:00
rpc [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
scripts [BE][Ez]: Apply FURB188: use str remove(pre|suf)fix (#146997) 2025-02-14 03:38:07 +00:00
accelerator.rst [BE][accelerator] formalize API name {current,set}_device_{idx => index} (#140542) 2024-12-12 10:53:48 +00:00
amp.rst Update document for autocast on CPU (#135299) 2024-09-13 09:11:47 +00:00
autograd.rst
backends.rst Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) 2025-01-23 18:50:59 +00:00
benchmark_utils.rst
bottleneck.rst
checkpoint.rst
complex_numbers.rst
cond.rst [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
conf.py Revert "Add flop formula for _scaled_mm (#144872)" 2025-01-16 15:16:18 +00:00
config_mod.rst
cpp_extension.rst xpu: support sycl with torch.utils.cpp_extension APIs (#132945) 2025-02-16 16:50:59 +00:00
cpp_index.rst
cpu.rst
cuda_environment_variables.rst
cuda._sanitizer.rst
cuda.rst Make torch.cuda.gds APIs public (#147120) 2025-02-14 17:06:50 +00:00
cuda.tunable.rst [ROCm] Fix TunableOp UTs: Rotating Buffer (#143172) 2024-12-14 06:18:11 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst
ddp_comm_hooks.rst
debugging_environment_variables.rst
deploy.rst
deterministic.rst
distributed.algorithms.join.rst
distributed.checkpoint.rst [DCP] Cross-link DCP doc to tutorials (#139776) 2024-11-07 02:19:49 +00:00
distributed.elastic.rst
distributed.fsdp.fully_shard.rst [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-07 01:24:28 +00:00
distributed.optim.rst
distributed.pipelining.rst [pipelining] Update tutorials and documentation (#143045) 2024-12-12 18:42:17 +00:00
distributed.rst [C10D] Update docs for wait() (#143305) 2024-12-17 00:41:11 +00:00
distributed.tensor.parallel.rst Update link in distributed.tensor.parallel.rst (#136103) 2024-09-15 19:36:29 +00:00
distributed.tensor.rst [dtensor] expose the __create_chunk_list__ in the doc (#144100) 2025-01-03 20:06:23 +00:00
distributions.rst
dlpack.rst
docutils.conf
export.ir_spec.rst [export] Update docs (#142011) 2024-12-05 03:44:46 +00:00
export.programming_model.rst fix formatting in programming model doc (#143587) 2024-12-20 07:09:19 +00:00
export.rst [docs] Minor fixes to export and aoti docs (#144513) 2025-02-13 15:19:35 +00:00
fft.rst
fsdp.rst
func.api.rst Add torch.func.debug_unwrap (#146528) 2025-02-06 18:48:09 +00:00
func.batch_norm.rst
func.migrating.rst
func.rst
func.ux_limitations.rst
func.whirlwind_tour.rst
future_mod.rst
futures.rst
fx.experimental.rst Add truediv support in export serializer (#136364) 2024-12-05 17:33:33 +00:00
fx.rst Consolidate SymDispatchMode into ProxyTensorMode (#132674) 2024-08-08 12:02:54 +00:00
hub.rst
index.rst Add Torchao docs link to Pytorch libraries (#145412) 2025-01-24 17:11:20 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
jit_language_reference.rst [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
jit_python_reference.rst
jit_unsupported.rst
jit_utils.rst
jit.rst
library.rst [Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588) 2025-01-27 19:22:43 +00:00
linalg.rst
logging.rst
masked.rst Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack (#125262) 2024-09-06 19:06:23 +00:00
math-quantizer-equation.png
meta.rst
miscellaneous_environment_variables.rst Add environment variable to force no weights_only load (#138225) 2024-10-21 23:26:15 +00:00
mobile_optimizer.rst Add ExecuTorch warning to mobile_optimizer (#134697) 2024-09-04 17:47:14 +00:00
model_zoo.rst
module_tracker.rst
monitor.rst
mps_environment_variables.rst [MPS] Add mps profiler env vars to docs (#129552) 2024-07-04 06:44:48 +00:00
mps.rst [MPS] Expose MPSProfiler::start/stopCapture to Python (#144561) 2025-01-11 02:05:36 +00:00
mtia.memory.rst Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" 2024-12-21 04:04:16 +00:00
mtia.rst Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" 2024-12-21 04:04:16 +00:00
multiprocessing.rst
name_inference.rst
named_tensor.rst
nested.rst Update OSS nested tensor docs to focus on NJT (#145402) 2025-01-25 04:08:19 +00:00
nn.attention.bias.rst
nn.attention.experimental.rst [Flex Attention] Paged Attention (#137164) 2024-10-29 17:05:22 +00:00
nn.attention.flex_attention.rst FlexAttention support for NJT (#136792) 2024-10-28 20:01:27 +00:00
nn.attention.rst [Flex Attention] Paged Attention (#137164) 2024-10-29 17:05:22 +00:00
nn.functional.rst
nn.init.rst
nn.rst Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ (#139662) 2024-11-07 23:13:23 +00:00
onnx_dynamo_memory_usage.rst Update TorchDynamo-based ONNX Exporter memory usage example code. (#144139) 2025-01-03 20:41:36 +00:00
onnx_dynamo_onnxruntime_backend.rst
onnx_dynamo.rst [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
onnx_torchscript_supported_aten_ops.rst
onnx_torchscript.rst [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
onnx.rst [ONNX] Improves documentation of ONNX exporter (#135372) 2024-09-09 15:09:01 +00:00
optim.rst Ensure SWA boundary conditions w.r.t. definition (#133773) 2024-10-31 18:24:08 +00:00
package.rst
profiler.rst
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst Add support for prototype affine quantization in pt2e flow (#141421) 2024-12-24 04:22:18 +00:00
quantization.rst [BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505) 2024-12-13 22:26:22 +00:00
random.rst
rpc.rst
signal.rst
size.rst
sparse.rst SparseCsrCUDA: cuDSS backend for linalg.solve (#129856) 2024-08-22 07:57:30 +00:00
special.rst
storage.rst Doc: Rewrite the storage.rst file to emphasize untyped storages (#140145) 2024-11-13 17:40:16 +00:00
tensor_attributes.rst [Docs] Remove duplicate declaration of double_tensor (#140927) 2024-11-18 21:22:30 +00:00
tensor_view.rst
tensorboard.rst
tensors.rst
testing.rst
threading_environment_variables.rst
torch_cuda_memory.rst
torch_environment_variables.rst
torch_nccl_environment_variables.rst [c10d][doc] Add docs for ENV variables TORCH_NCCL_ASYNC_ERROR_HANDLING TORCH_NCCL_TRACE_CPP_STACK and TORCH_NCCL_COORD_CHECK_MILSEC (#132920) 2024-08-09 21:08:20 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.compiler_aot_inductor_minifier.rst Aoti minifier flatten (#141156) 2024-12-06 07:12:45 +00:00
torch.compiler_aot_inductor.rst [docs] Minor fixes to export and aoti docs (#144513) 2025-02-13 15:19:35 +00:00
torch.compiler_api.rst [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
torch.compiler_best_practices_for_backends.rst
torch.compiler_cudagraph_trees.rst Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)" 2025-02-13 18:04:26 +00:00
torch.compiler_custom_backends.rst [pt2, docs] Add new PT2 troubleshooting doc (#138620) 2024-11-09 01:17:39 +00:00
torch.compiler_dynamic_shapes.rst
torch.compiler_dynamo_deepdive.rst fix typo in torch.compiler_dynamo_deepdive.rst (#140871) 2024-11-19 14:42:36 +00:00
torch.compiler_dynamo_overview.rst
torch.compiler_fake_tensor.rst [doc] improve code in fake tensor doc (#140329) 2024-11-13 05:14:56 +00:00
torch.compiler_faq.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler_fine_grain_apis.rst [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
torch.compiler_get_started.rst [Inductor] Update AttrsDescriptor instantiation for Triton changes (#137458) 2024-10-14 20:20:29 +00:00
torch.compiler_inductor_profiling.rst
torch.compiler_ir.rst
torch.compiler_nn_module.rst
torch.compiler_performance_dashboard.rst
torch.compiler_profiling_torch_compile.rst [EZ] Fix spelling typo (#136157) 2024-09-16 19:30:30 +00:00
torch.compiler_transformations.rst
torch.compiler_troubleshooting_old.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler_troubleshooting.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler.config.rst Profile guided optimization for automatic_dynamic (#139001) 2024-11-03 06:29:57 +00:00
torch.compiler.rst Profile guided optimization for automatic_dynamic (#139001) 2024-11-03 06:29:57 +00:00
torch.overrides.rst
torch.rst Transform unbacked int expressions into a fresh unbacked int. (#141917) 2024-12-05 16:53:44 +00:00
type_info.rst
utils.rst
xpu.rst Add get_stream_from_external API for XPU backend (#141123) 2024-12-31 11:15:52 +00:00