pytorch/docs/source
Alex Baden 5d316c81be [Inductor] Add 0 initialization to Triton masked loads (#127311)
For a masked `tl.load` operation, the Triton language specifies that masked-out values (i.e., those where the mask evaluates to false) are undefined in the output of the load. Triton provides an optional `other` parameter which, when included, supplies an explicit value for the masked-out elements. If the output of a masked load without the `other` parameter is used in a conditional, unexpected behavior can occur.

Despite the language specification, all Triton backends currently used by PyTorch Inductor (NVIDIA, AMD, and Intel) 0-initialize masked loads if `other` is not present. (We recently changed the Intel backend behavior to match NVIDIA and AMD because that is what our users expect, even if we are not following the Triton spec to a T.) This PR attempts to "future-proof" Inductor against new backends, or changes in the current backends, by adding an explicit `other` in instances where later conditionals depend on the `tl.load` output. We did not see any performance change from 0-initializing in the Intel XPU backend, but one could imagine compiler optimizations that remove paths depending on undefined values. I also removed an exception to the `other` behavior for boolean loads, which was put in place for a Triton bug that should be fixed. I added `other` to the getting started documentation as a clue that masked load behavior requires explicit initialization, even though I don't expect `undef` values to cause the example code to fail if the underlying output is not 0-initialized. Finally, I added `other` to the `make_load` function in `select_algorithm.py`, though I wasn't able to determine whether that function is actually being called.
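
To make the hazard concrete, here is a minimal pure-Python model of the masked-load semantics described above. It is not Triton code and not what this PR adds: `masked_load` and the `UNDEFINED` sentinel are hypothetical names used only for illustration, standing in for a call of the form `tl.load(ptr + offsets, mask=mask, other=0.0)`.

```python
# Pure-Python model of masked-load semantics (illustration only, not Triton).
# UNDEFINED is a hypothetical sentinel for "the spec promises nothing here".
UNDEFINED = object()


def masked_load(buffer, offsets, mask, other=UNDEFINED):
    """Model a masked load: read buffer[o] where the mask is true,
    otherwise produce `other` (or UNDEFINED when no `other` is given)."""
    return [buffer[o] if m else other for o, m in zip(offsets, mask)]


buffer = [1.0, 2.0, 3.0]
offsets = [0, 1, 2, 3]                      # lane 3 would read out of bounds
mask = [o < len(buffer) for o in offsets]   # so lane 3 is masked out

# Without `other`, the masked-out lane carries no defined value.
unsafe = masked_load(buffer, offsets, mask)

# With an explicit `other`, every lane holds a real value.
safe = masked_load(buffer, offsets, mask, other=0.0)

# A later conditional is only well defined on the explicitly initialized load.
positive = [x > 0 for x in safe]
```

In this model, a comparison such as `x > 0` on the `unsafe` result has no meaningful answer for the masked-out lane, which is the situation an explicit `other` avoids.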

Fixes #126535

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127311
Approved by: https://github.com/jansel
2024-05-30 04:50:54 +00:00
_static [docs] Update PT2+Profiler docs (#122272) 2024-03-28 17:52:28 +00:00
_templates Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689) 2024-01-24 22:28:04 +00:00
community Fix broken link of scikit-learn (#120972) 2024-05-16 11:46:34 +00:00
elastic distributed debug handlers (#126601) 2024-05-30 02:21:08 +00:00
notes New Custom Ops Documentation landing page (#127400) 2024-05-30 01:06:04 +00:00
rpc
scripts [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126) 2024-05-27 14:49:57 +00:00
amp.rst generalize custom_fwd&custom_bwd to be device-agnostic (#126531) 2024-05-25 06:48:16 +00:00
autograd.rst Add torch.library.register_autograd (#124071) 2024-04-18 12:47:59 +00:00
backends.rst preferred blas library; cublaslt gemm implementation (#122106) 2024-04-22 15:38:22 +00:00
benchmark_utils.rst Adding Compare in torch.utils.benchmark documentation (#125009) 2024-05-03 00:50:54 +00:00
bottleneck.rst
checkpoint.rst Add missing words to torch.utils.checkpoint doc (#120196) 2024-02-20 20:18:42 +00:00
complex_numbers.rst Document complex optimizer semantic behavior (#121667) 2024-03-16 00:43:47 +00:00
cond.rst Fix typo under docs directory (#119657) 2024-02-15 21:14:34 +00:00
conf.py Beef up error message for pending assert failure (#126212) 2024-05-15 18:22:53 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cpu.rst Add current_device() to torch.cpu (#110987) 2023-10-11 05:13:10 +00:00
cuda_environment_variables.rst Add doc page for environment variables that effect PyTorch Runtime (#119087) 2024-02-15 21:41:38 +00:00
cuda._sanitizer.rst
cuda.rst [Doc][NVTX] Add documentation for nvtx.range (#121699) 2024-03-15 20:26:44 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst Revert "reseed all Generators in Dataloader's _worker_loop() -- via GC (#107131)" 2023-08-23 17:08:07 +00:00
ddp_comm_hooks.rst [DOCS][DDP]Fix the simple of saving and reloading PowerSGD state and hook. (#102721) 2023-06-10 00:15:00 +00:00
debugging_environment_variables.rst Add doc page for environment variables that effect PyTorch Runtime (#119087) 2024-02-15 21:41:38 +00:00
deploy.rst
deterministic.rst Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377) 2023-11-01 16:10:09 +00:00
distributed.algorithms.join.rst
distributed.checkpoint.rst [DCP] Provides default AsyncStager (#124939) 2024-05-02 19:48:54 +00:00
distributed.elastic.rst distributed debug handlers (#126601) 2024-05-30 02:21:08 +00:00
distributed.optim.rst
distributed.pipelining.rst [pipelining] expose APIs per pytorch rule (#126812) 2024-05-22 16:21:13 +00:00
distributed.rst [C10D] Document destroy_process_group usage (#122358) 2024-05-09 16:51:31 +00:00
distributed.tensor.parallel.rst [tp] doc fixes (#121431) 2024-03-08 17:46:44 +00:00
distributions.rst Add inverse gamma distribution and fix sign bug in PowerTransform. (#104501) 2023-11-01 02:26:25 +00:00
dlpack.rst
docutils.conf
export.ir_spec.rst [export] Remove torch._export.export (#119095) 2024-02-08 21:22:04 +00:00
export.rst New Custom Ops Documentation landing page (#127400) 2024-05-30 01:06:04 +00:00
fft.rst
fsdp.rst [FSDP][state_dict] Expose optimizer state_dict config (#105949) 2023-08-21 07:29:49 +00:00
func.api.rst [functorch] linearize (#94173) 2023-02-09 15:45:08 +00:00
func.batch_norm.rst Fix typo under docs directory (#97202) 2023-03-21 01:24:10 +00:00
func.migrating.rst [torch.func] Add migration guide from functorch (#91811) 2023-01-17 22:14:42 +00:00
func.rst Fix typo under docs directory (#92762) 2023-01-23 18:07:22 +00:00
func.ux_limitations.rst [torch.func] Add docs (#91319) 2022-12-30 02:51:18 +00:00
func.whirlwind_tour.rst [torch.func] Add docs (#91319) 2022-12-30 02:51:18 +00:00
future_mod.rst Add swap_tensors path to nn.Module._apply (#117167) 2024-02-07 18:55:44 +00:00
futures.rst
fx.experimental.rst Traceable wrapper subclass support for deferred runtime asserts (#126198) 2024-05-21 01:21:46 +00:00
fx.rst [Export] Add runtime assert to non-strict export (#123681) 2024-04-18 16:13:27 +00:00
hub.rst Fix typo under docs directory (#92762) 2023-01-23 18:07:22 +00:00
index.rst Add module tracker (#125352) 2024-05-04 18:33:35 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst Fix typo under docs directory (#97202) 2023-03-21 01:24:10 +00:00
jit_language_reference.rst [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587) 2023-02-11 18:19:48 +00:00
jit_python_reference.rst
jit_unsupported.rst Add support for torch.Generator type in TorchScript (#110413) 2023-11-21 23:07:21 +00:00
jit_utils.rst
jit.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
library.rst New Custom Ops Documentation landing page (#127400) 2024-05-30 01:06:04 +00:00
linalg.rst
logging.rst Change classification to beta for TORCH_LOGS (#118682) 2024-01-31 21:50:55 +00:00
masked.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
math-quantizer-equation.png
meta.rst Add documentation for meta device (#119119) 2024-02-04 01:05:22 +00:00
miscellaneous_environment_variables.rst Add doc page for environment variables that effect PyTorch Runtime (#119087) 2024-02-15 21:41:38 +00:00
mobile_optimizer.rst [Reland] Clean Up MobileOptimizerType Rewrite Flags Public API and Documentation (#92081) 2023-01-14 17:06:00 +00:00
model_zoo.rst
module_tracker.rst Add module tracker (#125352) 2024-05-04 18:33:35 +00:00
monitor.rst
mps.rst Conform torch.mps to device module interface (#124676) 2024-04-23 18:38:48 +00:00
mtia.rst torch.mtia module for MTIA device backend (#123612) 2024-04-26 16:17:54 +00:00
multiprocessing.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
name_inference.rst [docs] Properly link register_post_accumulate_grad_hook docs (#108157) 2023-08-29 22:13:33 +00:00
named_tensor.rst fixing named tensor unflatten example (#106921) 2023-08-22 18:00:10 +00:00
nested.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
nn.attention.bias.rst Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689) 2024-01-24 22:28:04 +00:00
nn.attention.rst Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689) 2024-01-24 22:28:04 +00:00
nn.functional.rst Add RMSNorm module (#121364) 2024-03-29 18:05:28 +00:00
nn.init.rst
nn.rst Cleanup some duplicated placeholder py:module docs (#123244) 2024-04-05 03:18:53 +00:00
onnx_dynamo_onnxruntime_backend.rst Follow-up #108379 (#108905) 2023-09-09 01:38:36 +00:00
onnx_dynamo.rst [ez][doc] Fix sample code in onnx_dynamo.rst (#114770) 2023-11-29 19:27:52 +00:00
onnx_torchscript_supported_aten_ops.rst Refactor torch.onnx documentation (#108379) 2023-09-08 18:23:48 +00:00
onnx_torchscript.rst Follow-up #108379 (#108905) 2023-09-09 01:38:36 +00:00
onnx.rst fix pytorch version for onnx in doc (#124182) 2024-04-17 18:05:15 +00:00
optim.rst Added example regarding weight_decay distinction with per-parameter API (#117436) 2024-01-22 21:26:02 +00:00
package.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
pipeline.rst [c10d] Deprecate torch.distributed.pipeline (#121464) 2024-03-08 19:55:02 +00:00
profiler.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst update quantization doc: add x86 backend as default backend of server inference (#86794) 2022-12-02 02:10:25 +00:00
quantization-support.rst [quant][pt2e] Add model_is_exported util function (#119726) 2024-02-16 19:29:36 +00:00
quantization.rst Cleanup some duplicated placeholder py:module docs (#123244) 2024-04-05 03:18:53 +00:00
random.rst
rpc.rst [BE] RPC is missing RRef docs (#106902) 2023-08-10 16:26:27 +00:00
signal.rst Nuttall window (#90103) 2022-12-16 09:05:53 +00:00
size.rst Added a docstring for torch.Size.numel. (#124186) 2024-04-19 09:23:02 +00:00
sparse.rst Fix typo in sparse.rst (#121826) 2024-03-19 00:17:19 +00:00
special.rst
storage.rst
tensor_attributes.rst Include the scalar tensor auto-transfer in the doc (#119967) 2024-02-15 22:37:39 +00:00
tensor_view.rst
tensorboard.rst
tensors.rst Document the legacy constructor for Tensor (#122625) 2024-05-29 23:23:19 +00:00
testing.rst document torch.testing.assert_allclose (#89526) 2022-12-01 11:22:50 +00:00
threading_environment_variables.rst Add doc page for environment variables that effect PyTorch Runtime (#119087) 2024-02-15 21:41:38 +00:00
torch_cuda_memory.rst Fix typo under docs directory (#110359) 2023-10-03 16:36:05 +00:00
torch_environment_variables.rst Add doc page for environment variables that effect PyTorch Runtime (#119087) 2024-02-15 21:41:38 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.compiler_aot_inductor.rst Fix aoti doc to avoid cannot bind non-const lvalue reference error (#121672) 2024-03-12 23:43:40 +00:00
torch.compiler_api.rst [torch.export] Support is_compiling() flag for non-strict mode (#119602) 2024-02-29 05:52:51 +00:00
torch.compiler_best_practices_for_backends.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_cudagraph_trees.rst [docs] add mode="reduce-overhead" into torch.compile to enable cuda g… (#116529) 2024-01-05 22:54:20 +00:00
torch.compiler_custom_backends.rst Fix a link in the compiler backend doc (#126079) 2024-05-21 20:16:04 +00:00
torch.compiler_dynamic_shapes.rst feat: Add min, max ranges to mark_dynamic API (#119737) 2024-03-07 23:26:03 +00:00
torch.compiler_dynamo_deepdive.rst Fix links rendering when surrounding code in Dynamo deepdive (#123427) 2024-04-13 04:55:15 +00:00
torch.compiler_dynamo_overview.rst Rename TorchDynamo -> Dyanamo in the dynamo tutorial doc (#123431) 2024-05-07 05:07:00 +00:00
torch.compiler_fake_tensor.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_faq.rst [Docs] Fix NumPy + backward example (#126872) 2024-05-22 21:29:31 +00:00
torch.compiler_fine_grain_apis.rst [torch.export] Support is_compiling() flag for non-strict mode (#119602) 2024-02-29 05:52:51 +00:00
torch.compiler_get_started.rst [Inductor] Add 0 initialization to Triton masked loads (#127311) 2024-05-30 04:50:54 +00:00
torch.compiler_inductor_profiling.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_ir.rst [export] torch.export landing page (#108783) 2023-09-10 01:40:42 +00:00
torch.compiler_nn_module.rst Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)" + Forward fixes + test (#110964) 2023-10-11 05:16:47 +00:00
torch.compiler_performance_dashboard.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_profiling_torch_compile.rst [docs] Update PT2+Profiler docs (#122272) 2024-03-28 17:52:28 +00:00
torch.compiler_transformations.rst Fix typo under docs directory (#110359) 2023-10-03 16:36:05 +00:00
torch.compiler_troubleshooting.rst Add force_disable_caches to the docs (#126184) 2024-05-15 07:16:08 +00:00
torch.compiler.rst Fix links rendering when surrounding code in Dynamo deepdive (#123427) 2024-04-13 04:55:15 +00:00
torch.overrides.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
torch.rst torch.mtia module for MTIA device backend (#123612) 2024-04-26 16:17:54 +00:00
type_info.rst
utils.rst New swap function (#111747) 2023-12-08 18:49:35 +00:00
xpu.rst [2/2] Intel GPU Runtime Upstreaming for Generator (#118613) 2024-02-28 05:28:11 +00:00