pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Alex Baden 5d316c81be [Inductor] Add 0 initialization to Triton masked loads (#127311)		2024-05-30 04:50:54 +00:00

For a masked `tl.load` operation, the Triton language specifies that values masked out (i.e. where the mask evaluates to false) are undefined in the output of the load. Triton provides an optional `other` parameter which, when included, provides an explicit value to use for masked-out values from the load. If the output from a masked load without the `other` parameter is used in a conditional, unexpected behavior can occur.

Despite the language specification, all Triton backends currently in use by PyTorch Inductor (NVIDIA, AMD, and Intel) 0-initialize masked loads if `other` is not present (we recently changed the Intel backend behavior to match NVIDIA and AMD because that's what our users expect, even if we are not following the Triton spec to a T). This PR attempts to "future-proof" Inductor for new backends, or for changes in the current backends, by adding an explicit `other` in instances where later conditionals depend on the `tl.load` output. We did not see any performance change from 0-initializing in the Intel XPU backend, but one could imagine compiler optimizations that remove paths depending on undefined values.

I also removed an exception to `other` behavior for boolean loads, which was put in place for a Triton bug that should be fixed. I added `other` to the getting started documentation as a clue that masked load behavior requires explicit initialization, even though I don't expect `undef` values to cause the example code to fail if the underlying output is not 0-initialized. Finally, I added `other` to the `make_load` function in `select_algorithm.py`, though I wasn't able to determine if that function was actually being called.

Fixes #126535
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127311
Approved by: https://github.com/jansel
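
The masked-load behavior described in the commit message can be illustrated with a small pure-Python emulation. This is only a sketch and is not Triton code: `masked_load`, `count_positive`, and the random "garbage" used to stand in for undefined lanes are hypothetical names and assumptions made for illustration, not part of Triton or Inductor.

```python
import random


def masked_load(buffer, offsets, mask, other=None, rng=None):
    """Emulate a masked load (hypothetical helper, not a Triton API).

    Lanes whose mask is False never touch `buffer`. They receive `other`
    when it is given; otherwise their value is unspecified, which is
    modelled here by an arbitrary random number.
    """
    rng = rng or random.Random()
    out = []
    for off, keep in zip(offsets, mask):
        if keep:
            out.append(buffer[off])
        elif other is not None:
            out.append(other)
        else:
            # Stand-in for "undefined": any value at all may appear here.
            out.append(rng.uniform(-1e6, 1e6))
    return out


def count_positive(values):
    """A downstream conditional that depends on every loaded lane."""
    return sum(1 for v in values if v > 0)


buffer = [1.0, -2.0, 3.0]
block = 4  # block is wider than the buffer, so the last lane is masked out
offsets = list(range(block))
mask = [off < len(buffer) for off in offsets]

# With an explicit `other`, the masked-out lane is well defined.
safe = masked_load(buffer, offsets, mask, other=0.0)
safe_count = count_positive(safe)

# Without `other`, the masked-out lane may hold anything, so the
# conditional's result can vary from run to run (or backend to backend).
unsafe = masked_load(buffer, offsets, mask)
unsafe_count = count_positive(unsafe)
```

In this emulation `safe_count` depends only on the real data, while `unsafe_count` may or may not include the masked-out lane. That is the kind of dependence on unspecified values the commit guards against by passing `other` explicitly.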
_static	[docs] Update PT2+Profiler docs (#122272 )	2024-03-28 17:52:28 +00:00
_templates	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
community	Fix broken link of scikit-learn (#120972 )	2024-05-16 11:46:34 +00:00
elastic	distributed debug handlers (#126601 )	2024-05-30 02:21:08 +00:00
notes	New Custom Ops Documentation landing page (#127400 )	2024-05-30 01:06:04 +00:00
rpc
scripts	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 )	2024-05-27 14:49:57 +00:00
amp.rst	generalize custom_fwd&custom_bwd to be device-agnostic (#126531 )	2024-05-25 06:48:16 +00:00
autograd.rst	Add torch.library.register_autograd (#124071 )	2024-04-18 12:47:59 +00:00
backends.rst	preferred blas library; cublaslt gemm implementation (#122106 )	2024-04-22 15:38:22 +00:00
benchmark_utils.rst	Adding Compare in torch.utils.benchmark documentation (#125009 )	2024-05-03 00:50:54 +00:00
bottleneck.rst
checkpoint.rst	Add missing words to torch.utils.checkpoint doc (#120196 )	2024-02-20 20:18:42 +00:00
complex_numbers.rst	Document complex optimizer semantic behavior (#121667 )	2024-03-16 00:43:47 +00:00
cond.rst	Fix typo under docs directory (#119657 )	2024-02-15 21:14:34 +00:00
conf.py	Beef up error message for pending assert failure (#126212 )	2024-05-15 18:22:53 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cpu.rst	Add current_device() to torch.cpu (#110987 )	2023-10-11 05:13:10 +00:00
cuda_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
cuda._sanitizer.rst
cuda.rst	[Doc][NVTX] Add documentation for nvtx.range (#121699 )	2024-03-15 20:26:44 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst	Revert "reseed all Generators in Dataloader's _worker_loop() -- via GC (#107131 )"	2023-08-23 17:08:07 +00:00
ddp_comm_hooks.rst	[DOCS][DDP]Fix the simple of saving and reloading PowerSGD state and hook. (#102721 )	2023-06-10 00:15:00 +00:00
debugging_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
deploy.rst
deterministic.rst	Add `torch.utils.deterministic.fill_uninitialized_memory` flag (#111377 )	2023-11-01 16:10:09 +00:00
distributed.algorithms.join.rst
distributed.checkpoint.rst	[DCP] Provides default AsyncStager (#124939 )	2024-05-02 19:48:54 +00:00
distributed.elastic.rst	distributed debug handlers (#126601 )	2024-05-30 02:21:08 +00:00
distributed.optim.rst
distributed.pipelining.rst	[pipelining] expose APIs per pytorch rule (#126812 )	2024-05-22 16:21:13 +00:00
distributed.rst	[C10D] Document destroy_process_group usage (#122358 )	2024-05-09 16:51:31 +00:00
distributed.tensor.parallel.rst	[tp] doc fixes (#121431 )	2024-03-08 17:46:44 +00:00
distributions.rst	Add inverse gamma distribution and fix `sign` bug in `PowerTransform`. (#104501 )	2023-11-01 02:26:25 +00:00
dlpack.rst
docutils.conf
export.ir_spec.rst	[export] Remove torch._export.export (#119095 )	2024-02-08 21:22:04 +00:00
export.rst	New Custom Ops Documentation landing page (#127400 )	2024-05-30 01:06:04 +00:00
fft.rst
fsdp.rst	[FSDP][state_dict] Expose optimizer state_dict config (#105949 )	2023-08-21 07:29:49 +00:00
func.api.rst	[functorch] linearize (#94173 )	2023-02-09 15:45:08 +00:00
func.batch_norm.rst	Fix typo under docs directory (#97202 )	2023-03-21 01:24:10 +00:00
func.migrating.rst	[torch.func] Add migration guide from functorch (#91811 )	2023-01-17 22:14:42 +00:00
func.rst	Fix typo under docs directory (#92762 )	2023-01-23 18:07:22 +00:00
func.ux_limitations.rst	[torch.func] Add docs (#91319 )	2022-12-30 02:51:18 +00:00
func.whirlwind_tour.rst	[torch.func] Add docs (#91319 )	2022-12-30 02:51:18 +00:00
future_mod.rst	Add swap_tensors path to nn.Module._apply (#117167 )	2024-02-07 18:55:44 +00:00
futures.rst
fx.experimental.rst	Traceable wrapper subclass support for deferred runtime asserts (#126198 )	2024-05-21 01:21:46 +00:00
fx.rst	[Export] Add runtime assert to non-strict export (#123681 )	2024-04-18 16:13:27 +00:00
hub.rst	Fix typo under docs directory (#92762 )	2023-01-23 18:07:22 +00:00
index.rst	Add module tracker (#125352 )	2024-05-04 18:33:35 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst	Fix typo under docs directory (#97202 )	2023-03-21 01:24:10 +00:00
jit_language_reference.rst	[BE] [1/3] Rewrite `super()` calls in caffe2 and benchmarks (#94587 )	2023-02-11 18:19:48 +00:00
jit_python_reference.rst
jit_unsupported.rst	Add support for `torch.Generator` type in TorchScript (#110413 )	2023-11-21 23:07:21 +00:00
jit_utils.rst
jit.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
library.rst	New Custom Ops Documentation landing page (#127400 )	2024-05-30 01:06:04 +00:00
linalg.rst
logging.rst	Change classification to beta for TORCH_LOGS (#118682 )	2024-01-31 21:50:55 +00:00
masked.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
math-quantizer-equation.png
meta.rst	Add documentation for meta device (#119119 )	2024-02-04 01:05:22 +00:00
miscellaneous_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
mobile_optimizer.rst	[Reland] Clean Up MobileOptimizerType Rewrite Flags Public API and Documentation (#92081 )	2023-01-14 17:06:00 +00:00
model_zoo.rst
module_tracker.rst	Add module tracker (#125352 )	2024-05-04 18:33:35 +00:00
monitor.rst
mps.rst	Conform torch.mps to device module interface (#124676 )	2024-04-23 18:38:48 +00:00
mtia.rst	torch.mtia module for MTIA device backend (#123612 )	2024-04-26 16:17:54 +00:00
multiprocessing.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
name_inference.rst	[docs] Properly link register_post_accumulate_grad_hook docs (#108157 )	2023-08-29 22:13:33 +00:00
named_tensor.rst	fixing named tensor unflatten example (#106921 )	2023-08-22 18:00:10 +00:00
nested.rst	Replace master with main in links and docs/conf.py (#100176 )	2023-05-02 18:20:32 +00:00
nn.attention.bias.rst	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
nn.attention.rst	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
nn.functional.rst	Add RMSNorm module (#121364 )	2024-03-29 18:05:28 +00:00
nn.init.rst
nn.rst	Cleanup some duplicated placeholder py:module docs (#123244 )	2024-04-05 03:18:53 +00:00
onnx_dynamo_onnxruntime_backend.rst	Follow-up #108379 (#108905 )	2023-09-09 01:38:36 +00:00
onnx_dynamo.rst	[ez][doc] Fix sample code in onnx_dynamo.rst (#114770 )	2023-11-29 19:27:52 +00:00
onnx_torchscript_supported_aten_ops.rst	Refactor torch.onnx documentation (#108379 )	2023-09-08 18:23:48 +00:00
onnx_torchscript.rst	Follow-up #108379 (#108905 )	2023-09-09 01:38:36 +00:00
onnx.rst	fix pytorch version for onnx in doc (#124182 )	2024-04-17 18:05:15 +00:00
optim.rst	Added example regarding weight_decay distinction with per-parameter API (#117436 )	2024-01-22 21:26:02 +00:00
package.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
pipeline.rst	[c10d] Deprecate torch.distributed.pipeline (#121464 )	2024-03-08 19:55:02 +00:00
profiler.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst	update quantization doc: add x86 backend as default backend of server inference (#86794 )	2022-12-02 02:10:25 +00:00
quantization-support.rst	[quant][pt2e] Add `model_is_exported` util function (#119726 )	2024-02-16 19:29:36 +00:00
quantization.rst	Cleanup some duplicated placeholder py:module docs (#123244 )	2024-04-05 03:18:53 +00:00
random.rst
rpc.rst	[BE] RPC is missing RRef docs (#106902 )	2023-08-10 16:26:27 +00:00
signal.rst	Nuttall window (#90103 )	2022-12-16 09:05:53 +00:00
size.rst	Added a docstring for torch.Size.numel. (#124186 )	2024-04-19 09:23:02 +00:00
sparse.rst	Fix typo in sparse.rst (#121826 )	2024-03-19 00:17:19 +00:00
special.rst
storage.rst
tensor_attributes.rst	Include the scalar tensor auto-transfer in the doc (#119967 )	2024-02-15 22:37:39 +00:00
tensor_view.rst
tensorboard.rst
tensors.rst	Document the legacy constructor for Tensor (#122625 )	2024-05-29 23:23:19 +00:00
testing.rst	document torch.testing.assert_allclose (#89526 )	2022-12-01 11:22:50 +00:00
threading_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
torch_cuda_memory.rst	Fix typo under docs directory (#110359 )	2023-10-03 16:36:05 +00:00
torch_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.compiler_aot_inductor.rst	Fix aoti doc to avoid cannot bind non-const lvalue reference error (#121672 )	2024-03-12 23:43:40 +00:00
torch.compiler_api.rst	[torch.export] Support is_compiling() flag for non-strict mode (#119602 )	2024-02-29 05:52:51 +00:00
torch.compiler_best_practices_for_backends.rst	Restructure torch.compile docs (#105376 )	2023-07-28 20:58:57 +00:00
torch.compiler_cudagraph_trees.rst	[docs] add mode="reduce-overhead" into torch.compile to enable cuda g… (#116529 )	2024-01-05 22:54:20 +00:00
torch.compiler_custom_backends.rst	Fix a link in the compiler backend doc (#126079 )	2024-05-21 20:16:04 +00:00
torch.compiler_dynamic_shapes.rst	feat: Add min, max ranges to mark_dynamic API (#119737 )	2024-03-07 23:26:03 +00:00
torch.compiler_dynamo_deepdive.rst	Fix links rendering when surrounding code in Dynamo deepdive (#123427 )	2024-04-13 04:55:15 +00:00
torch.compiler_dynamo_overview.rst	Rename TorchDynamo -> Dyanamo in the dynamo tutorial doc (#123431 )	2024-05-07 05:07:00 +00:00
torch.compiler_fake_tensor.rst	Restructure torch.compile docs (#105376 )	2023-07-28 20:58:57 +00:00
torch.compiler_faq.rst	[Docs] Fix NumPy + backward example (#126872 )	2024-05-22 21:29:31 +00:00
torch.compiler_fine_grain_apis.rst	[torch.export] Support is_compiling() flag for non-strict mode (#119602 )	2024-02-29 05:52:51 +00:00
torch.compiler_get_started.rst	[Inductor] Add 0 initialization to Triton masked loads (#127311 )	2024-05-30 04:50:54 +00:00
torch.compiler_inductor_profiling.rst	Restructure torch.compile docs (#105376 )	2023-07-28 20:58:57 +00:00
torch.compiler_ir.rst	[export] torch.export landing page (#108783 )	2023-09-10 01:40:42 +00:00
torch.compiler_nn_module.rst	Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 )" + Forward fixes + test (#110964 )	2023-10-11 05:16:47 +00:00
torch.compiler_performance_dashboard.rst	Restructure torch.compile docs (#105376 )	2023-07-28 20:58:57 +00:00
torch.compiler_profiling_torch_compile.rst	[docs] Update PT2+Profiler docs (#122272 )	2024-03-28 17:52:28 +00:00
torch.compiler_transformations.rst	Fix typo under docs directory (#110359 )	2023-10-03 16:36:05 +00:00
torch.compiler_troubleshooting.rst	Add force_disable_caches to the docs (#126184 )	2024-05-15 07:16:08 +00:00
torch.compiler.rst	Fix links rendering when surrounding code in Dynamo deepdive (#123427 )	2024-04-13 04:55:15 +00:00
torch.overrides.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
torch.rst	torch.mtia module for MTIA device backend (#123612 )	2024-04-26 16:17:54 +00:00
type_info.rst
utils.rst	New swap function (#111747 )	2023-12-08 18:49:35 +00:00
xpu.rst	[2/2] Intel GPU Runtime Upstreaming for Generator (#118613 )	2024-02-28 05:28:11 +00:00