pytorch/docs
Alex Baden 5d316c81be [Inductor] Add 0 initialization to Triton masked loads (#127311)
For a masked `tl.load` operation, the Triton language specifies that values masked out (i.e. where the mask evaluates to false) are undefined in the output of the load. Triton provides an optional `other` parameter which, when included, provides an explicit value to use for masked out values from the load. If the output from a masked load without the `other` parameter is used in a conditional, unexpected behavior can occur.

Despite the language specification, all Triton backends currently in use by PyTorch Inductor (NVIDIA, AMD, and Intel) 0-initialize masked loads if `other` is not present (we recently changed the Intel backend behavior to match NVIDIA and AMD because that's what our users expect, even if we are not following the Triton spec to the tee). This PR attempts to "future-proof" Inductor for new backends (or perhaps changes in the current backends? - we did not see any performance change from 0-initializing in the Intel XPU backend but one could imagine compiler optimizations to remove paths that depend on undefined) to add an explicit `other` in instances where later conditionals depend on the `tl.load` output. I also removed an exception to `other` behavior for boolean loads, which was put in place for a Triton bug that should be fixed. I added `other` to the getting started documentation as a clue that masked load behavior requires explicit initialization if, even though I don't expect `undef` values to cause the example code to fail if the underlying output is not 0-initialized.  Finally, I added other to the `make_load` function in `select_algorithm.py`, though I wasn't able to determine if that function was actually being called.

Fixes #126535

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127311
Approved by: https://github.com/jansel
2024-05-30 04:50:54 +00:00
..
caffe2 Enable UFMT on a bunch of low traffic Python files outside of main files (#106052) 2023-07-27 01:01:17 +00:00
cpp Update Jinja to 3.1.3 (#124976) 2024-04-26 02:57:55 +00:00
source [Inductor] Add 0 initialization to Triton masked loads (#127311) 2024-05-30 04:50:54 +00:00
.gitignore
libtorch.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
make.bat
Makefile Refactor torch.onnx documentation (#108379) 2023-09-08 18:23:48 +00:00
README.md
requirements.txt update requirements.txt in /docs (#101092) 2023-05-12 03:19:36 +00:00

Please see the Writing documentation section of CONTRIBUTING.md for details on both writing and building the docs.