Commit Graph

28187 Commits

Author SHA1 Message Date
Niklas Vangerow
1424c4f739 Migrate slice_test to use PjRt.
PiperOrigin-RevId: 826158235
2025-10-30 13:45:44 -07:00
Karlo Basioli
5973848600 [XLA][codegen] Emit shlo reshape from the fusion emitter and lower it to triton for the triton backend.
PiperOrigin-RevId: 826147865
2025-10-30 13:32:56 -07:00
A. Unique TensorFlower
7e7b1a3015 Allow empty dimension list in SymbolicMap::ReplaceDimsAndSymbols
I originally assumed the caller was always providing a full list of replacements, but IndexingMap has some uses where the dim_replacement list is empty, resulting in a CHECK failure.

So, I'm allowing the user to provide an empty list for either dims or symbols to ReplaceDimsAndSymbols. In that case, the corresponding dims/symbols won't be replaced.
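The fallback can be sketched like this (hypothetical names and signature; the real ReplaceDimsAndSymbols operates on SymbolicExpr trees, not strings):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal sketch (hypothetical names and signature): an empty replacement
// list means "leave that class of variables untouched" instead of
// CHECK-failing on a size mismatch.
std::string ReplaceVariable(int var_id, int num_dims,
                            const std::vector<std::string>& dim_replacements,
                            const std::vector<std::string>& sym_replacements) {
  if (var_id < num_dims) {
    // Dimension variable: keep the default name if no replacements given.
    return dim_replacements.empty() ? "d" + std::to_string(var_id)
                                    : dim_replacements[var_id];
  }
  const int sym = var_id - num_dims;
  return sym_replacements.empty() ? "s" + std::to_string(sym)
                                  : sym_replacements[sym];
}
```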

PiperOrigin-RevId: 826138814
2025-10-30 13:18:28 -07:00
Niklas Vangerow
175774337e Migrate params_test to use PjRt.
PiperOrigin-RevId: 826137636
2025-10-30 13:07:24 -07:00
Zixuan Jiang
146c4f56b7 Clear frontend attributes for get-tuple-elements of GlobalToLocal and LocalToGlobal custom-calls.
The GlobalToLocal and LocalToGlobal custom-calls are for Shardy round trip. These get-tuple-elements will be removed when we import the Shardy dialect and thus they do not need to hold frontend attributes.

This can reduce the size of the generated HLO module text.

PiperOrigin-RevId: 826134489
2025-10-30 12:43:45 -07:00
William S. Moses
a94890b1f9 Improve CUDNN error messages
PiperOrigin-RevId: 826124080
2025-10-30 12:34:16 -07:00
Oleg Shyshkov
9eeebc9be5 [XLA:GPU] Use a single intra-host ragged-all-to-all in the decomposition.
Instead of 2 ra2a + concat, we can double the output buffer and adjust output offsets. This way we can save on latency by having only one multi-GPU synchronization.

PiperOrigin-RevId: 826122665
2025-10-30 12:24:20 -07:00
Eugene Zhulenev
9b51864c7b [xla:ffi] Add example of async custom call in XLA:GPU
PiperOrigin-RevId: 826121283
2025-10-30 12:11:20 -07:00
Niklas Vangerow
061041963e Migrate map_test to use PjRt.
PiperOrigin-RevId: 826107887
2025-10-30 11:52:36 -07:00
A. Unique TensorFlower
dd3a14ace4 [Autotuner] Add sharding support using KeyValueStore Interface.
- The logic is ported from gemm_fusion_autotuner. I have changed the key of the key-value store to be just the module fingerprint; earlier it was module-fingerprint + autotunable-fusion-set-from-the-module-fingerprint. The module fingerprint should already represent the fusion sets contained in it.
- We can improve or simply remove this functionality when we design storage for offline autotuning.

PiperOrigin-RevId: 826103885
2025-10-30 11:43:43 -07:00
Karlo Basioli
4ffcba9004 [XLA][codegen] Emit stablehlo reduce op from the fusion emitter and lower it to triton for the triton backend.
PiperOrigin-RevId: 826102479
2025-10-30 11:30:55 -07:00
Niklas Vangerow
0c87bef802 Migrate reshape_test to use PjRt.
PiperOrigin-RevId: 826087067
2025-10-30 11:21:47 -07:00
Will Froom
6dd75c4e8b [XTile] Modify Stable HLO check on iota to restrict it to the 1D case.
PiperOrigin-RevId: 826085272
2025-10-30 11:01:37 -07:00
A. Unique TensorFlower
f2b36d1780 Integrate LLVM at llvm/llvm-project@4c46ae3948
Updates LLVM usage to match
[4c46ae394841](https://github.com/llvm/llvm-project/commit/4c46ae394841)

PiperOrigin-RevId: 826082725
2025-10-30 10:40:50 -07:00
Christian Sigg
3943b53326 Increase the maximum HLO op chain length for profiling from 8192 to 16384.
This prevents maximum-length chains of trivial ops (e.g. add.fp32) from running faster than copying the data, which results in a 'too fast to measure' error.

PiperOrigin-RevId: 826079017
2025-10-30 10:34:18 -07:00
Yun Peng
71e640f242 Update Bazel version to 7.7.0.
This change updates the Bazel version used in TensorFlow, JAX, and XLA projects from 7.4.1 to 7.7.0 in `.bazelversion` files and build scripts.

PiperOrigin-RevId: 826075658
2025-10-30 10:27:38 -07:00
Kanish Anand
1bef3e80b5 Reuse tuple elements field from existing HloSharding
PiperOrigin-RevId: 826058519
2025-10-30 10:16:11 -07:00
dependabot[bot]
d638f84b90 PR #33278: Bump keras from 3.11.3 to 3.12.0 in /xla/backends/cpu/benchmarks/e2e/gemma2/keras
Imported from GitHub PR https://github.com/openxla/xla/pull/33278

Bumps [keras](https://github.com/keras-team/keras) from 3.11.3 to 3.12.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/keras-team/keras/releases">keras's releases</a>.</em></p>
<blockquote>
<h2>Keras 3.12.0</h2>
<h2>Highlights</h2>
<h3>Keras has a new model distillation API!</h3>
<p>You now have access to an easy-to-use API for distilling large models into small models while minimizing performance drop on a reference dataset -- compatible with all existing Keras models. You can specify a range of different distillation losses, or create your own losses. The API supports multiple concurrent distillation losses at the same time.</p>
<p>Example:</p>
<pre lang="python"><code># Load a model to distill
teacher = ...
# This is the model we want to distill it into
student = ...

# Configure the process
distiller = Distiller(
    teacher=teacher,
    student=student,
    distillation_losses=LogitsDistillation(temperature=3.0),
)
distiller.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train the distilled model
distiller.fit(x_train, y_train, epochs=10)
</code></pre>
<h3>Keras supports GPTQ quantization!</h3>
<p>GPTQ is now built into the Keras API. GPTQ is a post-training, weights-only quantization method that compresses a model to int4 layer by layer. For each layer, it uses a second-order method to update weights while minimizing the error on a calibration dataset.</p>
<p>Learn how to use it <a href="https://keras.io/guides/gptq_quantization_in_keras/">in this guide</a>.</p>
<p>Example:</p>
<pre lang="python"><code>model = keras_hub.models.Gemma3CausalLM.from_preset(&quot;gemma3_1b&quot;)
gptq_config = keras.quantizers.GPTQConfig(
    dataset=calibration_dataset,
    tokenizer=model.preprocessor.tokenizer,
    weight_bits=4,
    group_size=128,
    num_samples=256,
    sequence_length=256,
    hessian_damping=0.01,
    symmetric=False,
)
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="adbfd13426"><code>adbfd13</code></a> Add warning to <code>set_backend</code> and more detailed example. (<a href="https://redirect.github.com/keras-team/keras/issues/21787">#21787</a>)</li>
<li><a href="70598b7903"><code>70598b7</code></a> Fix typo in Distiller docstring</li>
<li><a href="eecd34f406"><code>eecd34f</code></a> Fix: <code>keras.ops.quantile</code> works with tf graph execution (<a href="https://redirect.github.com/keras-team/keras/issues/21782">#21782</a>)</li>
<li><a href="c2bc6cfcc7"><code>c2bc6cf</code></a> Suport keras.op.view() to view the same data bitwise at a new dtype  (<a href="https://redirect.github.com/keras-team/keras/issues/21763">#21763</a>)</li>
<li><a href="10b51ce5a5"><code>10b51ce</code></a> Make confusion metrics compilable. (<a href="https://redirect.github.com/keras-team/keras/issues/21775">#21775</a>)</li>
<li><a href="18f79d69c9"><code>18f79d6</code></a> Fix negative index handling in MultiHeadAttention attention_axes (<a href="https://redirect.github.com/keras-team/keras/issues/21721">#21721</a>)</li>
<li><a href="18e0364cbc"><code>18e0364</code></a> Support for extracting volume patches (<a href="https://redirect.github.com/keras-team/keras/issues/21759">#21759</a>)</li>
<li><a href="dc5e42cca4"><code>dc5e42c</code></a> fix sas metrics in jax <code>fit</code> (<a href="https://redirect.github.com/keras-team/keras/issues/21765">#21765</a>)</li>
<li><a href="1ba3b8f896"><code>1ba3b8f</code></a> Fix discretization discrepancy (<a href="https://redirect.github.com/keras-team/keras/issues/21769">#21769</a>)</li>
<li><a href="53987a768d"><code>53987a7</code></a> Document that <code>set_backend</code> requires re-importing keras. (<a href="https://redirect.github.com/keras-team/keras/issues/21764">#21764</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/keras-team/keras/compare/v3.11.3...v3.12.0">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=keras&package-manager=pip&previous-version=3.11.3&new-version=3.12.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/openxla/xla/network/alerts).

</details>
Copybara import of the project:

--
b37d94a32428d62ed3e73765f4e7b61bc6ed8549 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:

Bump keras in /xla/backends/cpu/benchmarks/e2e/gemma2/keras

Bumps [keras](https://github.com/keras-team/keras) from 3.11.3 to 3.12.0.
- [Release notes](https://github.com/keras-team/keras/releases)
- [Commits](https://github.com/keras-team/keras/compare/v3.11.3...v3.12.0)

---
updated-dependencies:
- dependency-name: keras
  dependency-version: 3.12.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Merging this change closes #33278

PiperOrigin-RevId: 826053656
2025-10-30 10:07:36 -07:00
A. Unique TensorFlower
a28d4bf9f8 Add helper functions for creating and inspecting symbolic dimensions and symbols
Symbolic dimensions and symbols are both implemented as SymbolicExpr variables, with symbols being offset by the number of dimensions. This implementation detail was previously exposed to users of SymbolicExprContext, who had to manually calculate variable IDs. I ended up with a hidden bug in the implementation of IndexingMap, so I included these free functions to make the translation less bug-prone.
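The offset convention can be illustrated with a toy sketch (the helper names are hypothetical, not the actual free functions added in this change):

```cpp
#include <cassert>

// Toy sketch of the variable-ID convention: dimensions occupy IDs
// [0, num_dims) and symbols are offset by num_dims. Helper names are
// hypothetical, not the actual XLA API.
int GetDimensionId(int dim_index) { return dim_index; }
int GetSymbolId(int num_dims, int symbol_index) {
  return num_dims + symbol_index;
}
bool IsSymbol(int num_dims, int var_id) { return var_id >= num_dims; }
int GetSymbolIndex(int num_dims, int var_id) { return var_id - num_dims; }
```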

The SymbolicMap tests have been updated to use these new helper functions.

PiperOrigin-RevId: 826053249
2025-10-30 09:54:07 -07:00
Marcin Radomski
a0921d9997 [XLA:GPU] CustomCallThunk: enable use of lambdas with captures
Add CustomCallThunk::OwnedHandlerBundle, a bag of `unique_ptr<ffi::Ffi>` that
enables using lambdas with captures in CustomCallThunk. Lambda captures must
outlive the created thunk.

The functionality is similar to what is possible with "old-style" callbacks,
but doesn't depend on them, and adds support for other handlers available via
XLA_FFI_Handler_Bundle.
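A rough sketch of the ownership pattern, with all names hypothetical (the real bundle owns `unique_ptr<ffi::Ffi>` handlers, not `std::function`):

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Rough sketch of the ownership pattern (names hypothetical): the bundle
// owns type-erased handler objects, so a lambda's captured state lives as
// long as the thunk that invokes it.
struct Handler {
  std::function<int(int)> fn;
};

struct OwnedHandlerBundle {
  std::vector<std::unique_ptr<Handler>> handlers;
};

struct ThunkSketch {
  OwnedHandlerBundle bundle;  // captured state is kept alive with the thunk
  int Execute(int x) { return bundle.handlers.at(0)->fn(x); }
};

// Build a thunk whose handler is a lambda with a by-value capture.
ThunkSketch MakeThunkWithCapture(int offset) {
  ThunkSketch thunk;
  auto handler = std::make_unique<Handler>();
  handler->fn = [offset](int x) { return x + offset; };
  thunk.bundle.handlers.push_back(std::move(handler));
  return thunk;
}
```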

PiperOrigin-RevId: 826043689
2025-10-30 09:32:58 -07:00
Karlo Basioli
4461afa7ef [XLA:CPU] Compare host cpu features when loading AOT result to the compilation machine features
PiperOrigin-RevId: 826043058
2025-10-30 09:23:42 -07:00
A. Unique TensorFlower
fd85062199 Add hashing support for SymbolicMap
This change implements AbslHashValue and llvm::hash_value for xla::gpu::SymbolicMap.

This is a prerequisite for correctly implementing AbslHashValue for xla::IndexingMap after its internal migration to use SymbolicMap. Specifically, it needs to be used in IndexingMap::AbslHashValue.
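`AbslHashValue` follows Abseil's friend-function extension-point pattern; a self-contained sketch with a toy hash state standing in for Abseil's (the real implementation hashes SymbolicMap's actual contents):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy hash state standing in for Abseil's; only the shape of the
// AbslHashValue friend-function pattern is the point here.
struct ToyHashState {
  std::size_t value = 0;
  static ToyHashState combine(ToyHashState h, std::size_t v) {
    h.value ^= std::hash<std::size_t>{}(v) + 0x9e3779b97f4a7c15ULL +
               (h.value << 6) + (h.value >> 2);
    return h;
  }
};

// Hypothetical stand-in for SymbolicMap's hashable contents.
struct SymbolicMapSketch {
  std::vector<std::size_t> expr_ids;

  // Found via ADL by any hash state type providing combine().
  template <typename H>
  friend H AbslHashValue(H h, const SymbolicMapSketch& m) {
    for (std::size_t id : m.expr_ids) h = H::combine(h, id);
    return h;
  }
};
```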

PiperOrigin-RevId: 826038011
2025-10-30 09:11:19 -07:00
A. Unique TensorFlower
bec8916f32 [XLA:Collective] Remove unnecessary const for a function argument
PiperOrigin-RevId: 826036516
2025-10-30 09:04:33 -07:00
Henning Becker
772ed8bbc7 Add serialization for ffi::Attribute
`ffi::Attribute` and related types are members of the `CustomCallThunk`. Therefore we need to be able to serialize these types to proto messages if we want to be able to serialize instances of CustomCallThunk.

This change adds a proto message representation for each of the types, along with `ToProto` and `FromProto` functions. Most of the types were previously defined as type aliases of some `std::variant` instantiation; this change replaces the aliases with classes that inherit from the std::variant type, and these new classes then get the proto serialization functions.
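The alias-to-class change can be sketched as follows; the proto message is stood in by a plain struct and the variant alternatives are hypothetical:

```cpp
#include <string>
#include <variant>

// Hypothetical stand-in for the real proto message.
struct AttributeProto {
  std::string kind;
  std::string value;
};

// Sketch: replace `using Attribute = std::variant<...>` with a class that
// inherits from the variant, so ToProto/FromProto can be attached.
class Attribute : public std::variant<int, std::string> {
 public:
  using Base = std::variant<int, std::string>;
  using Base::Base;  // inherit the variant's constructors

  AttributeProto ToProto() const {
    AttributeProto proto;
    if (const int* i = std::get_if<int>(static_cast<const Base*>(this))) {
      proto.kind = "int";
      proto.value = std::to_string(*i);
    } else {
      proto.kind = "string";
      proto.value = std::get<std::string>(static_cast<const Base&>(*this));
    }
    return proto;
  }

  static Attribute FromProto(const AttributeProto& proto) {
    if (proto.kind == "int") return Attribute(std::stoi(proto.value));
    return Attribute(proto.value);
  }
};
```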

PiperOrigin-RevId: 826035931
2025-10-30 08:57:22 -07:00
Oleg Shyshkov
04fb26f2d1 [XLA:GPU] Add multi-host ragged-all-to-all decomposer pattern for combine ra2a.
ragged-all-to-all can be used in two patterns:

1. If the input size is smaller than the output, we assume that it's a dispatch phase where the input is dense and we're distributing data.
2. If the output size is smaller, we assume that the input is padded and we're gathering data that was distributed with the first pattern.

The original decomposer was optimized for the dispatch case, because it did a cross-host all-gather of the dense input. It's not optimal for the combine phase, because it results in transferring a lot of padding cross-host.

A more optimal set of operations is to do local partial ragged-all-to-all, exchange partial results, and apply them locally.

PiperOrigin-RevId: 826035358
2025-10-30 08:44:13 -07:00
Karlo Basioli
4df0c4afcd [XLA][codegen] Emit ReshapeToScalar as a tensor.extract op, and lower to triton specific impl.
PiperOrigin-RevId: 826027204
2025-10-30 08:31:56 -07:00
A. Unique TensorFlower
345a251037 Add helper functions for creating constant SymbolicExpr
Similarly to GetAffineConstantExpr, I'm introducing GetSymbolicConstantExpr/s to create constants of SymbolicExpr from a list of values.

PiperOrigin-RevId: 826020777
2025-10-30 08:25:55 -07:00
Oleg Shyshkov
fd71e8be05 [XLA:GPU] Introduce CollectiveOpsE2ETestBase for common functionality.
We need a proper base class for common functionality. `CollectiveOpsTestE2E` is not a good base class, because it also holds a lot of fp8-specific helpers that are only used in a few tests.

PiperOrigin-RevId: 826017075
2025-10-30 08:12:19 -07:00
Marcin Radomski
689bf5ef28 [XLA:GPU] Add Thunk::TransformAllNestedThunks
A function similar to ForAllThunksMutable, but capable of wrapping or replacing nested thunks with new ones. In case the thunk has some specific requirements (e.g. assumes the directly nested thunk is a `SequentialThunk`), the implementation needs to ensure the assumptions still hold. The transform function must be infallible, but it may be a no-op and return the original `unique_ptr<Thunk>`.

The use for this is in buffer checksumming: to cover all thunks, we need to recursively insert pre-/post-execution checksum thunks around execution of nested thunks. Unfortunately, ForAllThunksMutable doesn't allow mutating the thunk's *container*, which would create the need to switch() over thunk kind, special-case any thunk that may contain nested thunks, and expose the internals to the caller to reimplement the recursion.
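The recursive transform can be sketched on a toy thunk tree (names hypothetical; the real implementation must additionally respect thunk-kind-specific containers such as SequentialThunk):

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy thunk tree; the real Thunk hierarchy is richer (names hypothetical).
struct ThunkNode {
  int tag = 0;
  std::vector<std::unique_ptr<ThunkNode>> nested;
};

using ThunkTransform =
    std::function<std::unique_ptr<ThunkNode>(std::unique_ptr<ThunkNode>)>;

// Recursively visit every nested thunk, letting the (infallible) transform
// wrap or replace it. Unlike a read-only visitor, the *container* slot is
// reassigned, so the transform may return a brand-new node.
void TransformAllNested(ThunkNode& root, const ThunkTransform& fn) {
  for (auto& slot : root.nested) {
    TransformAllNested(*slot, fn);
    slot = fn(std::move(slot));  // may be a no-op returning the original
  }
}
```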

PiperOrigin-RevId: 826014413
2025-10-30 08:00:41 -07:00
Alex
2f1ef8b1a0 PR #33188: [ROCm] Fix hermetic tests when executing on rbe worker
Imported from GitHub PR https://github.com/openxla/xla/pull/33188

📝 Summary of Changes
Fix tests when executing on rbe

🎯 Justification
Fix hermetic dependencies so the right libs are loaded.
Libs are added as data dependencies because many of the ROCm libs have their own
RPATH=$ORIGIN/..., so they try to load their dependencies from the directory they
reside in. This leads to a situation where a lib located in the _solid_data dir
tries to find its dependencies in the same dir, while the dependency might be
located in the data directory. Then the test either can't load the lib or tries
to load one from the system libs.

🚀 Kind of Contribution
🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Build fix

🧪 Unit Tests:
Build fix

🧪 Execution Tests:
Build fix

Copybara import of the project:

--
43913bfbacc042c7a81e692cfa28e0a0733c3989 by Alexandros Theodoridis <atheodor@amd.com>:

Add minimal hip runtime deps to kernel_headers

--
ffb62f536608b00445fda4fce0a3323c5bf6fe9d by Alexandros Theodoridis <atheodor@amd.com>:

Fix hermetic build invalid local rpath

Merging this change closes #33188

PiperOrigin-RevId: 826014075
2025-10-30 07:50:50 -07:00
A. Unique TensorFlower
ca45a1e4bb [XLA:GPU] Allow mapping a slice of memory with a multicast object.
PiperOrigin-RevId: 826013717
2025-10-30 07:37:58 -07:00
Ilya Tikhonovskiy
398fefb520 [XLA:GPU] Enable fusion of broadcast and reshape into Triton scaled-dot.
This change allows broadcast and reshape operations on the scale operands of `scaled-dot` to be fused into the Triton kernel. It generalizes the operand fusion logic to handle all four operands of `scaled-dot` and adds support for `BroadcastOp` and `ExpandDimsOp` in the Triton MLIR conversion. A new test case is added to verify this fusion.

PiperOrigin-RevId: 826012089
2025-10-30 07:11:55 -07:00
Adrian Kuegel
3bd63fafc5 Restore setting the default value for xla_gpu_enable_dynamic_slice_fusion flag.
The line was accidentally removed; this change adds it back with the previous
default (false).

PiperOrigin-RevId: 826008957
2025-10-30 06:58:33 -07:00
Thomas Joerg
fbaeea227b [XLA:GPU] Test that DotDecomposer canonicalizes batch dims.
Existing tests do not cover this.

PiperOrigin-RevId: 826006833
2025-10-30 06:45:23 -07:00
Nikita Putikhin
3449303622 Handle PyCUtensorMapObject in extractTmaDesc in the launcher
Re-enables failing tests.

PiperOrigin-RevId: 825983235
2025-10-30 05:15:23 -07:00
A. Unique TensorFlower
8a9bec96a6 Automated Code Change
PiperOrigin-RevId: 825961742
2025-10-30 03:56:21 -07:00
Alexander Shaposhnikov
d29d0f8635 Minor cleanup in YnnDimensions.
PiperOrigin-RevId: 825955545
2025-10-30 03:37:49 -07:00
Adrian Kuegel
1fa646265a [XLA:GPU] Avoid a segfault in StreamAttributeAnnotator
Currently it is assumed that GetTupleElement is never the root of a
computation. That assumption is not necessarily true, e.g. during autotuning of
Cublas Gemm calls we can have a GetTupleElement op as root.

PiperOrigin-RevId: 825932301
2025-10-30 02:36:46 -07:00
Adrian Kuegel
ce8015c614 [XLA:CPU] Remove obsolete IndexedArrayAnalysisPrinterPass.
It is not being used anymore.

PiperOrigin-RevId: 825927601
2025-10-30 02:04:18 -07:00
Bhatu
a36834c399 Update rules_ml_toolchain to version with nvcc wrapper fixes .
PiperOrigin-RevId: 825832143
2025-10-29 20:42:44 -07:00
Zixuan Jiang
ba10feaa24 Add an overload for SpmdPartitioner::SetPartitionedHlo to avoid unnecessary lambda functions.
PiperOrigin-RevId: 825819367
2025-10-29 20:10:18 -07:00
A. Unique TensorFlower
512f1e48cb Use reuse semantics instead of copy semantics when calling DisassembleIntoSingleDeviceArrays, since disassemble always aliases.
Use copy semantics instead of reuse semantics when calling CopyToHostBuffer, since copy to host always copies.

PiperOrigin-RevId: 825815935
2025-10-29 19:53:06 -07:00
Subhankar Shah
28e9e5ea27 [XLA:MSA] Allow block prefetching for custom call prefetches that have aliased uses.
* Extend alternate memory chunk reservations for aliased uses.
* Add pinned allocations in alternate memory for aliased uses.
* Mark all aliased allocations as colocated.

Pin all values aliased with the prefetched source value to default memory.

PiperOrigin-RevId: 825801181
2025-10-29 19:02:42 -07:00
Eugene Zhulenev
e99aad85fe [xla:codegen] Cleanup MlirKernelSource APIs
PiperOrigin-RevId: 825795569
2025-10-29 18:41:47 -07:00
Will Froom
860f543d1c [XLA:CPU][XTile] Create lowering for Iota.
PiperOrigin-RevId: 825789498
2025-10-29 18:30:39 -07:00
A. Unique TensorFlower
c7055c2e5b Reverts 0b0ff7c8ac
PiperOrigin-RevId: 825779916
2025-10-29 18:10:47 -07:00
Niklas Vangerow
fe2a783077 Migrate broadcast_simple_test to use PjRt.
PiperOrigin-RevId: 825775803
2025-10-29 17:58:13 -07:00
Felix Wang
152b2338d9 Refactor xla codebase to avoid the dynamic_cast, use ClassOf or DynCast instead.
PiperOrigin-RevId: 825773229
2025-10-29 17:49:47 -07:00
A. Unique TensorFlower
9511b51e61 Remove dependency on private absl/base:endian.
PiperOrigin-RevId: 825772784
2025-10-29 17:37:56 -07:00
Eugene Zhulenev
d4b7f15aee [xla:ffi] NFC: Use AttrTag<T> for tagging regular arguments
PiperOrigin-RevId: 825769996
2025-10-29 17:19:10 -07:00
Hyeontaek Lim
b1d5462115 [IFRT] Remove -DIFRT_REQUIRE_USER_CONTEXT that is no longer used for detecting missing user contexts.
IFRT users such as JAX check for the presence of user contexts in IFRT objects in their own layer.

PiperOrigin-RevId: 825760230
2025-10-29 16:46:05 -07:00
Parker Schuh
0b0ff7c8ac Change RawSEDeviceMemory to be AsyncValueRef.
PiperOrigin-RevId: 825735739
2025-10-29 15:49:26 -07:00
Eugene Zhulenev
0f559dec93 [xla:cpu] Move buffer allocation info encoding to tf2xla
PiperOrigin-RevId: 825732652
2025-10-29 15:40:10 -07:00
Eugene Zhulenev
756a72760a [xla:codegen] Remove MlirKernelDefinition alias
PiperOrigin-RevId: 825724819
2025-10-29 15:22:52 -07:00
Kanish Anand
ff2b8b600d Refactor: Move method definitions from mesh_and_axis.h to .cc file
PiperOrigin-RevId: 825722377
2025-10-29 15:10:16 -07:00
Jake Harmon
83051de423 Add option to tag PJRT wheels with nightly timestamp
PiperOrigin-RevId: 825706994
2025-10-29 14:33:36 -07:00
Felix Wang
cecce70fb2 Rename rail-aligned into world-level in collective_ops_utils.h
Network rail usually refers to a set of NICs connected by the same fabric/switch, e.g. [Rail-optimized topology](https://developer.nvidia.com/blog/doubling-all2all-performance-with-nvidia-collective-communication-library-2-12/).

PiperOrigin-RevId: 825696577
2025-10-29 14:11:20 -07:00
A. Unique TensorFlower
ca3d7d6305 Integrate LLVM at llvm/llvm-project@028bfa255e
Updates LLVM usage to match
[028bfa255e90](https://github.com/llvm/llvm-project/commit/028bfa255e90)

PiperOrigin-RevId: 825670183
2025-10-29 13:04:28 -07:00
Henning Becker
757f0ac980 Add proto serialization for GpuComputeCapability
PiperOrigin-RevId: 825657032
2025-10-29 12:31:09 -07:00
Will Froom
09d56a9643 [XLA:CPU][XTile] Add lowering for reshape.
PiperOrigin-RevId: 825605674
2025-10-29 10:47:02 -07:00
Ilya Tikhonovskiy
82dc95c293 [XLA:GPU] rename thunk_checksum_tracing_pass to thunk_buffer_debug_pass
It is a purely mechanical move CL.

The goal is to use the pass for all the buffer debug checks. We have checksum and nan_counter kernels at the moment.

PiperOrigin-RevId: 825602375
2025-10-29 10:27:29 -07:00
Will Froom
d717d76122 [XLA:CPU][XTile] Add lowering for broadcast.
PiperOrigin-RevId: 825578568
2025-10-29 09:28:41 -07:00
Will Froom
684717efe0 [XLA][XTile] Add pass to verify that a module conforms to XTile specification.
PiperOrigin-RevId: 825488424
2025-10-29 04:49:44 -07:00
Alex
8dc7ce7547 PR #33085: [ROCm] Fix too strict default spawn strategy for rbe builds
Imported from GitHub PR https://github.com/openxla/xla/pull/33085

📝 Summary of Changes
Fix too strict spawn strategy for rbe builds

🎯 Justification
Remote-only execution is not possible for all the tests.

🚀 Kind of Contribution
🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
df73e6e006c47d5ada1e14ced8f2ae94c0df7dd8 by Alexandros Theodoridis <atheodor@amd.com>:

Fix too strict default spawn strategy for rbe builds

Merging this change closes #33085

PiperOrigin-RevId: 825463234
2025-10-29 03:33:03 -07:00
Marcin Radomski
8b47f52ef7 [XLA:GPU] Add BufferDebugLogEntryMetadataStore
Encoding extra metadata about a debug log entry within its ID limits how much
information we can pass. To remove the limitation without the need to pass
extra data between host and device, introduce a metadata store that provides an
opaque ID -> metadata mapping.

Follow-up patches will make checksum/NaN tracing use
BufferDebugLogEntryMetadataStore shared between all thunks that operate on
BufferDebugLog:

- BuffersChecksumThunks put the metadata into the store and use the returned
  entry_ids to identify the checksums from BufferDebugLog,
- xla_gpu_buffer_debug_log_dump reads the BufferDebugLog and uses the store to
  resolve the entry_ids into the metadata.

PiperOrigin-RevId: 825462635
2025-10-29 03:18:55 -07:00
A. Unique TensorFlower
7b7a64f3c8 Automated Code Change
PiperOrigin-RevId: 825457902
2025-10-29 03:03:30 -07:00
dependabot[bot]
bbd2fb5cf8 PR #33141: Bump github/codeql-action from 4.30.9 to 4.31.0
Imported from GitHub PR https://github.com/openxla/xla/pull/33141

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.30.9 to 4.31.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/github/codeql-action/releases">github/codeql-action's releases</a>.</em></p>
<blockquote>
<h2>v4.31.0</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a href="https://github.com/github/codeql-action/releases">releases page</a> for the relevant changes to the CodeQL CLI and language packs.</p>
<h2>4.31.0 - 24 Oct 2025</h2>
<ul>
<li>Bump minimum CodeQL bundle version to 2.17.6. <a href="https://redirect.github.com/github/codeql-action/pull/3223">#3223</a></li>
<li>When SARIF files are uploaded by the <code>analyze</code> or <code>upload-sarif</code> actions, the CodeQL Action automatically performs post-processing steps to prepare the data for the upload. Previously, these post-processing steps were only performed before an upload took place. We are now changing this so that the post-processing steps will always be performed, even when the SARIF files are not uploaded. This does not change anything for the <code>upload-sarif</code> action. For <code>analyze</code>, this may affect Advanced Setup for CodeQL users who specify a value other than <code>always</code> for the <code>upload</code> input. <a href="https://redirect.github.com/github/codeql-action/pull/3222">#3222</a></li>
</ul>
<p>See the full <a href="https://github.com/github/codeql-action/blob/v4.31.0/CHANGELOG.md">CHANGELOG.md</a> for more information.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's changelog</a>.</em></p>
<blockquote>
<h1>CodeQL Action Changelog</h1>
<p>See the <a href="https://github.com/github/codeql-action/releases">releases page</a> for the relevant changes to the CodeQL CLI and language packs.</p>
<h2>[UNRELEASED]</h2>
<p>No user facing changes.</p>
<h2>4.31.0 - 24 Oct 2025</h2>
<ul>
<li>Bump minimum CodeQL bundle version to 2.17.6. <a href="https://redirect.github.com/github/codeql-action/pull/3223">#3223</a></li>
<li>When SARIF files are uploaded by the <code>analyze</code> or <code>upload-sarif</code> actions, the CodeQL Action automatically performs post-processing steps to prepare the data for the upload. Previously, these post-processing steps were only performed before an upload took place. We are now changing this so that the post-processing steps will always be performed, even when the SARIF files are not uploaded. This does not change anything for the <code>upload-sarif</code> action. For <code>analyze</code>, this may affect Advanced Setup for CodeQL users who specify a value other than <code>always</code> for the <code>upload</code> input. <a href="https://redirect.github.com/github/codeql-action/pull/3222">#3222</a></li>
</ul>
<h2>4.30.9 - 17 Oct 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.3. <a href="https://redirect.github.com/github/codeql-action/pull/3205">#3205</a></li>
<li>Experimental: A new <code>setup-codeql</code> action has been added which is similar to <code>init</code>, except it only installs the CodeQL CLI and does not initialize a database. Do not use this in production as it is part of an internal experiment and subject to change at any time. <a href="https://redirect.github.com/github/codeql-action/pull/3204">#3204</a></li>
</ul>
<h2>4.30.8 - 10 Oct 2025</h2>
<p>No user facing changes.</p>
<h2>4.30.7 - 06 Oct 2025</h2>
<ul>
<li>[v4+ only] The CodeQL Action now runs on Node.js v24. <a href="https://redirect.github.com/github/codeql-action/pull/3169">#3169</a></li>
</ul>
<h2>3.30.6 - 02 Oct 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.2. <a href="https://redirect.github.com/github/codeql-action/pull/3168">#3168</a></li>
</ul>
<h2>3.30.5 - 26 Sep 2025</h2>
<ul>
<li>We fixed a bug that was introduced in <code>3.30.4</code> with <code>upload-sarif</code> which resulted in files without a <code>.sarif</code> extension not getting uploaded. <a href="https://redirect.github.com/github/codeql-action/pull/3160">#3160</a></li>
</ul>
<h2>3.30.4 - 25 Sep 2025</h2>
<ul>
<li>We have improved the CodeQL Action's ability to validate that the workflow it is used in does not use different versions of the CodeQL Action for different workflow steps. Mixing different versions of the CodeQL Action in the same workflow is unsupported and can lead to unpredictable results. A warning will now be emitted from the <code>codeql-action/init</code> step if different versions of the CodeQL Action are detected in the workflow file. Additionally, an error will now be thrown by the other CodeQL Action steps if they load a configuration file that was generated by a different version of the <code>codeql-action/init</code> step. <a href="https://redirect.github.com/github/codeql-action/pull/3099">#3099</a> and <a href="https://redirect.github.com/github/codeql-action/pull/3100">#3100</a></li>
<li>We added support for reducing the size of dependency caches for Java analyses, which will reduce cache usage and speed up workflows. This will be enabled automatically at a later time. <a href="https://redirect.github.com/github/codeql-action/pull/3107">#3107</a></li>
<li>You can now run the latest CodeQL nightly bundle by passing <code>tools: nightly</code> to the <code>init</code> action. In general, the nightly bundle is unstable and we only recommend running it when directed by GitHub staff. <a href="https://redirect.github.com/github/codeql-action/pull/3130">#3130</a></li>
<li>Update default CodeQL bundle version to 2.23.1. <a href="https://redirect.github.com/github/codeql-action/pull/3118">#3118</a></li>
</ul>
<h2>3.30.3 - 10 Sep 2025</h2>
<p>No user facing changes.</p>
<h2>3.30.2 - 09 Sep 2025</h2>
<ul>
<li>Fixed a bug which could cause language autodetection to fail. <a href="https://redirect.github.com/github/codeql-action/pull/3084">#3084</a></li>
<li>Experimental: The <code>quality-queries</code> input that was added in <code>3.29.2</code> as part of an internal experiment is now deprecated and will be removed in an upcoming version of the CodeQL Action. It has been superseded by a new <code>analysis-kinds</code> input, which is part of the same internal experiment. Do not use this in production as it is subject to change at any time. <a href="https://redirect.github.com/github/codeql-action/pull/3064">#3064</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="4e94bd11f7"><code>4e94bd1</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3235">#3235</a> from github/update-v4.31.0-1d36546c1</li>
<li><a href="8f11182164"><code>8f11182</code></a> Update changelog for v4.31.0</li>
<li><a href="1d36546c14"><code>1d36546</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3234">#3234</a> from github/mbg/changelog/post-processing</li>
<li><a href="08ada26e6a"><code>08ada26</code></a> Add changelog entry for post-processing change</li>
<li><a href="b843cbeed0"><code>b843cbe</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3233">#3233</a> from github/mbg/getOptionalEnvVar</li>
<li><a href="1ecd563919"><code>1ecd563</code></a> Use <code>getOptionalEnvVar</code> in <code>writePostProcessedFiles</code></li>
<li><a href="e576807920"><code>e576807</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3223">#3223</a> from github/henrymercer/bump-minimum</li>
<li><a href="ad35676669"><code>ad35676</code></a> Add <code>getOptionalEnvVar</code> function</li>
<li><a href="d75645b13f"><code>d75645b</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3222">#3222</a> from github/mbg/upload-lib/post-process</li>
<li><a href="710606cc35"><code>710606c</code></a> Check that <code>outputPath</code> is non-empty</li>
<li>Additional commits viewable in <a href="16140ae1a1...4e94bd11f7">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github/codeql-action&package-manager=github_actions&previous-version=4.30.9&new-version=4.31.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

</details>
Copybara import of the project:

--
cbe7908eed34d441708d7360f23dad04e5b48ee1 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:

Bump github/codeql-action from 4.30.9 to 4.31.0

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.30.9 to 4.31.0.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](16140ae1a1...4e94bd11f7)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Merging this change closes #33141

PiperOrigin-RevId: 825449574
2025-10-29 02:52:45 -07:00
Ilia Sergachev
2e53225273 PR #33205: [GPU] Fix reduce-precision simplification.
Imported from GitHub PR https://github.com/openxla/xla/pull/33205

📝 Summary of Changes
The simplification was unintentionally disabled in
2accf052cb.

🎯 Justification
Bug fix.

🚀 Kind of Contribution
🐛 Bug Fix, ⚡️ Performance Improvement

📊 Benchmark (for Performance Improvements)
No

🧪 Unit Tests:
Yes.

🧪 Execution Tests:
No.
Copybara import of the project:

--
2fb682c10ff49212044dd995ba97aa329e52bb71 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix reduce-precision simplification.

Was unintentionally disabled in
2accf052cb.

Merging this change closes #33205

PiperOrigin-RevId: 825449067
2025-10-29 02:40:39 -07:00
Will Froom
a8356f63df [XLA:CPU] Fix UnrollExtractLoops to stop false-positive unroll when the vector itself is dependent on for loop.
PiperOrigin-RevId: 825427562
2025-10-29 01:57:31 -07:00
Alexander Belyaev
f19413ab39 [XLA:GPU] Move SymbolicExpr out of the gpu:: namespace.
PiperOrigin-RevId: 825420017
2025-10-29 01:43:55 -07:00
Kanish Anand
fb796c3475 Clarify maximal mesh documentation.
PiperOrigin-RevId: 825417349
2025-10-29 01:34:30 -07:00
A. Unique TensorFlower
12f2b7d4e6 Automated Code Change
PiperOrigin-RevId: 825413784
2025-10-29 01:24:25 -07:00
A. Unique TensorFlower
364a0ac855 [XLA:GPU] Add simple reduce check to multimem tests
PiperOrigin-RevId: 825413069
2025-10-29 01:00:17 -07:00
Vlad Sytchenko
db42f74169 Googly changes four words
PiperOrigin-RevId: 825371929
2025-10-28 22:47:58 -07:00
Eugene Zhulenev
27a45465a0 [xla:codegen] Remove MlirKernelEmitter alias
PiperOrigin-RevId: 825346639
2025-10-28 21:20:29 -07:00
Eugene Zhulenev
1444679887 [xla:codegen] Remove LlvmKernelDefinition alias
Rely on the KernelEmitter<T>::KernelDefinition alias and CTAD to avoid spelling out the full type.

PiperOrigin-RevId: 825336093
2025-10-28 20:48:11 -07:00
Eugene Zhulenev
8e76d82f01 [xla:codegen] Remove LlvmKernelEmitter alias
PiperOrigin-RevId: 825312592
2025-10-28 20:34:13 -07:00
Eugene Zhulenev
f2ecdb670b [xla:ffi] Make XLA_FFI_TypeInfo a versioned struct with size and extension
PiperOrigin-RevId: 825311188
2025-10-28 20:25:38 -07:00
Peter Hawkins
6efea71d95 Update nanobind to commit e507b118927bc3a12446d0ca235e1baaf343932e.
This includes https://github.com/wjakob/nanobind/pull/1191, which fixes a data race observed in free threading builds.

PiperOrigin-RevId: 825304073
2025-10-28 20:15:37 -07:00
Subhankar Shah
a7fb82d82d [XLA:MSA] Allow block prefetching for values that have aliased uses -
* Extend alternate memory chunk reservations for aliased uses.
* Add pinned allocations in alternate memory for aliased uses.
* Mark all aliased allocations as colocated.

Pin all values aliased to the left of the source HLO value being block prefetched to default memory. Values aliased to the right of the source HLO value will be pinned to default memory in all cases except when scheduling a new prefetch is successful:
* For source values, if a pinned allocation exists, adding uses to the allocation is required.
* For other (aliased) values, finalizing the value suffices.

Misc:
* When creating colocated aliased allocations in alternate memory, do not rely on getting the first colocated block from the back of the list.
* Code cleanup.
PiperOrigin-RevId: 825302368
2025-10-28 20:06:21 -07:00
Vlad Sytchenko
de3ce6ee98 [XLA] Allow fine grain SparseCore offloading
PiperOrigin-RevId: 825298200
2025-10-28 19:43:00 -07:00
Eugene Zhulenev
6ef78e9c38 [xla:codegen] Parametrize KernelEmitter by kernel source type
Remove one level of template indirection by always parametrizing by a kernel source type.

PiperOrigin-RevId: 825295806
2025-10-28 19:36:37 -07:00
Alexander Shaposhnikov
b81ecb432f Add initial bits to support reductions.
PiperOrigin-RevId: 825292462
2025-10-28 19:08:17 -07:00
Zixuan Jiang
e3549cef96 Refactor spmd/shardy export_named_computations and import_func_calls.
PiperOrigin-RevId: 825290126
2025-10-28 18:57:37 -07:00
David Majnemer
9b433c3f5a Remove deprecated Shape(const ShapeProto&) constructor.
This change removes the deprecated `Shape(const ShapeProto&)` constructor and updates its call sites to use `Shape::FromProto` instead, which returns a `StatusOr<Shape>`. The call sites now explicitly handle the potential error status.

PiperOrigin-RevId: 825288664
2025-10-28 18:44:36 -07:00
Eugene Zhulenev
1a17862b09 [xla:codegen] Fix warnings in KernelDefinition + cleanup API a bit
PiperOrigin-RevId: 825283848
2025-10-28 18:38:12 -07:00
Niklas Vangerow
c49788eb0a Migrate fft_test to use PjRt.
PiperOrigin-RevId: 825262773
2025-10-28 18:19:53 -07:00
Subhankar Shah
a7654beb5c Cleanup: Remove the types and _types modules from xla/python/tools because they are dead code
PiperOrigin-RevId: 825262464
2025-10-28 18:11:11 -07:00
Bryan Massoth
f5f9bd8099 Upgrade TraceMeProducer/Consumer traces to kCritical since they are required to group traces across async boundaries.
PiperOrigin-RevId: 825245546
2025-10-28 17:50:08 -07:00
Jian Cai
8e130fe8be [XLA][Numerics][HLO Value Tracking] Print the entire instruction for mismatch tuple structures
This prints the entire instruction instead of just the instruction name when the original value has a different tuple structure from the instruction shape in the HLO verifier pass. The instruction name is not reliable and could differ from its name in the HLO dump.

PiperOrigin-RevId: 825241833
2025-10-28 17:30:30 -07:00
Eugene Zhulenev
a72cd9ceeb [xla:codegen] Move KernelEmitter::name() to base class and change type to absl::string_view
PiperOrigin-RevId: 825237762
2025-10-28 17:21:00 -07:00
Kevin Gleason
03f4c66dd1 [StableHLO Optim] Add CompareOp patterns and dont fold large converts.
PiperOrigin-RevId: 825236477
2025-10-28 17:11:15 -07:00
A. Unique TensorFlower
f8f2123d5a Add another call to ynn_optimize_subgraph
PiperOrigin-RevId: 825233680
2025-10-28 16:47:05 -07:00
Karlo Basioli
8ab9ebee94 [XLA:CPU] Guard LLVM command line options when infering target machine
PiperOrigin-RevId: 825233596
2025-10-28 16:33:16 -07:00
A. Unique TensorFlower
281fa6f4d3 Fix dot library predicates
This changes the predicates for calling a library for a dot so that they indicate whether we will actually call the library, not just whether the library supports the dot. This fixes bugs where we incorrectly claimed a library would handle the dot.

PiperOrigin-RevId: 825231317
2025-10-28 16:23:53 -07:00
Parker Schuh
d7b371034b Update users not to set untuple_result now that it is true by default.
PiperOrigin-RevId: 825230517
2025-10-28 16:15:35 -07:00
Eugene Zhulenev
3ea8166731 [xla:ffi] Add xla::ffi::MakeTypeInfo
PiperOrigin-RevId: 825230372
2025-10-28 16:05:25 -07:00
Eugene Zhulenev
1d829c0f02 [xla] Update documentation to use xla::Future
PiperOrigin-RevId: 825222421
2025-10-28 15:56:02 -07:00
A. Unique TensorFlower
24abec31c7 Add a 1D array test case for SPMD DUS (to verify both the HLO transformation and overall correctness)
PiperOrigin-RevId: 825222307
2025-10-28 15:46:59 -07:00
Jian Cai
66078903f7 [XLA][Numerics][HLO Original Value] Support original values for some cases in while loop simplifier pass
This updates the original value of a while loop if any unused parameters are removed.

PiperOrigin-RevId: 825221785
2025-10-28 15:34:04 -07:00