Commit Graph

28187 Commits

Author SHA1 Message Date
Hyeontaek Lim
b1d5462115 [IFRT] Remove -DIFRT_REQUIRE_USER_CONTEXT that is no longer used for detecting missing user contexts.
IFRT users such as JAX check for the presence of user contexts in IFRT objects in their own layer.

PiperOrigin-RevId: 825760230
2025-10-29 16:46:05 -07:00
Parker Schuh
0b0ff7c8ac Change RawSEDeviceMemory to be AsyncValueRef.
PiperOrigin-RevId: 825735739
2025-10-29 15:49:26 -07:00
Eugene Zhulenev
0f559dec93 [xla:cpu] Move buffer allocation info encoding to tf2xla
PiperOrigin-RevId: 825732652
2025-10-29 15:40:10 -07:00
Eugene Zhulenev
756a72760a [xla:codegen] Remove MlirKernelDefinition alias
PiperOrigin-RevId: 825724819
2025-10-29 15:22:52 -07:00
Kanish Anand
ff2b8b600d Refactor: Move method definitions from mesh_and_axis.h to .cc file
PiperOrigin-RevId: 825722377
2025-10-29 15:10:16 -07:00
Jake Harmon
83051de423 Add option to tag PJRT wheels with nightly timestamp
PiperOrigin-RevId: 825706994
2025-10-29 14:33:36 -07:00
Felix Wang
cecce70fb2 Rename rail-aligned into world-level in collective_ops_utils.h
A network rail usually refers to a set of NICs connected by the same fabric/switch, e.g. [Rail-optimized topology](https://developer.nvidia.com/blog/doubling-all2all-performance-with-nvidia-collective-communication-library-2-12/).

PiperOrigin-RevId: 825696577
2025-10-29 14:11:20 -07:00
A. Unique TensorFlower
ca3d7d6305 Integrate LLVM at llvm/llvm-project@028bfa255e
Updates LLVM usage to match
[028bfa255e90](https://github.com/llvm/llvm-project/commit/028bfa255e90)

PiperOrigin-RevId: 825670183
2025-10-29 13:04:28 -07:00
Henning Becker
757f0ac980 Add proto serialization for GpuComputeCapability
PiperOrigin-RevId: 825657032
2025-10-29 12:31:09 -07:00
Will Froom
09d56a9643 [XLA:CPU][XTile] Add lowering for reshape.
PiperOrigin-RevId: 825605674
2025-10-29 10:47:02 -07:00
Ilya Tikhonovskiy
82dc95c293 [XLA:GPU] rename thunk_checksum_tracing_pass to thunk_buffer_debug_pass
This is a purely mechanical move CL.

The goal is to use the pass for all the buffer debug checks. We have checksum and nan_counter kernels at the moment.

PiperOrigin-RevId: 825602375
2025-10-29 10:27:29 -07:00
Will Froom
d717d76122 [XLA:CPU][XTile] Add lowering for broadcast.
PiperOrigin-RevId: 825578568
2025-10-29 09:28:41 -07:00
Will Froom
684717efe0 [XLA][XTile] Add pass to verify that a module conforms to XTile specification.
PiperOrigin-RevId: 825488424
2025-10-29 04:49:44 -07:00
Alex
8dc7ce7547 PR #33085: [ROCm] Fix too strict default spawn strategy for RBE builds
Imported from GitHub PR https://github.com/openxla/xla/pull/33085

📝 Summary of Changes
Fix too strict spawn strategy for rbe builds

🎯 Justification
Remote-only execution is not possible for all the tests.

🚀 Kind of Contribution
🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
df73e6e006c47d5ada1e14ced8f2ae94c0df7dd8 by Alexandros Theodoridis <atheodor@amd.com>:

Fix too strict default spawn strategy for RBE builds

Merging this change closes #33085

PiperOrigin-RevId: 825463234
2025-10-29 03:33:03 -07:00
Marcin Radomski
8b47f52ef7 [XLA:GPU] Add BufferDebugLogEntryMetadataStore
Encoding extra metadata about a debug log entry within its ID limits how much
information we can pass. To remove the limitation without needing to pass
extra data between host and device, introduce a metadata store that provides an
opaque ID -> metadata mapping.

Follow up patches will make checksum/NaN tracing use
BufferDebugLogEntryMetadataStore shared between all thunks that operate on
BufferDebugLog:

- BuffersChecksumThunks put the metadata into the store and use the returned
  entry_ids to identify the checksums from BufferDebugLog,
- xla_gpu_buffer_debug_log_dump reads the BufferDebugLog and uses the store to
  resolve the entry_ids into the metadata.
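
The intended contract can be sketched in a few lines (a hedged Python sketch; the actual store is C++, and the class name, method names, and metadata fields here are assumptions, not the real API):

```python
class MetadataStore:
    """Host-side store mapping small opaque entry ids to rich metadata,
    so only the id ever needs to cross the host/device boundary."""

    def __init__(self):
        self._entries = []

    def put(self, metadata):
        # Producers (e.g. a checksum thunk) register metadata and get back
        # an opaque id to embed in the device-side debug log.
        self._entries.append(metadata)
        return len(self._entries) - 1

    def resolve(self, entry_id):
        # Consumers (e.g. the log dump tool) turn logged ids back into metadata.
        return self._entries[entry_id]
```

In this sketch a checksum producer would call put() once per logged buffer, and the dump tool would call resolve() on every entry id it reads back from the log.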

PiperOrigin-RevId: 825462635
2025-10-29 03:18:55 -07:00
A. Unique TensorFlower
7b7a64f3c8 Automated Code Change
PiperOrigin-RevId: 825457902
2025-10-29 03:03:30 -07:00
dependabot[bot]
bbd2fb5cf8 PR #33141: Bump github/codeql-action from 4.30.9 to 4.31.0
Imported from GitHub PR https://github.com/openxla/xla/pull/33141

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.30.9 to 4.31.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/github/codeql-action/releases">github/codeql-action's releases</a>.</em></p>
<blockquote>
<h2>v4.31.0</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a href="https://github.com/github/codeql-action/releases">releases page</a> for the relevant changes to the CodeQL CLI and language packs.</p>
<h2>4.31.0 - 24 Oct 2025</h2>
<ul>
<li>Bump minimum CodeQL bundle version to 2.17.6. <a href="https://redirect.github.com/github/codeql-action/pull/3223">#3223</a></li>
<li>When SARIF files are uploaded by the <code>analyze</code> or <code>upload-sarif</code> actions, the CodeQL Action automatically performs post-processing steps to prepare the data for the upload. Previously, these post-processing steps were only performed before an upload took place. We are now changing this so that the post-processing steps will always be performed, even when the SARIF files are not uploaded. This does not change anything for the <code>upload-sarif</code> action. For <code>analyze</code>, this may affect Advanced Setup for CodeQL users who specify a value other than <code>always</code> for the <code>upload</code> input. <a href="https://redirect.github.com/github/codeql-action/pull/3222">#3222</a></li>
</ul>
<p>See the full <a href="https://github.com/github/codeql-action/blob/v4.31.0/CHANGELOG.md">CHANGELOG.md</a> for more information.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's changelog</a>.</em></p>
<blockquote>
<h1>CodeQL Action Changelog</h1>
<p>See the <a href="https://github.com/github/codeql-action/releases">releases page</a> for the relevant changes to the CodeQL CLI and language packs.</p>
<h2>[UNRELEASED]</h2>
<p>No user facing changes.</p>
<h2>4.31.0 - 24 Oct 2025</h2>
<ul>
<li>Bump minimum CodeQL bundle version to 2.17.6. <a href="https://redirect.github.com/github/codeql-action/pull/3223">#3223</a></li>
<li>When SARIF files are uploaded by the <code>analyze</code> or <code>upload-sarif</code> actions, the CodeQL Action automatically performs post-processing steps to prepare the data for the upload. Previously, these post-processing steps were only performed before an upload took place. We are now changing this so that the post-processing steps will always be performed, even when the SARIF files are not uploaded. This does not change anything for the <code>upload-sarif</code> action. For <code>analyze</code>, this may affect Advanced Setup for CodeQL users who specify a value other than <code>always</code> for the <code>upload</code> input. <a href="https://redirect.github.com/github/codeql-action/pull/3222">#3222</a></li>
</ul>
<h2>4.30.9 - 17 Oct 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.3. <a href="https://redirect.github.com/github/codeql-action/pull/3205">#3205</a></li>
<li>Experimental: A new <code>setup-codeql</code> action has been added which is similar to <code>init</code>, except it only installs the CodeQL CLI and does not initialize a database. Do not use this in production as it is part of an internal experiment and subject to change at any time. <a href="https://redirect.github.com/github/codeql-action/pull/3204">#3204</a></li>
</ul>
<h2>4.30.8 - 10 Oct 2025</h2>
<p>No user facing changes.</p>
<h2>4.30.7 - 06 Oct 2025</h2>
<ul>
<li>[v4+ only] The CodeQL Action now runs on Node.js v24. <a href="https://redirect.github.com/github/codeql-action/pull/3169">#3169</a></li>
</ul>
<h2>3.30.6 - 02 Oct 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.2. <a href="https://redirect.github.com/github/codeql-action/pull/3168">#3168</a></li>
</ul>
<h2>3.30.5 - 26 Sep 2025</h2>
<ul>
<li>We fixed a bug that was introduced in <code>3.30.4</code> with <code>upload-sarif</code> which resulted in files without a <code>.sarif</code> extension not getting uploaded. <a href="https://redirect.github.com/github/codeql-action/pull/3160">#3160</a></li>
</ul>
<h2>3.30.4 - 25 Sep 2025</h2>
<ul>
<li>We have improved the CodeQL Action's ability to validate that the workflow it is used in does not use different versions of the CodeQL Action for different workflow steps. Mixing different versions of the CodeQL Action in the same workflow is unsupported and can lead to unpredictable results. A warning will now be emitted from the <code>codeql-action/init</code> step if different versions of the CodeQL Action are detected in the workflow file. Additionally, an error will now be thrown by the other CodeQL Action steps if they load a configuration file that was generated by a different version of the <code>codeql-action/init</code> step. <a href="https://redirect.github.com/github/codeql-action/pull/3099">#3099</a> and <a href="https://redirect.github.com/github/codeql-action/pull/3100">#3100</a></li>
<li>We added support for reducing the size of dependency caches for Java analyses, which will reduce cache usage and speed up workflows. This will be enabled automatically at a later time. <a href="https://redirect.github.com/github/codeql-action/pull/3107">#3107</a></li>
<li>You can now run the latest CodeQL nightly bundle by passing <code>tools: nightly</code> to the <code>init</code> action. In general, the nightly bundle is unstable and we only recommend running it when directed by GitHub staff. <a href="https://redirect.github.com/github/codeql-action/pull/3130">#3130</a></li>
<li>Update default CodeQL bundle version to 2.23.1. <a href="https://redirect.github.com/github/codeql-action/pull/3118">#3118</a></li>
</ul>
<h2>3.30.3 - 10 Sep 2025</h2>
<p>No user facing changes.</p>
<h2>3.30.2 - 09 Sep 2025</h2>
<ul>
<li>Fixed a bug which could cause language autodetection to fail. <a href="https://redirect.github.com/github/codeql-action/pull/3084">#3084</a></li>
<li>Experimental: The <code>quality-queries</code> input that was added in <code>3.29.2</code> as part of an internal experiment is now deprecated and will be removed in an upcoming version of the CodeQL Action. It has been superseded by a new <code>analysis-kinds</code> input, which is part of the same internal experiment. Do not use this in production as it is subject to change at any time. <a href="https://redirect.github.com/github/codeql-action/pull/3064">#3064</a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="4e94bd11f7"><code>4e94bd1</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3235">#3235</a> from github/update-v4.31.0-1d36546c1</li>
<li><a href="8f11182164"><code>8f11182</code></a> Update changelog for v4.31.0</li>
<li><a href="1d36546c14"><code>1d36546</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3234">#3234</a> from github/mbg/changelog/post-processing</li>
<li><a href="08ada26e6a"><code>08ada26</code></a> Add changelog entry for post-processing change</li>
<li><a href="b843cbeed0"><code>b843cbe</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3233">#3233</a> from github/mbg/getOptionalEnvVar</li>
<li><a href="1ecd563919"><code>1ecd563</code></a> Use <code>getOptionalEnvVar</code> in <code>writePostProcessedFiles</code></li>
<li><a href="e576807920"><code>e576807</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3223">#3223</a> from github/henrymercer/bump-minimum</li>
<li><a href="ad35676669"><code>ad35676</code></a> Add <code>getOptionalEnvVar</code> function</li>
<li><a href="d75645b13f"><code>d75645b</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3222">#3222</a> from github/mbg/upload-lib/post-process</li>
<li><a href="710606cc35"><code>710606c</code></a> Check that <code>outputPath</code> is non-empty</li>
<li>Additional commits viewable in <a href="16140ae1a1...4e94bd11f7">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github/codeql-action&package-manager=github_actions&previous-version=4.30.9&new-version=4.31.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Copybara import of the project:

--
cbe7908eed34d441708d7360f23dad04e5b48ee1 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:

Bump github/codeql-action from 4.30.9 to 4.31.0

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.30.9 to 4.31.0.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](16140ae1a1...4e94bd11f7)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Merging this change closes #33141

PiperOrigin-RevId: 825449574
2025-10-29 02:52:45 -07:00
Ilia Sergachev
2e53225273 PR #33205: [GPU] Fix reduce-precision simplification.
Imported from GitHub PR https://github.com/openxla/xla/pull/33205

📝 Summary of Changes
The simplification was unintentionally disabled in
2accf052cb.

🎯 Justification
Bug fix.

🚀 Kind of Contribution
🐛 Bug Fix, Performance Improvement

📊 Benchmark (for Performance Improvements)
No

🧪 Unit Tests:
Yes.

🧪 Execution Tests:
No.
Copybara import of the project:

--
2fb682c10ff49212044dd995ba97aa329e52bb71 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix reduce-precision simplification.

Was unintentionally disabled in
2accf052cb.

Merging this change closes #33205

PiperOrigin-RevId: 825449067
2025-10-29 02:40:39 -07:00
Will Froom
a8356f63df [XLA:CPU] Fix UnrollExtractLoops to stop false-positive unroll when the vector itself is dependent on for loop.
PiperOrigin-RevId: 825427562
2025-10-29 01:57:31 -07:00
Alexander Belyaev
f19413ab39 [XLA:GPU] Move SymbolicExpr out of the gpu:: namespace.
PiperOrigin-RevId: 825420017
2025-10-29 01:43:55 -07:00
Kanish Anand
fb796c3475 Clarify maximal mesh documentation.
PiperOrigin-RevId: 825417349
2025-10-29 01:34:30 -07:00
A. Unique TensorFlower
12f2b7d4e6 Automated Code Change
PiperOrigin-RevId: 825413784
2025-10-29 01:24:25 -07:00
A. Unique TensorFlower
364a0ac855 [XLA:GPU] Add simple reduce check to multimem tests
PiperOrigin-RevId: 825413069
2025-10-29 01:00:17 -07:00
Vlad Sytchenko
db42f74169 Googly changes four words
PiperOrigin-RevId: 825371929
2025-10-28 22:47:58 -07:00
Eugene Zhulenev
27a45465a0 [xla:codegen] Remove MlirKernelEmitter alias
PiperOrigin-RevId: 825346639
2025-10-28 21:20:29 -07:00
Eugene Zhulenev
1444679887 [xla:codegen] Remove LlvmKernelDefinition alias
Rely on the KernelEmitter<T>::KernelDefinition alias and CTAD to avoid spelling out the full type.

PiperOrigin-RevId: 825336093
2025-10-28 20:48:11 -07:00
Eugene Zhulenev
8e76d82f01 [xla:codegen] Remove LlvmKernelEmitter alias
PiperOrigin-RevId: 825312592
2025-10-28 20:34:13 -07:00
Eugene Zhulenev
f2ecdb670b [xla:ffi] Make XLA_FFI_TypeInfo a versioned struct with size and extension
PiperOrigin-RevId: 825311188
2025-10-28 20:25:38 -07:00
Peter Hawkins
6efea71d95 Update nanobind to commit e507b118927bc3a12446d0ca235e1baaf343932e.
This includes https://github.com/wjakob/nanobind/pull/1191, which fixes a data race observed in free threading builds.

PiperOrigin-RevId: 825304073
2025-10-28 20:15:37 -07:00
Subhankar Shah
a7fb82d82d [XLA:MSA] Allow block prefetching for values that have aliased uses -
* Extend alternate memory chunk reservations for aliased uses.
* Add pinned allocations in alternate memory for aliased uses.
* Mark all aliased allocations as colocated.

Pin all values aliased to the left of the source HLO value being block-prefetched to default memory. Values aliased to the right of the source HLO value will be pinned to default memory in all cases except when scheduling a new prefetch is successful:
* For source values, if a pinned allocation exists, adding uses to the allocation is required.
* For other (aliased) values, finalizing the value suffices.

Misc:
* When creating colocated aliased allocations in alternate memory, do not rely on getting the first colocated block from the back of the list.
* Code cleanup.
PiperOrigin-RevId: 825302368
2025-10-28 20:06:21 -07:00
Vlad Sytchenko
de3ce6ee98 [XLA] Allow fine grain SparseCore offloading
PiperOrigin-RevId: 825298200
2025-10-28 19:43:00 -07:00
Eugene Zhulenev
6ef78e9c38 [xla:codegen] Parametrize KernelEmitter by kernel source type
Remove one level of template indirection by always parametrizing by a kernel source type.

PiperOrigin-RevId: 825295806
2025-10-28 19:36:37 -07:00
Alexander Shaposhnikov
b81ecb432f Add initial bits to support reductions.
PiperOrigin-RevId: 825292462
2025-10-28 19:08:17 -07:00
Zixuan Jiang
e3549cef96 Refactor spmd/shardy export_named_computations and import_func_calls.
PiperOrigin-RevId: 825290126
2025-10-28 18:57:37 -07:00
David Majnemer
9b433c3f5a Remove deprecated Shape(const ShapeProto&) constructor.
This change removes the deprecated `Shape(const ShapeProto&)` constructor and updates its call sites to use `Shape::FromProto` instead, which returns a `StatusOr<Shape>`. The call sites now explicitly handle the potential error status.
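
The migration pattern can be sketched as follows (a minimal Python sketch with hypothetical field names; the real API is C++ and returns `absl::StatusOr<Shape>`): a constructor that cannot report failure is replaced by a fallible factory whose result each call site must check explicitly.

```python
class ShapeError(Exception):
    pass

class Shape:
    def __init__(self, dims):
        self.dims = tuple(dims)

    @classmethod
    def from_proto(cls, proto):
        # Fallible factory: invalid input produces an error the call site
        # must handle, instead of silently constructing a bad Shape.
        dims = proto.get("dimensions", [])
        if any(d < 0 for d in dims):
            raise ShapeError(f"negative dimension in {dims}")
        return cls(dims)
```

The design point mirrors the commit: validation failures become visible at every call site rather than being deferred to whenever the malformed object is first used.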

PiperOrigin-RevId: 825288664
2025-10-28 18:44:36 -07:00
Eugene Zhulenev
1a17862b09 [xla:codegen] Fix warnings in KernelDefinition + cleanup API a bit
PiperOrigin-RevId: 825283848
2025-10-28 18:38:12 -07:00
Niklas Vangerow
c49788eb0a Migrate fft_test to use PjRt.
PiperOrigin-RevId: 825262773
2025-10-28 18:19:53 -07:00
Subhankar Shah
a7654beb5c Cleanup: Remove types and _types modules from xla/python/tools because it is dead code
PiperOrigin-RevId: 825262464
2025-10-28 18:11:11 -07:00
Bryan Massoth
f5f9bd8099 Upgrade TraceMeProducer/Consumer traces to kCritical, since they are required to group traces across async boundaries.
PiperOrigin-RevId: 825245546
2025-10-28 17:50:08 -07:00
Jian Cai
8e130fe8be [XLA][Numerics][HLO Value Tracking] Print the entire instruction for mismatch tuple structures
This prints the entire instruction instead of only the instruction name when the original value has a different tuple structure than the instruction shape in the HLO verifier pass. The instruction name is not reliable and could differ from its name in the HLO dump.

PiperOrigin-RevId: 825241833
2025-10-28 17:30:30 -07:00
Eugene Zhulenev
a72cd9ceeb [xla:codegen] Move KernelEmitter::name() to base class and change type to absl::string_view
PiperOrigin-RevId: 825237762
2025-10-28 17:21:00 -07:00
Kevin Gleason
03f4c66dd1 [StableHLO Optim] Add CompareOp patterns and don't fold large converts.
PiperOrigin-RevId: 825236477
2025-10-28 17:11:15 -07:00
A. Unique TensorFlower
f8f2123d5a Add another call to ynn_optimize_subgraph
PiperOrigin-RevId: 825233680
2025-10-28 16:47:05 -07:00
Karlo Basioli
8ab9ebee94 [XLA:CPU] Guard LLVM command line options when infering target machine
PiperOrigin-RevId: 825233596
2025-10-28 16:33:16 -07:00
A. Unique TensorFlower
281fa6f4d3 Fix dot library predicates
This changes the predicates for calling a library for a dot to indicate whether we will actually call the library, not just whether the library supports the dot. This fixes bugs where we incorrectly claim a library will handle the dot.

PiperOrigin-RevId: 825231317
2025-10-28 16:23:53 -07:00
Parker Schuh
d7b371034b Update users not to set untuple_result now that it is true by default.
PiperOrigin-RevId: 825230517
2025-10-28 16:15:35 -07:00
Eugene Zhulenev
3ea8166731 [xla:ffi] Add xla::ffi::MakeTypeInfo
PiperOrigin-RevId: 825230372
2025-10-28 16:05:25 -07:00
Eugene Zhulenev
1d829c0f02 [xla] Update documentation to use xla::Future
PiperOrigin-RevId: 825222421
2025-10-28 15:56:02 -07:00
A. Unique TensorFlower
24abec31c7 Add 1D array test case for SPMD DUS (both to verify the HLO transformation and the overall correctness)
PiperOrigin-RevId: 825222307
2025-10-28 15:46:59 -07:00
Jian Cai
66078903f7 [XLA][Numerics][HLO Original Value] Support original values for some cases in while loop simplifier pass
This updates the original value of a while loop if any unused parameters are removed.

PiperOrigin-RevId: 825221785
2025-10-28 15:34:04 -07:00
Haibo Huang
202bd1ac59 Remove IsTpuTopology and IsGpuTopology and IsCpuTopology
PiperOrigin-RevId: 825216555
2025-10-28 15:17:52 -07:00
Matt Hurd
803a513588 Allow specifying a custom session id instead of always using timestamp.
This change introduces a `session_id` option in `ProfileOptions` and `RemoteProfilerSessionManagerOptions`. When provided, this id will be used as the subdirectory for storing profile data, instead of the default timestamp.

PiperOrigin-RevId: 825216123
2025-10-28 15:10:15 -07:00
Parker Schuh
fef8806609 Remove unnecessary limit.
PiperOrigin-RevId: 825212681
2025-10-28 14:54:58 -07:00
Antonio Sanchez
b531e70088 Updating internal visibility rules.
PiperOrigin-RevId: 825202091
2025-10-28 14:26:07 -07:00
Victor Stone
63a9d0d1f8 If device placement annotations are found inside host computations (as a result of nested host computations), hoist them up the call stack. If any unsupported cases or inconsistencies are detected, an error will be returned to the user.
This enables JAX's migration from its previous `compute_on` API to the new (currently named `compute_on2`) API.

PiperOrigin-RevId: 825177029
2025-10-28 13:24:57 -07:00
A. Unique TensorFlower
768e653c9c Integrate LLVM at llvm/llvm-project@29c830cbf8
Updates LLVM usage to match
[29c830cbf8c6](https://github.com/llvm/llvm-project/commit/29c830cbf8c6)

PiperOrigin-RevId: 825166312
2025-10-28 13:03:12 -07:00
Eugene Zhulenev
44446df9cf [xla:cpu] Remove mlir/llvm kernel_definition and kernel_emitter libraries
PiperOrigin-RevId: 825133247
2025-10-28 11:46:01 -07:00
Bill Varcho
63d558e46f [ReplicaGroupV3][Mesh + AxesRef] add to/from proto functions + equality op to XLA definitions of Mesh and AxesRef.
PiperOrigin-RevId: 825107889
2025-10-28 11:18:27 -07:00
Henning Becker
4dfbd3bd0c Add proto serialization for RocmComputeCapability
PiperOrigin-RevId: 825103988
2025-10-28 11:08:48 -07:00
Eugene Zhulenev
fa61547732 [xla:cpu] Rename LlvmIrKernelSource to LlvmKernelSource
PiperOrigin-RevId: 825103892
2025-10-28 10:59:06 -07:00
Will Froom
2ef85038ed [XLA:CPU] Measure process CPU time in reduction benchmark.
PiperOrigin-RevId: 825102754
2025-10-28 10:44:13 -07:00
Henning Becker
c7dd4775b7 Add missing dependency to TSL target
Without that the layering check was failing.

PiperOrigin-RevId: 825098439
2025-10-28 10:30:46 -07:00
Ilya Tikhonovskiy
14db6f6317 [XLA:GPU] follow up fix after pr#32919
Change `gpu_version` parameter to const reference in `IntelGpuCompiler`.

This aligns the parameter type in `OptimizeHloConvolutionCanonicalization` with the base class signature.

PiperOrigin-RevId: 825083863
2025-10-28 10:12:54 -07:00
Eugene Zhulenev
d75ad2c4ff [xla:cpu] Cleanup KernelSpec to use absl::string_view and absl::Span
PiperOrigin-RevId: 825075908
2025-10-28 09:56:20 -07:00
A. Unique TensorFlower
10fd9cfebb Iterate on the functions of the map in a deterministic order to create function names deterministically.
When a function has multiple instances with different manual axes, and dedup-functions-fully is on, the module will contain multiple copies of the same function.

For example:

sdy.manual_computation(%arg0) manual_axes={"x"} (%arg1: tensor<4xf32>) {
  sdy.named_computation<"foo">(%arg1) (%arg2: tensor<4xf32>) {}
}
sdy.manual_computation(%arg0) manual_axes={"y"} (%arg1: tensor<4xf32>) {
  sdy.named_computation<"foo">(%arg1) (%arg2: tensor<4xf32>) {}
}
sdy.named_computation<"foo">(%arg0) (%arg1: tensor<8xf32>) {}

----->

sdy.manual_computation(%arg0) manual_axes={"x"} (%arg1: tensor<4xf32>) {
  call @foo(%arg1)
}
sdy.manual_computation(%arg0) manual_axes={"y"} (%arg1: tensor<4xf32>) {
  call @foo_0(%arg1)
}
call @foo_1(%arg0)

The order of iteration on the map/vector determines which 'foo' becomes 'foo_0', which becomes 'foo_1', and which stays as 'foo'.
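
The technique can be illustrated with a small Python sketch (illustrative names, not the actual Shardy code): iterating the map's keys in sorted order makes the `foo`/`foo_0`/`foo_1` suffix assignment deterministic across runs, where iteration over an unordered map would not be.

```python
def assign_names(funcs):
    """funcs maps a unique function key to its requested base name.
    Returns a deterministic key -> final-name mapping."""
    names, used = {}, set()
    # Sorted iteration: the same inputs always yield the same suffixes.
    for key in sorted(funcs):
        base = funcs[key]
        name, i = base, 0
        while name in used:
            name = f"{base}_{i}"
            i += 1
        used.add(name)
        names[key] = name
    return names
```

With three instances all requesting the name `foo`, the sorted key order fixes which instance keeps `foo` and which ones get the `_0`/`_1` suffixes.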

PiperOrigin-RevId: 825074314
2025-10-28 09:49:24 -07:00
Dimitris Vardoulakis
0ad542ea89 PR #33117: Rename "forward compatible" capabilities to "family compatible", per NVIDIA naming.
Imported from GitHub PR https://github.com/openxla/xla/pull/33117

See:
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/

Family-compatible support was introduced in 16b9d957ff.
It's not clear to me how someone can actually configure XLA to use sm_100f. Will look into that next.
Copybara import of the project:

--
331d40c9c93ffb3a5c97e53e4017f604aa23d221 by Dimitris Vardoulakis <dvardoulakis@nvidia.com>:

Rename "forward compatible" capabilities to "family compatible",
per NVIDIA naming.

See:
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/

Merging this change closes #33117

PiperOrigin-RevId: 825073858
2025-10-28 09:38:56 -07:00
Niklas Vangerow
4aabddab2d Migrate conv_depthwise_test to use PjRt.
PiperOrigin-RevId: 825064898
2025-10-28 09:32:28 -07:00
Jian Cai
7c6d13443d [XLA] Add a member function to check if a tuple tree has any tuples
The function returns true if a tuple has only a root node.

PiperOrigin-RevId: 825062842
2025-10-28 09:22:15 -07:00
Shaogang Wang
d1ca03b626 PR #33149: [XLA:GPU] add VLOG dump option that only prints out primary command buffer graph.
Imported from GitHub PR https://github.com/openxla/xla/pull/33149

📝 Summary of Changes
 Add a CUDA graph dump option that only prints out the primary graph, so as not to flood the log with nested CUDA graphs.

🎯 Justification
Easier debug reading.

🚀 Kind of Contribution
📚 Documentation

Copybara import of the project:

--
18d6939170fd5bf4fa9228d4f74ca3ff4e83ec17 by Shawn Wang <shawnw@nvidia.com>:

add cuda graph dump option that only prints out primary graph

Merging this change closes #33149

PiperOrigin-RevId: 825049186
2025-10-28 09:03:40 -07:00
A. Unique TensorFlower
da8b9bf004 [Autotuner] Add (de)serialization from/to string in cache.
- This is required to port sharding from gemm_fusion_autotuner.

PiperOrigin-RevId: 825045654
2025-10-28 08:35:56 -07:00
Ilya Tikhonovskiy
ffca28bcf8 [XLA:GPU] Ignore reductions over dimensions of size 1 in UnstableReductionDetect
The UnstableReductionDetector now considers reductions where all reduced dimensions have a size of 1 to be stable, as these operations are effectively no-ops and do not introduce numerical instability. A test case is added to verify this behavior.
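
The new stability condition amounts to a simple predicate (an illustrative Python sketch; the real check lives in the C++ detector, and the function name here is an assumption):

```python
def is_trivially_stable_reduction(shape, reduced_dims):
    # A reduction is numerically a no-op when every reduced dimension has
    # extent 1: no accumulation happens, so it cannot be unstable.
    return all(shape[d] == 1 for d in reduced_dims)
```

For example, reducing dimension 1 of a (4, 1, 8) tensor performs no actual accumulation, while reducing dimension 1 of a (4, 2, 8) tensor does.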

PiperOrigin-RevId: 825045042
2025-10-28 08:27:58 -07:00
Marcin Radomski
7334d07917 [XLA:GPU] Add check_thunk_result_consistency tool for verifying checksum consistency
When implementing this, it turned out that the log is currently missing some information needed to reliably distinguish input/output checksums and different thunk executions. This adds the needed fields to the proto, but emitting them in the log will be a separate change.

With the extra data missing, the tool assumes all checksums refer to outputs and that each thunk execution gives the same results every time. The tests include the extra data, so once it is emitted the tool should(TM) just work.

PiperOrigin-RevId: 825040798
2025-10-28 08:20:28 -07:00
Sohaib Iftikhar
542ffe0410 [XLA:GPU]: Add peer to peer copies for cupti tracing.
Before this change, peer-to-peer copies done using
cuMemcpyPeerAsync were not being tracked with the driver API.

PiperOrigin-RevId: 825040246
2025-10-28 08:08:56 -07:00
Benjamin Chetioui
034d750525 [XLA][NFC] Remove line saying that cudaGetLastError is incompatible with command buffers.
It turns out it is (although the broader point still stands).

PiperOrigin-RevId: 825036231
2025-10-28 07:57:07 -07:00
Will Froom
de7a63363c [XLA:CPU][XTile] Implement vectorized reduce.
PiperOrigin-RevId: 825027697
2025-10-28 07:29:37 -07:00
dependabot[bot]
97f4e08c24 PR #33140: Bump actions/upload-artifact from 4.6.1 to 5.0.0
Imported from GitHub PR https://github.com/openxla/xla/pull/33140

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.6.1 to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<p><strong>BREAKING CHANGE:</strong> this update supports Node <code>v24.x</code>. This is not a breaking change per-se but we're treating it as such.</p>
<ul>
<li>Update README.md by <a href="https://github.com/GhadimiR"><code>@​GhadimiR</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/681">actions/upload-artifact#681</a></li>
<li>Update README.md by <a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/712">actions/upload-artifact#712</a></li>
<li>Readme: spell out the first use of GHES by <a href="https://github.com/danwkennedy"><code>@​danwkennedy</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/727">actions/upload-artifact#727</a></li>
<li>Update GHES guidance to include reference to Node 20 version by <a href="https://github.com/patrikpolyak"><code>@​patrikpolyak</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/725">actions/upload-artifact#725</a></li>
<li>Bump <code>@actions/artifact</code> to <code>v4.0.0</code></li>
<li>Prepare <code>v5.0.0</code> by <a href="https://github.com/danwkennedy"><code>@​danwkennedy</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/734">actions/upload-artifact#734</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/GhadimiR"><code>@​GhadimiR</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/681">actions/upload-artifact#681</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/712">actions/upload-artifact#712</a></li>
<li><a href="https://github.com/danwkennedy"><code>@​danwkennedy</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/727">actions/upload-artifact#727</a></li>
<li><a href="https://github.com/patrikpolyak"><code>@​patrikpolyak</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/725">actions/upload-artifact#725</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v4...v5.0.0">https://github.com/actions/upload-artifact/compare/v4...v5.0.0</a></p>
<h2>v4.6.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Update to use artifact 2.3.2 package &amp; prepare for new upload-artifact release by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/685">actions/upload-artifact#685</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/685">actions/upload-artifact#685</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v4...v4.6.2">https://github.com/actions/upload-artifact/compare/v4...v4.6.2</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="330a01c490"><code>330a01c</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/734">#734</a> from actions/danwkennedy/prepare-5.0.0</li>
<li><a href="03f2824452"><code>03f2824</code></a> Update <code>github.dep.yml</code></li>
<li><a href="905a1ecb59"><code>905a1ec</code></a> Prepare <code>v5.0.0</code></li>
<li><a href="2d9f9cdfa9"><code>2d9f9cd</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/725">#725</a> from patrikpolyak/patch-1</li>
<li><a href="9687587dec"><code>9687587</code></a> Merge branch 'main' into patch-1</li>
<li><a href="2848b2cda0"><code>2848b2c</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/727">#727</a> from danwkennedy/patch-1</li>
<li><a href="9b511775fd"><code>9b51177</code></a> Spell out the first use of GHES</li>
<li><a href="cd231ca1ed"><code>cd231ca</code></a> Update GHES guidance to include reference to Node 20 version</li>
<li><a href="de65e23aa2"><code>de65e23</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/712">#712</a> from actions/nebuk89-patch-1</li>
<li><a href="8747d8cd76"><code>8747d8c</code></a> Update README.md</li>
<li>Additional commits viewable in <a href="https://github.com/actions/upload-artifact/compare/v4.6.1...330a01c490aca151604b8cf639adc76d48f6c5d4">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/upload-artifact&package-manager=github_actions&previous-version=4.6.1&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

</details>
Copybara import of the project:

--
5eab24c4d57708cbb45b476265bca2e841706647 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:

Bump actions/upload-artifact from 4.6.1 to 5.0.0

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.6.1 to 5.0.0.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4.6.1...330a01c490aca151604b8cf639adc76d48f6c5d4)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: 5.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Merging this change closes #33140

PiperOrigin-RevId: 824999008
2025-10-28 06:14:44 -07:00
Adrian Kuegel
a12d2cfb31 [XLA:GPU] Make ReductionEmitter deterministic.
So far, the output could be non-deterministic if multiple reductions are
grouped together. This change makes it deterministic.

PiperOrigin-RevId: 824965037
2025-10-28 04:32:11 -07:00
Adrian Kuegel
29bc205be3 Update the Shardy pin in XLA
This should resolve shardy related test failures.

PiperOrigin-RevId: 824944776
2025-10-28 03:26:51 -07:00
Zixuan Jiang
85d834e07b Add PrintArray and ArrayToString methods to IotaTileAssignment and TileAssignment.
These new methods allow printing or converting to a string only the array representation of the tile assignment, without including the tile dimensions. The existing `Print` and `ToString` methods are updated to use these new array-specific printing functions.

PiperOrigin-RevId: 824816702
2025-10-27 21:19:25 -07:00
Zixuan Jiang
402ead44b2 The compatible factor shardings should not have overlap between axes across different tensors.
PiperOrigin-RevId: 824815687
2025-10-27 21:09:47 -07:00
Eugene Zhulenev
c09d68c588 [xla:ffi] Remove unused context decoding for C API internals
PiperOrigin-RevId: 824792000
2025-10-27 19:59:55 -07:00
Michael Kuperstein
b6f66e3e01 [XLA] VLOG instruction count before each HLO pass.
PiperOrigin-RevId: 824791349
2025-10-27 19:50:20 -07:00
A. Unique TensorFlower
4231383b5b Integrate LLVM at llvm/llvm-project@d0a7411cb8
Updates LLVM usage to match
[d0a7411cb840](https://github.com/llvm/llvm-project/commit/d0a7411cb840)

PiperOrigin-RevId: 824767534
2025-10-27 18:28:40 -07:00
A. Unique TensorFlower
6e82d4d96b Add call to ynn_optimize_subgraph
This currently happens implicitly in `ynn_create_runtime`, but that will not be the case soon. (Calling it multiple times is harmless.)

PiperOrigin-RevId: 824754921
2025-10-27 17:53:49 -07:00
Parker Schuh
1fc47fae8e Transition to an error state more aggressively when the socket reports errors.
PiperOrigin-RevId: 824749937
2025-10-27 17:40:30 -07:00
Maxim Ermilov
699879f5f3 add pcie_bandwidth field to DeviceDescription
PiperOrigin-RevId: 824738638
2025-10-27 17:17:00 -07:00
Eugene Zhulenev
769acdd784 [xla] Migrate Tensorflow and XLA to xla::Future
Cleanup BUILD files and fix header includes in preparation for pjrt_future removal.

PiperOrigin-RevId: 824733023
2025-10-27 17:06:39 -07:00
Tori Baker
d299463d26 [xla:gpu] Fix our convert integer to pred in our Triton emitter
`arith.trunci` for i1 will simply take the last bit, but HLO expects convert to i1 to be value != 0. Emit this conversion as a compare not equal to 0 instead. This is already done correctly for floats.
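A minimal Python sketch (illustrative only, not the actual emitter code) of why bit truncation to i1 disagrees with HLO's convert semantics:

```python
def trunc_to_i1(x: int) -> bool:
    # arith.trunci-style conversion: keep only the lowest bit of the integer.
    return bool(x & 1)

def hlo_convert_to_pred(x: int) -> bool:
    # HLO convert-to-i1 semantics: any nonzero value converts to true.
    return x != 0

# x = 2 exposes the mismatch: its low bit is 0, but the value is nonzero.
print(trunc_to_i1(2), hlo_convert_to_pred(2))  # False True
```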

PiperOrigin-RevId: 824716165
2025-10-27 16:14:51 -07:00
Matt Hurd
ffc21f066a Add session_id to profiler_options
This will allow a follow-up PR that allows utilizing this proto.

PiperOrigin-RevId: 824709715
2025-10-27 15:56:14 -07:00
A. Unique TensorFlower
b8ba187ff8 Refactor call_library_for_dot -> library_supports_dot
This enables BF16 to be sent to YNNPACK without casting to F32.

PiperOrigin-RevId: 824667930
2025-10-27 14:11:45 -07:00
A. Unique TensorFlower
e79d4ebeec Update XNNPACK in XLA
- New windows support
- Workaround for Intel AMX in GCC

PiperOrigin-RevId: 824644581
2025-10-27 13:13:02 -07:00
Maxim Ermilov
6e8976e3cf Remove no longer needed one liner methods
PiperOrigin-RevId: 824631750
2025-10-27 12:50:39 -07:00
Niklas Vangerow
56a660da4b Migrate cpu_gpu_fusion_test to use PjRt.
PiperOrigin-RevId: 824627421
2025-10-27 12:39:34 -07:00
A. Unique TensorFlower
a1219cfa94 Integrate LLVM at llvm/llvm-project@d7e40f3e71
Updates LLVM usage to match
[d7e40f3e7165](https://github.com/llvm/llvm-project/commit/d7e40f3e7165)

PiperOrigin-RevId: 824622381
2025-10-27 12:26:58 -07:00
A. Unique TensorFlower
428f0df91b Add a reserve in sampler.cc.
PiperOrigin-RevId: 824620123
2025-10-27 12:14:17 -07:00
Mohammed Anany
6d8ae3a9d7 Add warp specialization to Triton autotuning.
This change introduces `is_warp_specialization_allowed` to `TritonGemmConfig` and `BlockLevelFusionConfig`. The autotuner now explores configurations with warp specialization enabled, but only on Blackwell+ devices and when TMA is also enabled. The fusion emitter uses this new parameter to set the `tt.warp_specialize` attribute.

PiperOrigin-RevId: 824601781
2025-10-27 11:30:32 -07:00
Niklas Vangerow
d7c638ad39 Migrate gpu transforms reduction_layout_normalizer_test to use PjRt.
PiperOrigin-RevId: 824573461
2025-10-27 10:43:38 -07:00
Will Froom
17b1932f13 [XLA:GPU][XTile] Don't cast to intermediate i64 when extracting the program id.
PiperOrigin-RevId: 824573427
2025-10-27 10:25:39 -07:00
Niklas Vangerow
7257c66fae Migrate pad_test to use PjRt.
PiperOrigin-RevId: 824545233
2025-10-27 09:21:20 -07:00
A. Unique TensorFlower
38ebd16aed Remove IndexingMap::GetMutableAffineMap()
This change removes the GetMutableAffineMap() method from xla::IndexingMap. The mutable access to the underlying mlir::AffineMap can't be used because we will use a different internal implementation (SymbolicMap). I also think it's cleaner to not provide this method.

PiperOrigin-RevId: 824536996
2025-10-27 09:05:08 -07:00
Ilya Tikhonovskiy
13376b4b8a [XLA:GPU] change 'checksum' field name to 'value'
We use this field for two different buffer debug kernels that have different semantics. Technically we could have two different structures, but that does not make much sense at the moment. Let's use the one that we already have, with the generic name.

PiperOrigin-RevId: 824532743
2025-10-27 08:51:52 -07:00
Eusebio Durán Montaña
5e5976e01f Clean up includes and dependencies in ../gpu/runtime directory.
Had to manually add an `IWYU pragma: keep` in select_k_exec_stub.cc, otherwise the `::xla::bfloat16` type isn't found.

PiperOrigin-RevId: 824529669
2025-10-27 08:39:25 -07:00
A. Unique TensorFlower
fd2941bc67 Update calls to HloModule::CreateFromProto in hlo_module_util to remap instruction ids by default. This should speed up compilation.
PiperOrigin-RevId: 824521542
2025-10-27 08:16:29 -07:00
dependabot[bot]
60ac8fa628 PR #32968: Bump keras from 3.9.0 to 3.11.3 in /xla/backends/cpu/benchmarks/e2e/gemma2/keras
Imported from GitHub PR https://github.com/openxla/xla/pull/32968

Bumps [keras](https://github.com/keras-team/keras) from 3.9.0 to 3.11.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/keras-team/keras/releases">keras's releases</a>.</em></p>
<blockquote>
<h2>Keras 3.11.3</h2>
<h2>What's Changed</h2>
<ul>
<li>Version bump to 3.11.3 by <a href="https://github.com/rtg0795"><code>@​rtg0795</code></a> in <a href="https://redirect.github.com/keras-team/keras/pull/21607">keras-team/keras#21607</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/keras-team/keras/compare/v3.11.2...v3.11.3">https://github.com/keras-team/keras/compare/v3.11.2...v3.11.3</a></p>
<h2>Keras 3.11.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Version bump 3.11.2 and nnx fix <a href="https://redirect.github.com/keras-team/keras/issues/21565">#21565</a> by <a href="https://github.com/laxmareddyp"><code>@​laxmareddyp</code></a> in <a href="https://redirect.github.com/keras-team/keras/pull/21570">keras-team/keras#21570</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/laxmareddyp"><code>@​laxmareddyp</code></a> made their first contribution in <a href="https://redirect.github.com/keras-team/keras/pull/21570">keras-team/keras#21570</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/keras-team/keras/compare/v3.11.1...v3.11.2">https://github.com/keras-team/keras/compare/v3.11.1...v3.11.2</a></p>
<h2>Keras 3.11.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Version bump 3.11.1 by <a href="https://github.com/rtg0795"><code>@​rtg0795</code></a> in <a href="https://redirect.github.com/keras-team/keras/pull/21535">keras-team/keras#21535</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/keras-team/keras/compare/v3.11.0...v3.11.1">https://github.com/keras-team/keras/compare/v3.11.0...v3.11.1</a></p>
<h2>Keras 3.11.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add int4 quantization support.</li>
<li>Support <a href="https://github.com/google/grain">Grain</a> data loaders in <code>fit()</code>/<code>evaluate()</code>/<code>predict()</code>.</li>
<li>Add <code>keras.ops.kaiser</code> function.</li>
<li>Add <code>keras.ops.hanning</code> function.</li>
<li>Add <code>keras.ops.cbrt</code> function.</li>
<li>Add <code>keras.ops.deg2rad</code> function.</li>
<li>Add <code>keras.ops.layer_normalization</code> function to leverage backend-specific performance optimizations.</li>
<li>Various bug fixes and performance optimizations.</li>
</ul>
<h2>Backend-specific changes</h2>
<h3>JAX backend</h3>
<ul>
<li>Support NNX library. It is now possible to use Keras layers and models as NNX modules.</li>
<li>Support shape -1 for slice op.</li>
</ul>
<h3>TensorFlow backend</h3>
<ul>
<li>Add support for multiple dynamic dimensions in <code>Flatten</code> layer.</li>
</ul>
<h3>OpenVINO backend</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="b491c860fc"><code>b491c86</code></a> Version bump to 3.11.3 (<a href="https://redirect.github.com/keras-team/keras/issues/21607">#21607</a>)</li>
<li><a href="251ac3422f"><code>251ac34</code></a> Version bump 3.11.2 and nnx fix <a href="https://redirect.github.com/keras-team/keras/issues/21565">#21565</a> (<a href="https://redirect.github.com/keras-team/keras/issues/21570">#21570</a>)</li>
<li><a href="0e11071e8e"><code>0e11071</code></a> Version bump 3.11.1 (<a href="https://redirect.github.com/keras-team/keras/issues/21535">#21535</a>)</li>
<li><a href="7bf852c211"><code>7bf852c</code></a> Update flax (<a href="https://redirect.github.com/keras-team/keras/issues/21527">#21527</a>)</li>
<li><a href="4085046b13"><code>4085046</code></a> [OpenVINO backend] fix openvino model exported names to match keras names (<a href="https://redirect.github.com/keras-team/keras/issues/2">#2</a>...</li>
<li><a href="6bc62031ad"><code>6bc6203</code></a> Fix a few typos in comments (<a href="https://redirect.github.com/keras-team/keras/issues/21525">#21525</a>)</li>
<li><a href="8bf6a58276"><code>8bf6a58</code></a> Add <code>VectorizedMap</code> op class. (<a href="https://redirect.github.com/keras-team/keras/issues/21516">#21516</a>)</li>
<li><a href="7cb0e48957"><code>7cb0e48</code></a> update python version (<a href="https://redirect.github.com/keras-team/keras/issues/21517">#21517</a>)</li>
<li><a href="7b9ab6a537"><code>7b9ab6a</code></a> Fix: UpSampling2D bilinear set_image_data_format(channels_first) bug (<a href="https://redirect.github.com/keras-team/keras/issues/21456">#21456</a>)</li>
<li><a href="90c8da6809"><code>90c8da6</code></a> Fix <code>_can_use_flash_attention</code>. (<a href="https://redirect.github.com/keras-team/keras/issues/21512">#21512</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/keras-team/keras/compare/v3.9.0...v3.11.3">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=keras&package-manager=pip&previous-version=3.9.0&new-version=3.11.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/openxla/xla/network/alerts).

</details>
Copybara import of the project:

--
103d4253e3cb9ef8885a36014359c4a437c465a6 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:

Bump keras in /xla/backends/cpu/benchmarks/e2e/gemma2/keras

Bumps [keras](https://github.com/keras-team/keras) from 3.9.0 to 3.11.3.
- [Release notes](https://github.com/keras-team/keras/releases)
- [Commits](https://github.com/keras-team/keras/compare/v3.9.0...v3.11.3)

---
updated-dependencies:
- dependency-name: keras
  dependency-version: 3.11.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Merging this change closes #32968

PiperOrigin-RevId: 824516553
2025-10-27 07:57:21 -07:00
Eugene Zhulenev
5dfa57fd92 [xla:ffi] Use same id sequence for internal and external types
Add an API to look up a type id and info by type name. We can't rely on type ids for serialization, as they are not stable and are assigned at run time depending on the type registration order. Type names, on the other hand, must be stable.
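A toy Python registry (hypothetical, not the FFI implementation) showing why run-time-assigned type ids are unsuitable for serialization while names are stable:

```python
class TypeRegistry:
    """Toy registry: ids are assigned in registration order, names are stable."""

    def __init__(self):
        self._by_name = {}
        self._next_id = 0

    def register(self, name: str) -> int:
        # First registration of a name claims the next sequential id.
        if name not in self._by_name:
            self._by_name[name] = self._next_id
            self._next_id += 1
        return self._by_name[name]

    def lookup_id(self, name: str) -> int:
        return self._by_name[name]

# The same types registered in a different order get different ids...
a = TypeRegistry()
a.register("float32"); a.register("token")
b = TypeRegistry()
b.register("token"); b.register("float32")
print(a.lookup_id("token") == b.lookup_id("token"))  # False
# ...so a serialized payload must refer to types by name, not by id.
```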

PiperOrigin-RevId: 824512487
2025-10-27 07:47:23 -07:00
A. Unique TensorFlower
0e809d4bc8 [XLA:GPU] Add simple multimem one-shot example.
PiperOrigin-RevId: 824508652
2025-10-27 07:34:24 -07:00
A. Unique TensorFlower
51d2e6931b Update pip dependency reference from @pypi_XXX//:pkg to @pypi//XXX.
PiperOrigin-RevId: 824505081
2025-10-27 07:21:22 -07:00
A. Unique TensorFlower
b15498a538 Add Symbolic/Affine convertor methods for IndexingMap
- Renamed SymbolicToAffine to SymbolicExprToAffineExpr and made it public (needed for IndexingMap::GetConstraints)
- Renamed AffineToSymbolicExpr to AffineExprToSymbolicExpr
- Added AffineExprsToSymbolicExprs to convert a list of mlir::AffineExpr to a vector of xla::gpu::SymbolicExpr (needed for IndexingMap::ConstraintsSatisfied)

PiperOrigin-RevId: 824492246
2025-10-27 06:51:35 -07:00
Ilya Tikhonovskiy
aded8e05e0 [XLA:GPU] add buffer_nan_count_thunk for the buffer_nan_count_kernel
In a follow-up CL we will need to add this thunk to the buffer debug pass.
There we will also need to infer the buffer element type.
Another refactoring would be to rename the payload, which is currently the checksum, to something more generic like 'value' or 'result'.
We could also reduce code duplication by merging the two thunks, the checksum one and the NaN-counter one.

PiperOrigin-RevId: 824491914
2025-10-27 06:41:06 -07:00
Karlo Basioli
3f5b49f242 [XLA:CPU] Add target machine options to compilation result proto and check compilation arch when loading aot result.
Used to check if the runtime and compilation env are compatible.

PiperOrigin-RevId: 824481786
2025-10-27 06:05:08 -07:00
Thomas Joerg
e34b86def5 [XLA:GPU] Do not create transpose ops with non-default layout in DotDecomposer.
The `DotDecomposer` pass runs ahead of layout assignment. Introducing non-default layouts at this stage causes complications for subsequent passes, in particular the `DotMerger` pass.

PiperOrigin-RevId: 824476578
2025-10-27 05:54:10 -07:00
Dragan Mladjenovic
77bed2c6ef PR #32439: [ROCm] Fix and enable xla_gpu_use_embeded_device_lib and xla_gpu_use_…
Imported from GitHub PR https://github.com/openxla/xla/pull/32439

…inprocess_lld

📝 Summary of Changes
Enable embedded device libs and in-process lld by default.

🎯 Justification
Moves amdgpu backend to be more filesystem layout independent.

🚀 Kind of Contribution
🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
N\A

🧪 Unit Tests:
None

🧪 Execution Tests:
None

Copybara import of the project:

--
46a100377d00d30dbc79e34c977b9219c54bda4b by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>:

[ROCm] Fix and enable xla_gpu_use_embeded_device_lib and xla_gpu_use_inprocess_lld

Merging this change closes #32439

PiperOrigin-RevId: 824476138
2025-10-27 05:44:57 -07:00
Adrian Kuegel
4f15c0c9d3 [XLA:GPU] Choose a deterministic function name for nested computations
absl::Hash is not deterministic over different runs of the same program. Use
Fingerprint128 instead, and don't include the address of the computation.
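The same pitfall exists in Python, where built-in `hash()` of a string is salted per process; a content digest yields a name that is stable across runs. A sketch under that analogy (sha256 stands in for Fingerprint128; the naming scheme is hypothetical):

```python
import hashlib

def stable_fn_name(computation_text: str) -> str:
    # Process-independent fingerprint: same input -> same name on every run.
    digest = hashlib.sha256(computation_text.encode()).hexdigest()[:16]
    return f"nested_computation_{digest}"

# Python's built-in hash() of a str is salted per process (like absl::Hash,
# it must not be used for stable naming); a content digest is stable.
print(stable_fn_name("fused_add"))
```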
PiperOrigin-RevId: 824460524
2025-10-27 04:53:10 -07:00
Alexander Grund
9ec8d8ece3 PR #31886: Fix libdevice search
Imported from GitHub PR https://github.com/openxla/xla/pull/31886

📝 Summary of Changes
This enhances the search for the CUDA libdevice path:
- Fix an invalid empty path added when `TF_CUDA_TOOLKIT_PATH` is empty
- Fix invalid paths based on runtime folders: `runfiles_dir.substr(0, runfiles_ind + runfiles_suffix.length())` is not meaningful when `runfiles_ind` isn't valid, i.e. `std::string::npos`
- Add `$CUDA_HOME` to the search paths. This is also used in TensorFlow already

🎯 Justification
Without this, the libdevice file won't be found if CUDA isn't installed in a standard location or, e.g., an updated version is available in a different location.
This is the case for e.g. HPC systems where multiple CUDA versions are available side-by-side.

🚀 Kind of Contribution
🐛 Bug Fix, ♻️ Cleanup

Fixes #28590

🧪 Unit Tests:

Simple test that when `CandidateCudaRoots` returns anything it contains `$CUDA_HOME`

Copybara import of the project:

--
01788b896900717ee916377a71d5c14963e0176d by Alexander Grund <alexander.grund@tu-dresden.de>:

Fix libdevice search when outside test environment

When there is no `runfiles_suffix`, `rfind` returns
`std::string::npos`, which should be handled so that meaningless paths are not added.
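The failure mode can be sketched in Python, where `str.rfind` returns -1 (the analogue of `std::string::npos`) when the suffix is absent; a hedged illustration, not the actual XLA code:

```python
def runfiles_root(path, runfiles_suffix=".runfiles"):
    """Return the runfiles root prefix of `path`, or None if absent."""
    idx = path.rfind(runfiles_suffix)
    if idx == -1:
        # No suffix found: taking a prefix here would yield a meaningless path.
        return None
    return path[: idx + len(runfiles_suffix)]

print(runfiles_root("/tmp/test.runfiles/cuda_nvcc"))  # /tmp/test.runfiles
print(runfiles_root("/opt/app/bin/tool"))             # None
```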

--
900715a846102bacdfc7688f14713cbe6101506d by Alexander Grund <alexander.grund@tu-dresden.de>:

Use `$CUDA_HOME` when searching for libdevice.

With a CUDA installed to a non-default location XLA/TF fails with:
> gpu_backend_lib.cc:579] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
> Searched for CUDA in the following directories:
>   ./cuda_sdk_lib
>   /builddir/TensorFlow/TensorFlow-2.x_mnist-test.py.runfiles/cuda_nvcc
>   /buildi/cuda_nvcc
>
>   /usr/local/cuda
>   /software/TensorFlow/lib/python3.12/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
>   /software/TensorFlow/lib/python3.12/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
>   /software/TensorFlow/lib/python3.12/site-packages/tensorflow/python/platform/../../cuda

Consider $CUDA_HOME as an additional location after the runfiles dirs (used for tests)

--
905d0596d199598036032f0f84b4487e9afd2bef by Alexander Grund <alexander.grund@tu-dresden.de>:

Don't add empty TF_CUDA_TOOLKIT_PATH to libdevice search

At least in some environments that define is the empty string, which
doesn't make sense to add to the search paths.
Add a check for that.

--
23eb59bfabd570caabf0b9ec3515233f46a4fae7 by Alexander Grund <alexander.grund@tu-dresden.de>:

Add test for $CUDA_HOME in CandidateCudaRoots

--
a8c215bc222b4ba8581f2f44549613ebd59b9cbb by Alexander Grund <alexander.grund@tu-dresden.de>:

Add braces to loops/conditions

--
39efc67f8b1d44e131f993c8040b7eb69ff52f0c by Alexander Grund <alexander.grund@tu-dresden.de>:

Use kIsOpenSource in skip condition

Merging this change closes #31886

PiperOrigin-RevId: 824450284
2025-10-27 04:10:14 -07:00
Will Froom
4d623afca2 [XLA][XTile] Make transpose folder work with xtile extract.
PiperOrigin-RevId: 824439434
2025-10-27 03:39:25 -07:00
Henning Becker
76a084f181 Move Attribute types from call_frame.cc into attribute_map.cc
This moves `Scalar`, `Array`, `Dictionary`, `FlatAttribute`, `FlatAttributeMap`, and `AttributeMap` from `CallFrameBuilder` into the `xla::ffi` namespace.

It also moves the code into `attribute_map.{cc|h}`.

All these types are basically aliases for some kind of `std::variant` type. This change is a preparation for making them proper types and adding `ToProto` and `FromProto` methods.

PiperOrigin-RevId: 824435281
2025-10-27 03:22:14 -07:00
Ilya Tikhonovskiy
78a0ca0b60 [XLA:GPU] add nan count cuda kernel
The kernel is similar to the one for the checksum calculation

PiperOrigin-RevId: 824428856
2025-10-27 03:08:29 -07:00
Will Froom
dfeccf211b [XLA:CPU][XTile] Implement pass to rewrite dynamic vector extracts to static.
PiperOrigin-RevId: 824427163
2025-10-27 02:56:12 -07:00
Junwhan Ahn
9add8b7e61 Fix a bug where S2/U2 dtypes were missing proto conversion
Also fixed the round trip test to not ignore `kInvalid` returned from proto conversion, which is why we didn't catch this bug.

PiperOrigin-RevId: 824419619
2025-10-27 02:41:11 -07:00
Adrian Kuegel
5d42d91467 Make sure to produce a deterministic Memory usage report.
PiperOrigin-RevId: 824403476
2025-10-27 01:41:57 -07:00
Eugene Zhulenev
3630944d0f [xla:ffi] Add support for binding Context object to the handler
PiperOrigin-RevId: 824278531
2025-10-26 16:48:26 -07:00
Michael Kuperstein
2de4be94aa [XLA] Remove HLO unstacker.
The pass is not used.

PiperOrigin-RevId: 824274493
2025-10-26 16:31:48 -07:00
Eugene Zhulenev
cc5ee2577c [xla:ffi] Add support for std::variant<> attributes decoding
PiperOrigin-RevId: 824272994
2025-10-26 16:24:59 -07:00
A. Unique TensorFlower
32c1551f24 Add support for int8 dots, and allow bf16 to be used on any CPU.
PiperOrigin-RevId: 824272399
2025-10-26 16:13:53 -07:00
Eugene Zhulenev
5edcd28152 [xla:cpu:ynn] Do not track work stealing workers
```
name                                                               cpu/op         cpu/op      vs base
BM_ParallelFor/8/1/process_time   [#threads=8, #threadpools=1  ]    5.470m ±  5%   5.095m ± 3%  -6.87% (p=0.000 n=80)
BM_ParallelFor/8/2/process_time   [#threads=8, #threadpools=2  ]    2.857m ±  1%   2.595m ± 2%  -9.15% (n=80)
BM_ParallelFor/8/4/process_time   [#threads=8, #threadpools=4  ]    1.447m ± 10%   1.328m ± 1%  -8.23% (p=0.000 n=80)
BM_ParallelFor/8/8/process_time   [#threads=8, #threadpools=8  ]   1058.1µ ± 20%   974.5µ ± 1%  -7.90% (p=0.000 n=80)
BM_ParallelFor/8/16/process_time  [#threads=8, #threadpools=16 ]    741.5µ ± 26%   705.8µ ± 1%  -4.81% (p=0.000 n=80)
BM_ParallelFor/16/1/process_time  [#threads=16, #threadpools=1 ]    9.796m ± 29%   9.972m ± 2%       ~ (p=0.312 n=80)
BM_ParallelFor/16/2/process_time  [#threads=16, #threadpools=2 ]    7.871m ± 28%   7.706m ± 1%  -2.10% (p=0.030 n=80)
BM_ParallelFor/16/4/process_time  [#threads=16, #threadpools=4 ]    4.330m ±  2%   4.157m ± 1%  -3.99% (p=0.000 n=80)
BM_ParallelFor/16/8/process_time  [#threads=16, #threadpools=8 ]    2.678m ±  2%   2.638m ± 1%  -1.49% (p=0.014 n=80)
BM_ParallelFor/16/16/process_time [#threads=16, #threadpools=16]    1.791m ±  1%   1.807m ± 1%       ~ (p=0.325 n=80)
BM_ParallelFor/32/1/process_time  [#threads=32, #threadpools=1 ]    15.33m ±  1%   15.41m ± 1%       ~ (p=0.215 n=80)
BM_ParallelFor/32/2/process_time  [#threads=32, #threadpools=2 ]    13.99m ±  1%   13.80m ± 2%       ~ (p=0.400 n=80)
BM_ParallelFor/32/4/process_time  [#threads=32, #threadpools=4 ]    9.415m ±  1%   9.172m ± 1%  -2.58% (p=0.000 n=80)
BM_ParallelFor/32/8/process_time  [#threads=32, #threadpools=8 ]    5.759m ±  1%   5.647m ± 1%  -1.95% (p=0.004 n=80)
BM_ParallelFor/32/16/process_time [#threads=32, #threadpools=16]    3.932m ±  1%   3.864m ± 1%  -1.72% (p=0.006 n=80)
geomean                                                            4.051m         3.916m       -3.32%

name                                                               time/op        time/op     vs base
BM_ParallelFor/8/1/process_time   [#threads=8, #threadpools=1  ]    651.2µ ±  3%   600.3µ ± 4%  -7.80% (p=0.000 n=80)
BM_ParallelFor/8/2/process_time   [#threads=8, #threadpools=2  ]    329.4µ ±  0%   298.6µ ± 2%  -9.35% (n=80)
BM_ParallelFor/8/4/process_time   [#threads=8, #threadpools=4  ]    169.3µ ± 12%   155.7µ ± 1%  -8.05% (p=0.000 n=80)
BM_ParallelFor/8/8/process_time   [#threads=8, #threadpools=8  ]    125.8µ ± 21%   115.7µ ± 1%  -8.08% (p=0.000 n=80)
BM_ParallelFor/8/16/process_time  [#threads=8, #threadpools=16 ]    95.41µ ± 24%   89.56µ ± 1%  -6.13% (p=0.000 n=80)
BM_ParallelFor/16/1/process_time  [#threads=16, #threadpools=1 ]   1015.8µ ±  1%   952.0µ ± 1%  -6.29% (n=80)
BM_ParallelFor/16/2/process_time  [#threads=16, #threadpools=2 ]    556.5µ ±  1%   522.6µ ± 1%  -6.09% (n=80)
BM_ParallelFor/16/4/process_time  [#threads=16, #threadpools=4 ]    289.7µ ±  2%   274.4µ ± 1%  -5.30% (p=0.000 n=80)
BM_ParallelFor/16/8/process_time  [#threads=16, #threadpools=8 ]    178.8µ ±  2%   174.1µ ± 1%  -2.59% (p=0.000 n=80)
BM_ParallelFor/16/16/process_time [#threads=16, #threadpools=16]    123.9µ ±  2%   123.0µ ± 1%       ~ (p=0.098 n=80)
BM_ParallelFor/32/1/process_time  [#threads=32, #threadpools=1 ]    1.526m ±  3%   1.433m ± 3%  -6.07% (p=0.000 n=80)
BM_ParallelFor/32/2/process_time  [#threads=32, #threadpools=2 ]    835.2µ ±  2%   783.5µ ± 2%  -6.19% (p=0.000 n=80)
BM_ParallelFor/32/4/process_time  [#threads=32, #threadpools=4 ]    471.6µ ±  2%   455.1µ ± 1%  -3.52% (p=0.000 n=80)
BM_ParallelFor/32/8/process_time  [#threads=32, #threadpools=8 ]    296.1µ ±  2%   287.0µ ± 2%  -3.08% (p=0.000 n=80)
BM_ParallelFor/32/16/process_time [#threads=32, #threadpools=16]    215.0µ ±  2%   211.6µ ± 1%  -1.59% (p=0.018 n=80)
geomean                                                            330.2µ         312.3µ       -5.42%
```

PiperOrigin-RevId: 824259124
2025-10-26 15:16:20 -07:00
Eugene Zhulenev
e65144c31f [xla:ffi] Check that Type used as a state is registered before the handler
PiperOrigin-RevId: 824258481
2025-10-26 15:06:40 -07:00
Eugene Zhulenev
87e3b84514 [tsl:concurrency] In Promise replace IsUnique() with NumRef() == 1
The meaning of AsyncValue::IsUnique() is fuzzy for a chain of indirect async values. Prefer a simpler check for uniqueness in the Future/Promise library.

Also update AsyncValue::IsUnique() documentation.

PiperOrigin-RevId: 824256830
2025-10-26 14:48:28 -07:00
Eugene Zhulenev
72d04ced58 [xla:cpu] Correctly measure CPU time in slinky thread pool benchmark
PiperOrigin-RevId: 824253351
2025-10-26 14:23:48 -07:00
Ivo Ristovski List
f3689e1314 Automated Code Change
PiperOrigin-RevId: 824094790
2025-10-26 00:36:30 -07:00
A. Unique TensorFlower
0338b08bee Add mechanism to prioritize ForceDelay custom calls
PiperOrigin-RevId: 823973702
2025-10-25 14:20:20 -07:00
A. Unique TensorFlower
6bede44c1a Integrate LLVM at llvm/llvm-project@621ed04e28
Updates LLVM usage to match
[621ed04e2878](https://github.com/llvm/llvm-project/commit/621ed04e2878)

PiperOrigin-RevId: 823941203
2025-10-25 11:22:30 -07:00
Christian Sigg
c50123703d Increment XLA GPU autotune cache version to 15.
This change invalidates the autotune cache, which is necessary because enabling the generic emitter (cl/823475406) affected autotuning results.

PiperOrigin-RevId: 823818338
2025-10-25 00:26:42 -07:00
Alexander Shaposhnikov
171247d500 Temporarily bring back the old logic for capturing RHS.
PiperOrigin-RevId: 823712382
2025-10-24 17:00:45 -07:00
Abhinav Gunjal
c7b4a8e3a5 Automated Code Change
Reverts 1b838a947b

PiperOrigin-RevId: 823696235
2025-10-24 16:11:34 -07:00
A. Unique TensorFlower
05c94a96e4 Integrate LLVM at llvm/llvm-project@704240125d
Updates LLVM usage to match
[704240125ddf](https://github.com/llvm/llvm-project/commit/704240125ddf)

PiperOrigin-RevId: 823662883
2025-10-24 14:26:34 -07:00
A. Unique TensorFlower
42d764666d Integrate LLVM at llvm/llvm-project@917d1f20ae
Updates LLVM usage to match
[917d1f20aecf](https://github.com/llvm/llvm-project/commit/917d1f20aecf)

PiperOrigin-RevId: 823542980
2025-10-24 08:50:00 -07:00
A. Unique TensorFlower
69c93c6f6a Reshard on call output if sharding mismatches with the func result.
This is behaviorally a no-op for Shardy, because the call output and func result may mismatch only if the dedup-functions-fully option is true, and this option is false by default.

Shardy will add explicit reshards (during the Shardy partitioner) on the operations that use the output of a named computation, and it will do so assuming that output is sharded as specified in the out shardings of the named computation.

When the dedup-functions-fully option is true, however, the function that is actually called may end up having a different output sharding than the corresponding named computation. The users of the output should still see the sharding specified in the out shardings of the named computation. Hence, if there is a mismatch between the output sharding of the named computation and the result sharding of the function, we add a reshard on the output of the call.

PiperOrigin-RevId: 823494391
2025-10-24 05:59:15 -07:00
Ilya Tikhonovskiy
0c0947cea6 [XLA:GPU] Initialize PrecisionConfig for ScaledDot in composite rewriter.
Explicitly set the operand precisions to `PrecisionConfig::DEFAULT` when creating a `ScaledDot` instruction from a composite call.

PiperOrigin-RevId: 823488638
2025-10-24 05:38:20 -07:00
Eugene Zhulenev
a5fca6a9b5 [tsl:concurrency] Do not use executor if detached future is unused
+ use `ptr` when using `AsPtr()` for consistency
+ rename `Wrap` to `AndThen` as it's more meaningful and makes profiles readable

PiperOrigin-RevId: 823476695
2025-10-24 04:55:18 -07:00
Christian Sigg
c8cc7f2fbb [XLA:GPU] Enable generic triton emitter for all gemms, second attempt.
According to benchmarks, we have reached neutrality with the legacy emitter, so we are switching to the new emitter by default. The legacy emitter will be kept for some time but is considered deprecated and should not be used; it will be deleted in the near future.

Reverts 85c99b1ecb

PiperOrigin-RevId: 823475406
2025-10-24 04:46:17 -07:00
Benjamin Chetioui
acf7f31c31 [XLA:GPU] Fix index of operand in call to GetNonContractingDims.
PiperOrigin-RevId: 823358506
2025-10-23 22:57:57 -07:00
A. Unique TensorFlower
cbbed7a2fd Automated Code Change
PiperOrigin-RevId: 823350718
2025-10-23 22:40:22 -07:00
Eugene Zhulenev
ef326c74ef [xla:cpu] Add benchmarks for SlinkyThreadPool
```
BM_ParallelFor/8/1      364687 ns       228963 ns         2974 items_per_second=43.6752M/s #threads=8, #threadpools=1
BM_ParallelFor/8/2      226687 ns       176171 ns         2877 items_per_second=56.763M/s #threads=8, #threadpools=2
BM_ParallelFor/8/4      211589 ns       184345 ns         5816 items_per_second=54.2462M/s #threads=8, #threadpools=4
BM_ParallelFor/8/8      177793 ns       162265 ns         3788 items_per_second=61.6275M/s #threads=8, #threadpools=8
BM_ParallelFor/8/16     206898 ns       192792 ns         3339 items_per_second=51.8693M/s #threads=8, #threadpools=16
```

PiperOrigin-RevId: 823321692
2025-10-23 21:14:28 -07:00
Alexander Shaposhnikov
3be9a21d7e Add initial support for offloading dots to YNNPACK.
PiperOrigin-RevId: 823318539
2025-10-23 21:04:45 -07:00
Zac Mustin
5893a54e81 Add PJRT c sandwich benchmarks to nanort benchmarks.
PiperOrigin-RevId: 823259666
2025-10-23 18:15:39 -07:00
Benjamin Chetioui
4ed3ee15e7 [XLA:GPU] Allow simplifying some dot point dimensions in SymbolicTileAnalysis.
Previously, we would never allow simplification when encountering a `dot`
instruction. But this constraint was overly conservative; the only dimensions
that we shouldn't simplify are those along which we intend to perform
non-standard padding to fit to hardware restrictions, i.e. the non-contracting
and contracting dimensions.

Restricting this pattern further works around a bug whereby expanding a
non-standardly padded dimension into a `1` dim can result in propagating a
tile with the wrong size.

The underlying reason for this is a bug in the `kPreserve` behaviour of
`IndexingMap` simplification, which will need to be fixed separately (the new
tiling should avoid this issue, since it shouldn't rely on the correctness of
`IndexingMap` simplification at this level).

PiperOrigin-RevId: 823258725
2025-10-23 18:03:30 -07:00
A. Unique TensorFlower
ce800f5880 Re-enable testing of the thread pool in YnnFusionThunkTest
This was disabled when we didn't have the thread pool available, but now we do.

PiperOrigin-RevId: 823247913
2025-10-23 17:27:12 -07:00
Matthias Guenther
1b838a947b Enable Stablehlo -> HLO lowering by default.
Note that, in order to maintain parity with MHLO optimizations, this enables the `assume-no-undeclared-side-effects` option. This matches the default behavior for MHLO, but StableHLO is more cautious by default. Empirically, past evidence suggests it's pretty safe given that MHLO has been doing it all this time. Disabling the flag can result in significantly larger HLO after lowering, so we enable it here.

PiperOrigin-RevId: 823234079
2025-10-23 16:43:13 -07:00
A. Unique TensorFlower
774bc48035 Update XNNPACK in XLA
PiperOrigin-RevId: 823199731
2025-10-23 15:24:59 -07:00
Maxim Ermilov
9ee1d967e1 initialize nvml in CudaPlatform
PiperOrigin-RevId: 823193773
2025-10-23 15:03:33 -07:00