Majid Dadashi
2feb74eeff
Add i4 support in tfl.slice
...
PiperOrigin-RevId: 825217744
2025-10-28 15:27:41 -07:00
Haibo Huang
202bd1ac59
Remove IsTpuTopology and IsGpuTopology and IsCpuTopology
...
PiperOrigin-RevId: 825216555
2025-10-28 15:17:52 -07:00
Matt Hurd
803a513588
Allow specifying a custom session id instead of always using timestamp.
...
This change introduces a `session_id` option in `ProfileOptions` and `RemoteProfilerSessionManagerOptions`. When provided, this id will be used as the subdirectory for storing profile data, instead of the default timestamp.
PiperOrigin-RevId: 825216123
2025-10-28 15:10:15 -07:00
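A minimal Python sketch (not the actual profiler code; `profile_subdir` is a hypothetical helper) of the directory-selection behavior this change describes: use the provided `session_id` as the subdirectory, else fall back to the default timestamp.

```python
import time

def profile_subdir(session_id=None):
    # If a session_id is provided, use it as the subdirectory for profile
    # data; otherwise fall back to the default timestamp behaviour.
    return session_id if session_id else str(int(time.time()))
```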
Parker Schuh
fef8806609
Remove unnecessary limit.
...
PiperOrigin-RevId: 825212681
2025-10-28 14:54:58 -07:00
Antonio Sanchez
b531e70088
Updating internal visibility rules.
...
PiperOrigin-RevId: 825202091
2025-10-28 14:26:07 -07:00
Victor Stone
63a9d0d1f8
If device placement annotations are found inside host computations (as a result of nested host computations), hoist them up the call stack. If any unsupported cases or inconsistencies are detected, an error will be returned to the user.
...
This enables JAX's migration from its previous `compute_on` API to the new (currently named `compute_on2`) API.
PiperOrigin-RevId: 825177029
2025-10-28 13:24:57 -07:00
A. Unique TensorFlower
768e653c9c
Integrate LLVM at llvm/llvm-project@29c830cbf8
...
Updates LLVM usage to match
[29c830cbf8c6](https://github.com/llvm/llvm-project/commit/29c830cbf8c6)
PiperOrigin-RevId: 825166312
2025-10-28 13:03:12 -07:00
Peter Gavin
5c13ee5063
Internal build rule change.
...
PiperOrigin-RevId: 825151475
2025-10-28 12:22:01 -07:00
A. Unique TensorFlower
98c995ee9a
Remove unnecessary native.bind calls from workspace0.bzl.
...
These bindings for gRPC, Python headers, and six are no longer required.
PiperOrigin-RevId: 825146137
2025-10-28 12:09:32 -07:00
Eugene Zhulenev
44446df9cf
[xla:cpu] Remove mlir/llvm kernel_definition and kernel_emitter libraries
...
PiperOrigin-RevId: 825133247
2025-10-28 11:46:01 -07:00
Bill Varcho
63d558e46f
[ReplicaGroupV3][Mesh + AxesRef] add to/from proto functions + equality op to XLA definitions of Mesh and AxesRef.
...
PiperOrigin-RevId: 825107889
2025-10-28 11:18:27 -07:00
Henning Becker
4dfbd3bd0c
Add proto serialization for RocmComputeCapability
...
PiperOrigin-RevId: 825103988
2025-10-28 11:08:48 -07:00
Eugene Zhulenev
fa61547732
[xla:cpu] Rename LlvmIrKernelSource to LlvmKernelSource
...
PiperOrigin-RevId: 825103892
2025-10-28 10:59:06 -07:00
Will Froom
2ef85038ed
[XLA:CPU] Measure process CPU time in reduction benchmark.
...
PiperOrigin-RevId: 825102754
2025-10-28 10:44:13 -07:00
Henning Becker
c7dd4775b7
Add missing dependency to TSL target
...
Without it, the layering check was failing.
PiperOrigin-RevId: 825098439
2025-10-28 10:30:46 -07:00
Ilya Tikhonovskiy
14db6f6317
[XLA:GPU] Follow-up fix after PR #32919
...
Change `gpu_version` parameter to const reference in `IntelGpuCompiler`.
This aligns the parameter type in `OptimizeHloConvolutionCanonicalization` with the base class signature.
PiperOrigin-RevId: 825083863
2025-10-28 10:12:54 -07:00
Eugene Zhulenev
d75ad2c4ff
[xla:cpu] Cleanup KernelSpec to use absl::string_view and absl::Span
...
PiperOrigin-RevId: 825075908
2025-10-28 09:56:20 -07:00
A. Unique TensorFlower
10fd9cfebb
Iterate on the functions of the map in a deterministic order to create function names deterministically.
...
When a function has multiple instances with different manual axes, and dedup-functions-fully is on, there will be multiple copies of the same function.
For example:
sdy.manual_computation(%arg0) manual_axes={"x"} (%arg1: tensor<4xf32>) {
sdy.named_computation<"foo">(%arg1) (%arg2: tensor<4xf32>) {}
}
sdy.manual_computation(%arg0) manual_axes={"y"} (%arg1: tensor<4xf32>) {
sdy.named_computation<"foo">(%arg1) (%arg2: tensor<4xf32>) {}
}
sdy.named_computation<"foo">(%arg0) (%arg1: tensor<8xf32>) {}
----->
sdy.manual_computation(%arg0) manual_axes={"x"} (%arg1: tensor<4xf32>) {
call @foo(%arg1)
}
sdy.manual_computation(%arg0) manual_axes={"y"} (%arg1: tensor<4xf32>) {
call @foo_0(%arg1)
}
call @foo_1(%arg0)
The order of iteration over the map/vector determines which 'foo' becomes 'foo_0', which becomes 'foo_1', and which stays as 'foo'.
PiperOrigin-RevId: 825074314
2025-10-28 09:49:24 -07:00
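A minimal Python sketch (not the actual pass; `dedup_names` is hypothetical) of why iterating in a deterministic order yields deterministic names: suffixes are assigned in encounter order, so sorting the call sites first pins down which copy keeps the base name.

```python
def dedup_names(sites):
    # sites: dict mapping call-site id -> base function name.
    # Iterating keys in sorted order makes suffix assignment deterministic.
    counters = {}
    names = {}
    for site in sorted(sites):
        base = sites[site]
        n = counters.get(base, 0)
        # First occurrence keeps the base name; later ones get _0, _1, ...
        names[site] = base if n == 0 else f"{base}_{n - 1}"
        counters[base] = n + 1
    return names
```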
Dimitris Vardoulakis
0ad542ea89
PR #33117 : Rename "forward compatible" capabilities to "family compatible", per NVIDIA naming.
...
Imported from GitHub PR https://github.com/openxla/xla/pull/33117
See:
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
Family-compatible support was introduced in 16b9d957ff.
It's not clear to me how someone can actually configure XLA to use sm_100f. Will look into that next.
Copybara import of the project:
--
331d40c9c93ffb3a5c97e53e4017f604aa23d221 by Dimitris Vardoulakis <dvardoulakis@nvidia.com>:
Rename "forward compatible" capabilities to "family compatible",
per NVIDIA naming.
See:
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
Merging this change closes #33117
PiperOrigin-RevId: 825073858
2025-10-28 09:38:56 -07:00
Niklas Vangerow
4aabddab2d
Migrate conv_depthwise_test to use PjRt.
...
PiperOrigin-RevId: 825064898
2025-10-28 09:32:28 -07:00
Jian Cai
7c6d13443d
[XLA] Add a member function to check if a tuple tree has any tuples
...
The function returns true if a tuple has only a root node.
PiperOrigin-RevId: 825062842
2025-10-28 09:22:15 -07:00
Shaogang Wang
d1ca03b626
PR #33149 : [XLA:GPU] add VLOG dump option that only prints out primary command buffer graph.
...
Imported from GitHub PR https://github.com/openxla/xla/pull/33149
📝 Summary of Changes
Add a CUDA graph dump option that prints only the primary graph, so nested CUDA graphs do not flood the log.
🎯 Justification
Makes debug logs easier to read.
🚀 Kind of Contribution
📚 Documentation
Copybara import of the project:
--
18d6939170fd5bf4fa9228d4f74ca3ff4e83ec17 by Shawn Wang <shawnw@nvidia.com>:
add cuda graph dump option that only prints out primary graph
Merging this change closes #33149
PiperOrigin-RevId: 825049186
2025-10-28 09:03:40 -07:00
A. Unique TensorFlower
da8b9bf004
[Autotuner] Add (de)serialization from/to string in cache.
...
- This is required to port sharding from gemm_fusion_autotuner.
PiperOrigin-RevId: 825045654
2025-10-28 08:35:56 -07:00
Ilya Tikhonovskiy
ffca28bcf8
[XLA:GPU] Ignore reductions over dimensions of size 1 in UnstableReductionDetector
...
The UnstableReductionDetector now considers reductions where all reduced dimensions have a size of 1 to be stable, as these operations are effectively no-ops and do not introduce numerical instability. A test case is added to verify this behavior.
PiperOrigin-RevId: 825045042
2025-10-28 08:27:58 -07:00
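A minimal Python sketch (hypothetical helper, not the detector's code) of the new rule: a reduction is treated as stable when every reduced dimension has size 1, since reducing over a size-1 dimension is a no-op.

```python
def is_noop_reduction(shape, reduced_dims):
    # A reduction is effectively a no-op (hence numerically stable)
    # when every reduced dimension has size 1.
    return all(shape[d] == 1 for d in reduced_dims)
```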
Marcin Radomski
7334d07917
[XLA:GPU] Add check_thunk_result_consistency tool for verifying checksum consistency
...
When implementing this it turned out that the log is currently missing some information needed to reliably distinguish input/output checksums and different thunk executions. This adds the needed fields to the proto, but emitting them in the log will be a separate change.
With the extra data still missing, the tool assumes all checksums refer to outputs and that each thunk execution gives the same results every time. The tests include the extra data, so once it is emitted the tool should just work.
PiperOrigin-RevId: 825040798
2025-10-28 08:20:28 -07:00
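A minimal Python sketch (hypothetical, not the actual tool) of the consistency check under the tool's current assumption: every checksum is an output, and repeated executions of a thunk must produce the same checksum.

```python
def check_consistency(log_records):
    # log_records: iterable of (thunk_name, checksum) pairs parsed from
    # the log. Flag thunks whose checksum varies across executions.
    seen = {}
    inconsistent = set()
    for thunk, checksum in log_records:
        if thunk in seen and seen[thunk] != checksum:
            inconsistent.add(thunk)
        seen.setdefault(thunk, checksum)
    return sorted(inconsistent)
```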
Sohaib Iftikhar
542ffe0410
[XLA:GPU]: Add peer to peer copies for cupti tracing.
...
Before this change, peer-to-peer copies done using
cuMemcpyPeerAsync were not being tracked via the driver API.
PiperOrigin-RevId: 825040246
2025-10-28 08:08:56 -07:00
Benjamin Chetioui
034d750525
[XLA][NFC] Remove line saying that cudaGetLastError is incompatible with command buffers.
...
It turns out it is (although the broader point still stands).
PiperOrigin-RevId: 825036231
2025-10-28 07:57:07 -07:00
Will Froom
de7a63363c
[XLA:CPU][XTile] Implement vectorized reduce.
...
PiperOrigin-RevId: 825027697
2025-10-28 07:29:37 -07:00
Quentin Khan
11c00ca2db
If --use_xnnpack is specified, never use the default delegate in benchmark_tflite_model.
...
**Without this change**, when using `--use_xnnpack`, either:
1. `--use_xnnpack=true`: the **default resolver** (that automatically applies an XNNPack delegate) is used and an XNNPack delegate that follows the options that are given on the command line is explicitly applied.
2. `--use_xnnpack=false`: the **resolver without the default XNNPack delegate** is used and no delegate is explicitly applied, i.e. no delegate is applied.
3. No `--use_xnnpack` is specified: the **default resolver** (that automatically applies an XNNPack delegate) is used.
Case 1 has issues because both the custom and default delegates are applied, and they may interfere during initialization:
- Depending on the XNNPack options, some operations may or may not be delegated.
- This leads to one or the other delegate taking the ops.
- This makes initialization benchmarking completely wrong, since two delegates are applied.
- This breaks the XNNPack weight cache, since it can never be enabled for the default delegate.
To solve this, the new behaviour is:
1. `--use_xnnpack=true`: the **resolver without the default XNNPack delegate**
is used **and** an XNNPack delegate that follows the options that are given on the command line is explicitly applied.
2. `--use_xnnpack=false`: the **resolver without the default XNNPack delegate** is used and no delegate is explicitly applied, i.e. no delegate is applied.
3. No `--use_xnnpack` is specified: the **default resolver** (that automatically applies an XNNPack delegate) is used.
Cases 2 and 3 are not affected by this change.
PiperOrigin-RevId: 825018995
2025-10-28 07:09:35 -07:00
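The three cases of the new behaviour can be sketched as a small Python decision table (hypothetical names; the real logic lives in the C++ benchmark tool):

```python
def resolver_and_delegate(use_xnnpack):
    # use_xnnpack: True, False, or None (flag not specified).
    # Returns (resolver, explicitly_applied_delegate) per the new behaviour.
    if use_xnnpack is None:
        # Default resolver automatically applies an XNNPack delegate.
        return ("default_resolver", None)
    if use_xnnpack:
        # No default delegate; apply one built from the command-line options.
        return ("resolver_without_default_delegate", "xnnpack_from_flags")
    # No default delegate and nothing applied explicitly.
    return ("resolver_without_default_delegate", None)
```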
dependabot[bot]
97f4e08c24
PR #33140 : Bump actions/upload-artifact from 4.6.1 to 5.0.0
...
Imported from GitHub PR https://github.com/openxla/xla/pull/33140
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact ) from 4.6.1 to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/actions/upload-artifact/releases ">actions/upload-artifact's releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<p><strong>BREAKING CHANGE:</strong> this update supports Node <code>v24.x</code>. This is not a breaking change per-se but we're treating it as such.</p>
<ul>
<li>Update README.md by <a href="https://github.com/GhadimiR "><code>@GhadimiR</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/681 ">actions/upload-artifact#681</a></li>
<li>Update README.md by <a href="https://github.com/nebuk89 "><code>@nebuk89</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/712 ">actions/upload-artifact#712</a></li>
<li>Readme: spell out the first use of GHES by <a href="https://github.com/danwkennedy "><code>@danwkennedy</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/727 ">actions/upload-artifact#727</a></li>
<li>Update GHES guidance to include reference to Node 20 version by <a href="https://github.com/patrikpolyak "><code>@patrikpolyak</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/725 ">actions/upload-artifact#725</a></li>
<li>Bump <code>@actions/artifact</code> to <code>v4.0.0</code></li>
<li>Prepare <code>v5.0.0</code> by <a href="https://github.com/danwkennedy "><code>@danwkennedy</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/734 ">actions/upload-artifact#734</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/GhadimiR "><code>@GhadimiR</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/681 ">actions/upload-artifact#681</a></li>
<li><a href="https://github.com/nebuk89 "><code>@nebuk89</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/712 ">actions/upload-artifact#712</a></li>
<li><a href="https://github.com/danwkennedy "><code>@danwkennedy</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/727 ">actions/upload-artifact#727</a></li>
<li><a href="https://github.com/patrikpolyak "><code>@patrikpolyak</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/725 ">actions/upload-artifact#725</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v4...v5.0.0 ">https://github.com/actions/upload-artifact/compare/v4...v5.0.0 </a></p>
<h2>v4.6.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Update to use artifact 2.3.2 package & prepare for new upload-artifact release by <a href="https://github.com/salmanmkc "><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/685 ">actions/upload-artifact#685</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/salmanmkc "><code>@salmanmkc</code></a> made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/685 ">actions/upload-artifact#685</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v4...v4.6.2 ">https://github.com/actions/upload-artifact/compare/v4...v4.6.2 </a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="330a01c490 "><code>330a01c</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/734 ">#734</a> from actions/danwkennedy/prepare-5.0.0</li>
<li><a href="03f2824452 "><code>03f2824</code></a> Update <code>github.dep.yml</code></li>
<li><a href="905a1ecb59 "><code>905a1ec</code></a> Prepare <code>v5.0.0</code></li>
<li><a href="2d9f9cdfa9 "><code>2d9f9cd</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/725 ">#725</a> from patrikpolyak/patch-1</li>
<li><a href="9687587dec "><code>9687587</code></a> Merge branch 'main' into patch-1</li>
<li><a href="2848b2cda0 "><code>2848b2c</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/727 ">#727</a> from danwkennedy/patch-1</li>
<li><a href="9b511775fd "><code>9b51177</code></a> Spell out the first use of GHES</li>
<li><a href="cd231ca1ed "><code>cd231ca</code></a> Update GHES guidance to include reference to Node 20 version</li>
<li><a href="de65e23aa2 "><code>de65e23</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/712 ">#712</a> from actions/nebuk89-patch-1</li>
<li><a href="8747d8cd76 "><code>8747d8c</code></a> Update README.md</li>
<li>Additional commits viewable in <a href="https://github.com/actions/upload-artifact/compare/v4.6.1...330a01c490aca151604b8cf639adc76d48f6c5d4 ">compare view</a></li>
</ul>
</details>
Copybara import of the project:
--
5eab24c4d57708cbb45b476265bca2e841706647 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:
Bump actions/upload-artifact from 4.6.1 to 5.0.0
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact ) from 4.6.1 to 5.0.0.
- [Release notes](https://github.com/actions/upload-artifact/releases )
- [Commits](https://github.com/actions/upload-artifact/compare/v4.6.1...330a01c490aca151604b8cf639adc76d48f6c5d4 )
---
updated-dependencies:
- dependency-name: actions/upload-artifact
dependency-version: 5.0.0
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Merging this change closes #33140
PiperOrigin-RevId: 824999008
2025-10-28 06:14:44 -07:00
Adrian Kuegel
a12d2cfb31
[XLA:GPU] Make ReductionEmitter deterministic.
...
So far, the output could be non-deterministic if multiple reductions are
grouped together. This change makes it deterministic.
PiperOrigin-RevId: 824965037
2025-10-28 04:32:11 -07:00
Adrian Kuegel
29bc205be3
Update the Shardy pin in XLA
...
This should resolve shardy related test failures.
PiperOrigin-RevId: 824944776
2025-10-28 03:26:51 -07:00
A. Unique TensorFlower
4d3b9fb509
Update GraphDef version to 2394.
...
PiperOrigin-RevId: 824918460
2025-10-28 02:24:51 -07:00
A. Unique TensorFlower
9bd524b777
compat: Update forward compatibility horizon to 2025-10-28
...
PiperOrigin-RevId: 824918400
2025-10-28 02:15:51 -07:00
A. Unique TensorFlower
e937dcc97c
Automated Code Change
...
PiperOrigin-RevId: 824904755
2025-10-28 01:35:51 -07:00
A. Unique TensorFlower
c45fcf1b1c
Automated Code Change
...
PiperOrigin-RevId: 824851950
2025-10-27 23:03:01 -07:00
A. Unique TensorFlower
c6d737900f
Automated Code Change
...
PiperOrigin-RevId: 824847154
2025-10-27 22:48:55 -07:00
Zixuan Jiang
85d834e07b
Add PrintArray and ArrayToString methods to IotaTileAssignment and TileAssignment.
...
These new methods allow printing or converting to a string only the array representation of the tile assignment, without including the tile dimensions. The existing `Print` and `ToString` methods are updated to use these new array-specific printing functions.
PiperOrigin-RevId: 824816702
2025-10-27 21:19:25 -07:00
Zixuan Jiang
402ead44b2
The compatible factor shardings should not have overlap between axes across different tensors.
...
PiperOrigin-RevId: 824815687
2025-10-27 21:09:47 -07:00
Eugene Zhulenev
c09d68c588
[xla:ffi] Remove unused context decoding for C API internals
...
PiperOrigin-RevId: 824792000
2025-10-27 19:59:55 -07:00
Michael Kuperstein
b6f66e3e01
[XLA] VLOG instruction count before each HLO pass.
...
PiperOrigin-RevId: 824791349
2025-10-27 19:50:20 -07:00
A. Unique TensorFlower
4231383b5b
Integrate LLVM at llvm/llvm-project@d0a7411cb8
...
Updates LLVM usage to match
[d0a7411cb840](https://github.com/llvm/llvm-project/commit/d0a7411cb840)
PiperOrigin-RevId: 824767534
2025-10-27 18:28:40 -07:00
A. Unique TensorFlower
6e82d4d96b
Add call to ynn_optimize_subgraph
...
This currently happens implicitly in `ynn_create_runtime`, but that will not be the case soon. (Calling it multiple times is harmless.)
PiperOrigin-RevId: 824754921
2025-10-27 17:53:49 -07:00
Parker Schuh
1fc47fae8e
Transition to an error state more aggressively when the socket reports errors.
...
PiperOrigin-RevId: 824749937
2025-10-27 17:40:30 -07:00
A. Unique TensorFlower
582aa05a79
Add TensorTypeGetSize to schema_utils.
...
PiperOrigin-RevId: 824743105
2025-10-27 17:30:05 -07:00
Maxim Ermilov
699879f5f3
add pcie_bandwidth field to DeviceDescription
...
PiperOrigin-RevId: 824738638
2025-10-27 17:17:00 -07:00
Eugene Zhulenev
769acdd784
[xla] Migrate Tensorflow and XLA to xla::Future
...
Cleanup BUILD files and fix header includes in preparation for pjrt_future removal.
PiperOrigin-RevId: 824733023
2025-10-27 17:06:39 -07:00
A. Unique TensorFlower
633e1931cd
Add size_t casts to memcpy size calculation in BroadcastTo.
...
Explicitly cast the operands of the size calculation to `size_t` to prevent potential integer overflow before calling `memcpy` on 64-bit systems.
PiperOrigin-RevId: 824732102
2025-10-27 16:54:11 -07:00
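A minimal Python sketch of the overflow this guards against (hypothetical helpers; the real fix is C++ casts): computing the byte count in 32-bit arithmetic wraps around, while `size_t` (64-bit) arithmetic computes it exactly.

```python
def memcpy_size_i32(rows, cols, elem_size):
    # Emulate the intermediate product wrapping as a signed 32-bit int,
    # as it could before the explicit size_t casts.
    n = (rows * cols * elem_size) & 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n

def memcpy_size_sz(rows, cols, elem_size):
    # With size_t (64-bit) arithmetic the product is computed exactly.
    return rows * cols * elem_size
```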
Tori Baker
d299463d26
[xla:gpu] Fix our convert integer to pred in our Triton emitter
...
`arith.trunci` to i1 simply takes the lowest bit, but HLO expects convert to i1 to mean value != 0. Emit this conversion as a compare-not-equal to 0 instead. This is already done correctly for floats.
PiperOrigin-RevId: 824716165
2025-10-27 16:14:51 -07:00
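The semantic difference can be sketched in Python (hypothetical helpers mirroring the two lowerings, not the emitter's code): truncation keeps only the low bit, while the HLO semantics compare against zero.

```python
def to_pred_trunci(x):
    # arith.trunci to i1: keep only the lowest bit (the buggy lowering).
    return x & 1

def to_pred_cmpne(x):
    # HLO convert-to-i1 semantics: value != 0 (the fixed lowering).
    return int(x != 0)
```

For example, the two lowerings disagree on any even nonzero input such as 2.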
Matt Hurd
ffc21f066a
Add session_id to profiler_options
...
This will allow a follow-up PR that allows utilizing this proto.
PiperOrigin-RevId: 824709715
2025-10-27 15:56:14 -07:00