186058 Commits

Author SHA1 Message Date
Gunhyun Park
d2759cb154 Fix typo invompatible -> incompatible
PiperOrigin-RevId: 821816332
2025-10-20 15:16:51 -07:00
Sean Talts
8d940b9cc7 [XLA:CPU] Refactor: intrinsic::Type into its own library, pull out naming functions.
PiperOrigin-RevId: 821815511
2025-10-20 15:06:06 -07:00
Hyeontaek Lim
b915d3103a [NanoRt] NanoRt IFRT now returns a nullptr if it knows that the Array layout represents a default layout. The user code has previously been migrated to handle this new behavior gracefully, obtaining a concrete default layout as before.
Future note: At some point, `NanoArray` would need to distinguish between a default layout vs. a concrete layout that is equal to the default layout. If the latter is used, `NanoArray::pjrt_layout()` is expected to return the concrete layout. This is not required by IFRT API semantics yet, but it will be enforced in the future.

PiperOrigin-RevId: 821808592
2025-10-20 14:51:22 -07:00
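For b915d3103a, a minimal sketch of how calling code might treat a null layout as "use the default". It is written in Python, with `None` standing in for `nullptr`. `Layout`, `default_layout`, and `resolve_layout` are hypothetical names used only for illustration and are not part of the IFRT or PjRt API; the row-major default is likewise an assumption.

```python
from typing import Optional, Tuple

# Illustration only: a layout is modeled as a minor-to-major dimension order.
Layout = Tuple[int, ...]


def default_layout(rank: int) -> Layout:
    """Hypothetical default layout: row-major, i.e. (rank-1, ..., 1, 0)."""
    return tuple(reversed(range(rank)))


def resolve_layout(reported: Optional[Layout], rank: int) -> Layout:
    """Turn a possibly-null layout into a concrete one.

    `None` plays the role of the nullptr that signals "default layout";
    any other value is already concrete and is returned unchanged.
    """
    return default_layout(rank) if reported is None else reported
```

Note that `None` and an explicit `(2, 1, 0)` resolve to the same concrete layout for a rank-3 array but remain distinguishable before resolution, which is the distinction the future note in the commit message describes.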
Quentin Khan
dc12ec4556 Don't name the handle when mapping the XNNPack cache on Windows.
When specifying a mapping name to `CreateFileMappingA()`, that function returns
previous mappings that match the same name disregarding the newly requested
mapping size. This doesn't work well with the weight cache that is built (and
mapped) incrementally.

By making the mapping objects anonymous, we ensure that the mapping returned
will have the requested size.

Note: this doesn't increase the total memory used by the process, but the
accounting by the Windows system is different. Compared to a fix that allocates
memory instead of mapping the file, less memory is committed and private, and
more is shareable.

Testing `litert_llm_main` on [Gemma3-1B-IT] on Windows 11.

| Fix        | Commit (KB) | Working Set (KB) | Shareable (KB) | Private (KB) |
| ---------: | -----------:| ----------------:| --------------:| ------------:|
| Anon. map  |   1 208 416 |        1 678 396 |      1 079 620 |      599 096 |
| Mem. alloc |   1 705 620 |        1 678 572 |        582 428 |    1 096 144 |
|            |             |                  |                |              |
| diff.      |    +497 204 |             +176 |       -497 192 |     +497 048 |

[Gemma3-1B-IT]: https://huggingface.co/litert-community/Gemma3-1B-IT/blob/main/gemma3-1b-it-int4.litertlm

PiperOrigin-RevId: 821807004
2025-10-20 14:36:54 -07:00
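The "diff." row in dc12ec4556 can be reproduced from the two rows above it. The following is a small Python check, assuming the row is "Mem. alloc" minus "Anon. map" for each column; the dictionary keys are illustrative labels, not names from the commit.

```python
# Values in KB, copied from the table in the commit message.
anon_map = {
    "commit": 1_208_416,
    "working_set": 1_678_396,
    "shareable": 1_079_620,
    "private": 599_096,
}
mem_alloc = {
    "commit": 1_705_620,
    "working_set": 1_678_572,
    "shareable": 582_428,
    "private": 1_096_144,
}

# Assumed definition of the "diff." row: (Mem. alloc) - (Anon. map).
diff = {key: mem_alloc[key] - anon_map[key] for key in anon_map}

for key, value in diff.items():
    print(f"{key:>12}: {value:+,} KB")
```

Under that assumption the working set is nearly unchanged, while roughly 497 000 KB moves from committed, private memory to shareable memory when the file is mapped anonymously.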
Benjamin Chetioui
88e4cd4d01 [XLA] Use TiledHloSchedule in SymbolicTileAnalysis.
We can now produce arbitrary iteration patterns for output tiles, simply by
parametrizing calls to `ComputeTiledHloInstructions` with different
`TiledHloSchedule`s.

PiperOrigin-RevId: 821796530
2025-10-20 14:12:48 -07:00
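To illustrate the idea in 88e4cd4d01 of parametrizing tile iteration with a schedule, here is a toy Python sketch. It does not use or mirror the XLA API: `major_to_minor`, `minor_to_major`, and `visit_tiles` are hypothetical names, and a "schedule" is simply assumed to be a function that yields output tile indices in some order.

```python
from itertools import product
from typing import Callable, Iterator, List, Sequence, Tuple

TileIndex = Tuple[int, ...]
Schedule = Callable[[Sequence[int]], Iterator[TileIndex]]


def major_to_minor(tile_counts: Sequence[int]) -> Iterator[TileIndex]:
    """Visit tiles with the last dimension varying fastest."""
    yield from product(*(range(n) for n in tile_counts))


def minor_to_major(tile_counts: Sequence[int]) -> Iterator[TileIndex]:
    """Visit tiles with the first dimension varying fastest."""
    reversed_ranges = (range(n) for n in reversed(tile_counts))
    for reversed_index in product(*reversed_ranges):
        yield tuple(reversed(reversed_index))


def visit_tiles(tile_counts: Sequence[int], schedule: Schedule) -> List[TileIndex]:
    """Collect the tile visiting order produced by the given schedule."""
    return list(schedule(tile_counts))
```

Swapping the `schedule` argument changes the iteration pattern without touching `visit_tiles`, which is the kind of decoupling the commit message describes.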
Sohaib Iftikhar
cbeeef926f [XLA:GPU]: Remove unused method from all reduce test
PiperOrigin-RevId: 821742010
2025-10-20 12:28:17 -07:00
Hyeontaek Lim
cc9fd2b254 [IFRT Proxy] Array::pjrt_layout() uses nullptr to indicate a default layout
IFRT Proxy now returns a `nullptr` if it knows that the Array layout represents a default layout. The user code has previously been migrated to handle this new behavior gracefully, obtaining a concrete default layout as before.

Caveat: IFRT Proxy client infers the layout of the output arrays from `LoadedExecutable::GetOutputLayouts()`, which always returns concrete layouts today. Thus, these output arrays would use concrete layouts for default layouts, even if the arrays on the server side use `nullptr` for default layouts. This behavior is currently acceptable because all users convert the layout into a concrete one before using it. It will eventually change so that IFRT Proxy client reflects the array layouts on the server side more accurately.
PiperOrigin-RevId: 821741105
2025-10-20 12:19:53 -07:00
Eugene Zhulenev
0e09f486e7 [xla:pjrt:ffi] Remove deprecated TypeID registration function
PiperOrigin-RevId: 821740142
2025-10-20 12:11:20 -07:00
Michael Kuperstein
b824d4e187 [XLA] Remove verify_unique_channel_ids verifier option.
The functionality has been removed previously, but the option was never cleaned up. This does not remove the xla_ignore_channel_id debug option because it also has a non-verifier use.

PiperOrigin-RevId: 821737613
2025-10-20 11:59:45 -07:00
Kanish Anand
16e1567819 Clarify field name
PiperOrigin-RevId: 821732575
2025-10-20 11:50:54 -07:00
Zac Mustin
903ce4c6ff Use GetPjRtCpuClient directly when making a PJRT C API CPU client.
Right now, we use `GetXlaPjrtCpuClient` which in turn calls `GetPjRtCpuClient`, but we will later update `GetXlaPjrtCpuClient` to use the C sandwich, in which case we must call `GetPjRtCpuClient` here in `PJRT_Client_Create`.

This change is a no-op.

PiperOrigin-RevId: 821732030
2025-10-20 11:44:41 -07:00
A. Unique TensorFlower
458995b35d Set DNN version in DeviceDescription for autotuner cache.
The dnn_version in device_description was not set. cl/816579045 fixed this for the old autotuner infra; this change ports that fix to the new autotuner infra.

PiperOrigin-RevId: 821728904
2025-10-20 11:34:10 -07:00
A. Unique TensorFlower
027a15b3dc [Autotuner] Early exit if there is only one supported config.
- We encounter this case very often (for the cuBLAS autotuner), so it makes sense to optimize it.
- Running cuBLAS kernels as part of autotuning has an unintended side effect that changes the optimized HLO. This fix also mitigates that issue while we look into it further.

PiperOrigin-RevId: 821716593
2025-10-20 11:08:05 -07:00
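The early exit in 027a15b3dc can be pictured with a short Python sketch. `pick_best_config` and `measure` are hypothetical names; the real autotuner interface is not shown in the commit message, so this only illustrates the control flow under that assumption.

```python
from typing import Callable, Sequence, TypeVar

Config = TypeVar("Config")


def pick_best_config(
    configs: Sequence[Config],
    measure: Callable[[Config], float],
) -> Config:
    """Return the config with the lowest measured runtime.

    When only one config is supported there is nothing to compare, so the
    function returns it without calling `measure` (i.e. without running
    any kernel).
    """
    if not configs:
        raise ValueError("no supported configs")
    if len(configs) == 1:
        return configs[0]  # early exit: skip profiling entirely
    return min(configs, key=measure)
```

Because `measure` is never invoked on the single-config path, any side effects of running kernels during autotuning are avoided in that case as well.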
Emilio Cota
591ba9b4c2 Update symbols to unbreak Windows
PiperOrigin-RevId: 821712287
2025-10-20 10:56:56 -07:00
Eugene Zhulenev
fd948cba88 [xla:cpu] Migrate tf2xla to BufferAllocationInfo
Reverts f2ed04aff6

PiperOrigin-RevId: 821660240
2025-10-20 08:41:06 -07:00
Quentin Khan
3a6eef2333 Add a helper to compute *Low/High DWORD parameter couples from a 64 bit int.
Example: `MapViewOfFile` takes a couple of parameters `dwFileOffset(High|Low)`
when specifying the file offset.
PiperOrigin-RevId: 821635630
2025-10-20 07:32:42 -07:00
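The split described in 3a6eef2333 is plain bit arithmetic. A minimal Python sketch follows; `split_high_low` is a hypothetical name, and the helper added by the commit is C++ whose exact signature is not shown here.

```python
from typing import Tuple

DWORD_MASK = 0xFFFFFFFF  # a DWORD is an unsigned 32-bit value


def split_high_low(value: int) -> Tuple[int, int]:
    """Split an unsigned 64-bit integer into (high DWORD, low DWORD)."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value does not fit in 64 bits")
    high = (value >> 32) & DWORD_MASK
    low = value & DWORD_MASK
    return high, low
```

For example, a file offset of 5 GiB (`5 * 2**30`) splits into a high part of `1` and a low part of `0x40000000`, which is the pair a `dwFileOffsetHigh`/`dwFileOffsetLow` style API expects.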
A. Unique TensorFlower
f32acff204 Automated Code Change
PiperOrigin-RevId: 821628227
2025-10-20 07:15:56 -07:00
A. Unique TensorFlower
789f9c8c92 Integrate LLVM at llvm/llvm-project@d5ce81dc81
Updates LLVM usage to match
[d5ce81dc8143](https://github.com/llvm/llvm-project/commit/d5ce81dc8143)

PiperOrigin-RevId: 821627987
2025-10-20 06:57:20 -07:00
Will Froom
4d5ed207cd [XLA:GPU][XTile] Handle scalar load/store in xtile lowering.
PiperOrigin-RevId: 821617202
2025-10-20 06:36:16 -07:00
Alex
8245a623aa PR #32782: [ROCm] Fix hermetic build for rocm
Imported from GitHub PR https://github.com/openxla/xla/pull/32782

📝 Summary of Changes
Fix hermetic build for rocm.

🎯 Justification
Introduce missing hipblaslt dependency.
Fix invalid libs linking and align with the data directories.

🚀 Kind of Contribution
🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
CI, not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
f5cb68b0df2265b7048d0068eedd07cccf67e228 by Alexandros Theodoridis <atheodor@amd.com>:

Add missing hermetic lib dependency

--
fe0c9a7fdd36180fea5cf63e20d864355ed98a6c by Alexandros Theodoridis <atheodor@amd.com>:

Add missing hipblaslt deps, fix the targets

--
540d79dd4287a013a3f178ef34a5b96fb8a8a92f by Alexandros Theodoridis <atheodor@amd.com>:

Make hipblaslt mandatory

--
3a6f2282669a1ece4518cc69a01ad76275b603a1 by Alexandros Theodoridis <atheodor@amd.com>:

Fix test

--
eb21b60d34978191315a0c9775d2cb53309dc72d by Alexandros Theodoridis <atheodor@amd.com>:

Ignore asnsigaltstack

--
54c8af2abd7dd682a8494caa05854d574209aa20 by Harsha Havanur Shamsundara <harsha.havanurshamsundara@amd.com>:

[ROCm] Use working sha256 for latest ROCm 7.0 docker image

--
9629a9fc9201a80dba7a0beecb8ee0797960ff6f by Harsha HS <Harsha.HavanurShamsundara@amd.com>:

[ROCm] Add ROCM_PATH repo_env to test scripts

--
1ef6772c6df6aeffcbcc2f27a0ede558fbc6270f by Alexandros Theodoridis <atheodor@amd.com>:

Fix buildifier warning

Merging this change closes #32782

PiperOrigin-RevId: 821614030
2025-10-20 06:26:00 -07:00
Kostiantyn Liepieshov
5866a4f621 fix relayout propagation for MPMD.
In cases where a program argument with AUTO layout is used in more than one Fragment, enforce the DEFAULT layout, as we cannot allow different compiled layouts.

PiperOrigin-RevId: 821612799
2025-10-20 06:11:10 -07:00
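The rule in 5866a4f621 can be sketched as a small Python function. This is an illustration under assumptions, not the MPMD implementation: layouts are modeled as strings, fragments as lists of argument names, and `resolve_argument_layouts` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List

AUTO = "AUTO"
DEFAULT = "DEFAULT"


def resolve_argument_layouts(
    arg_layouts: Dict[str, str],
    fragments: List[List[str]],
) -> Dict[str, str]:
    """Force DEFAULT for AUTO-layout arguments shared by several fragments.

    An argument used by a single fragment may keep AUTO, because only one
    compilation chooses its layout. Once two or more fragments use it, each
    could pick a different layout, so DEFAULT is enforced instead.
    """
    # Count each argument once per fragment, even if it is used repeatedly.
    use_counts = Counter(arg for fragment in fragments for arg in set(fragment))
    return {
        arg: DEFAULT if layout == AUTO and use_counts[arg] > 1 else layout
        for arg, layout in arg_layouts.items()
    }
```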
A. Unique TensorFlower
5707a02d98 [XLA:GPU] Enable chlo.asinh -> kAsinh HloInstruction lowering.
PiperOrigin-RevId: 821610794
2025-10-20 05:57:12 -07:00
Adrian Kuegel
7144ba7d80 Migrate ListScheduler from TuplePointsToAnalysis to HloAliasAnalysis.
Reverts 22032a9edb

PiperOrigin-RevId: 821605784
2025-10-20 05:44:20 -07:00
Will Froom
beb48d90e2 [XLA][XTile] Add xtile lowering passes for triton.
This enables migrating the triton emitter to emit xtile entry, insert & extract ops in the child PR.

The main difference is the memref args in the entry function, for which `MemrefToPtr` & `PtrToMemref` were introduced. These closely resemble `UnrealizedConversionCastOp`, with additional verification, and will enable special folding of `memref::TransposeOp`.

PiperOrigin-RevId: 821593545
2025-10-20 04:57:30 -07:00
Christian Sigg
ea72bd7e48 [XLA:GPU] Allow kPad in nest_gemm_fusion.
PiperOrigin-RevId: 821581567
2025-10-20 04:16:25 -07:00
Nikita Putikhin
c842d810f4 [XLA:GPU] Initialize global_split_limits with the first config's split_k.
PiperOrigin-RevId: 821579201
2025-10-20 04:03:13 -07:00
Quentin Khan
d8d0167ebc Fix format specifiers in log.
PiperOrigin-RevId: 821550544
2025-10-20 02:48:58 -07:00
A. Unique TensorFlower
5d49a4a177 compat: Update forward compatibility horizon to 2025-10-20
PiperOrigin-RevId: 821550460
2025-10-20 02:28:51 -07:00
A. Unique TensorFlower
990709c103 Update GraphDef version to 2386.
PiperOrigin-RevId: 821550446
2025-10-20 02:14:20 -07:00
A. Unique TensorFlower
ec21abb4c3 Automated Code Change
PiperOrigin-RevId: 821547310
2025-10-20 02:00:35 -07:00
A. Unique TensorFlower
5525a3f53e Automated Code Change
PiperOrigin-RevId: 821486960
2025-10-19 22:13:03 -07:00
A. Unique TensorFlower
c37b02fbbf Automated Code Change
PiperOrigin-RevId: 821479363
2025-10-19 21:43:38 -07:00
Parker Schuh
f322c0c82d Refactor the event loop + socket integration so that it is separately testable.
This gives us the two HalfClose events + HandleEvent() and SendRawFrame() as
the API from the socket integration, and subclasses can handle these
accordingly. This also moves the responsibility for destruction into the handler
logic, with the contract that the event is removed from the loop on the second
HalfClose event.

PiperOrigin-RevId: 821445213
2025-10-19 19:02:26 -07:00
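The contract stated in f322c0c82d (removal from the loop on the second HalfClose) can be modeled with a toy Python class. `SocketEvent` and its members are hypothetical; the sketch only tracks state and does not touch real sockets or the actual event loop code.

```python
class SocketEvent:
    """Toy stand-in for an event registered with an event loop."""

    def __init__(self) -> None:
        self.half_closes = 0
        self.registered = True  # True while the event is still in the loop

    def on_half_close(self) -> bool:
        """Record one HalfClose; return True once the event left the loop."""
        if not self.registered:
            raise RuntimeError("HalfClose delivered after removal from the loop")
        self.half_closes += 1
        if self.half_closes == 2:
            # Both directions are closed: the loop drops the event, and the
            # handler logic is now responsible for destroying the object.
            self.registered = False
        return not self.registered
```

Keeping this state machine separate from the socket code is what makes it testable on its own, which matches the stated goal of the refactoring.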
A. Unique TensorFlower
bce0886484 Automated Code Change
PiperOrigin-RevId: 821380400
2025-10-19 13:08:38 -07:00
A. Unique TensorFlower
50808df60c Add random perturbations to the xla_tpu_msa_sort_order_overrides flag
Given a user seed, this updates the MSA sort order priority of a (small?) number of randomly selected instructions during compilation.

This causes small perturbations on the compiler's prefetching decisions, which allows for 2 main features:

1. finding out if there is a single instruction which was given a "wrong" priority by the compiler so it can be fixed
- to do this, we run some benchmark many times with different seeds until we find a seed that drastically reduces the compiled code's runtime
- once we have found that seed, we can use binary search to decrease the "selection range" and zero in on the one specific offending instruction

2. finding a lot of small changes that together reduce the runtime
- we can do this using a "hill-climbing" method
- try many perturbations until you find one slightly better than the baseline.
- try many followup perturbations (perturbing the best perturbation from the previous stage) until you find one slightly better again
- repeat until no more improvements are found
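The hill-climbing procedure in item 2 can be sketched in Python as follows. Everything here is a toy under stated assumptions: `hill_climb`, `toy_runtime`, and `toy_perturb` are hypothetical, the "runtime" is a made-up distance to a target instead of a real benchmark, and a perturbation simply nudges one priority by one.

```python
import random
from typing import Callable, List, Tuple

Priorities = List[int]


def hill_climb(
    baseline: Priorities,
    runtime: Callable[[Priorities], float],
    perturb: Callable[[Priorities, random.Random], Priorities],
    rng: random.Random,
    max_tries: int = 50,
) -> Tuple[Priorities, float]:
    """Repeatedly accept the first perturbation that beats the current best.

    Stops when `max_tries` perturbations in a row bring no improvement.
    """
    best, best_cost = baseline, runtime(baseline)
    improved = True
    while improved:
        improved = False
        for _ in range(max_tries):
            candidate = perturb(best, rng)
            cost = runtime(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
                improved = True
                break  # perturb the new best in the next round
    return best, best_cost


# Toy stand-ins for "compile and benchmark" (assumptions, not real tooling).
TARGET = [3, 1, 4, 1]


def toy_runtime(priorities: Priorities) -> float:
    return float(sum(abs(p - t) for p, t in zip(priorities, TARGET)))


def toy_perturb(priorities: Priorities, rng: random.Random) -> Priorities:
    candidate = list(priorities)  # copy so the current best is not mutated
    index = rng.randrange(len(candidate))
    candidate[index] += rng.choice((-1, 1))
    return candidate
```

A call such as `hill_climb([0, 0, 0, 0], toy_runtime, toy_perturb, random.Random(0))` returns the best priorities found and their cost. With a real benchmark, each `runtime` call would be a full compile-and-measure cycle, so `max_tries` bounds the cost of each round.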

NOTE: Right now there's no "good way" of finding which instructions had their priority adjusted (especially important in (1) to find the one offending instruction). The only way to do so is to increase the log-level of the compilation debug print and then look at the logs.
PiperOrigin-RevId: 821309046
2025-10-19 06:33:49 -07:00
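The binary search from item 1 of 50808df60c can be sketched in Python as below. It assumes exactly one offending instruction and a hypothetical oracle `speeds_up(lo, hi)` that reports whether restricting perturbations to instruction indices in `[lo, hi)` still reproduces the speedup. In practice that oracle would be a compile-and-benchmark run; how the selection range is passed to the compiler is not specified here.

```python
from typing import Callable


def find_offending_instruction(
    num_instructions: int,
    speeds_up: Callable[[int, int], bool],
) -> int:
    """Narrow the half-open range [lo, hi) down to a single instruction index."""
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    lo, hi = 0, num_instructions
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if speeds_up(lo, mid):
            hi = mid  # the offender lies in the lower half
        else:
            lo = mid  # otherwise it must lie in the upper half
    return lo
```

Each step halves the range, so locating one instruction among `n` takes about `log2(n)` benchmark runs.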
A. Unique TensorFlower
08cc6ff10c Update GraphDef version to 2385.
PiperOrigin-RevId: 821260487
2025-10-19 02:23:25 -07:00
A. Unique TensorFlower
3b2a32ab5f compat: Update forward compatibility horizon to 2025-10-19
PiperOrigin-RevId: 821260464
2025-10-19 02:10:26 -07:00
David Majnemer
8cf42017ec [TSL] Consolidate NUMA code across different platforms.
No functional change is intended.

PiperOrigin-RevId: 821216963
2025-10-18 22:14:29 -07:00
Subhankar Shah
a139a50e56 [XLA:MSA] Allow allocation requests with a continuous default memory requirement to fall through without resulting in a failure requiring un-commit.
PiperOrigin-RevId: 821165250
2025-10-18 18:00:21 -07:00
A. Unique TensorFlower
ac5fb8fb7f Introduce and connect an XLA:TPU shardy option to fully deduplicate functions in Shardy.
This change is a no-op since both the newly introduced XLA:TPU option and the corresponding option on the ExportNamedComputation pass are false by default.

PiperOrigin-RevId: 821039969
2025-10-18 05:48:48 -07:00
Aliia Khasanova
4985a1c2f3 Add proto [de]serialization for HostExecuteDoneThunk
PiperOrigin-RevId: 821029998
2025-10-18 04:55:53 -07:00
A. Unique TensorFlower
083e682264 compat: Update forward compatibility horizon to 2025-10-18
PiperOrigin-RevId: 820999344
2025-10-18 02:21:41 -07:00
A. Unique TensorFlower
66f2c7c62d Update GraphDef version to 2384.
PiperOrigin-RevId: 820999336
2025-10-18 02:11:39 -07:00
A. Unique TensorFlower
17117898f9 Automated Code Change
PiperOrigin-RevId: 820964120
2025-10-17 23:15:56 -07:00
A. Unique TensorFlower
34d6417ddf Automated Code Change
PiperOrigin-RevId: 820944060
2025-10-17 21:36:01 -07:00
Maxim Ermilov
4a42fca868 First step to introduce GpuComputeCapability custom class instead of std::variant
PiperOrigin-RevId: 820940828
2025-10-17 21:24:11 -07:00
David Majnemer
4d358b2bac [TSL] Remove unused integral_types.h from TSL platform defaults.
This header provided typedefs for integral types within the `tsl` namespace, but these are no longer necessary.

PiperOrigin-RevId: 820935573
2025-10-17 21:01:34 -07:00
A. Unique TensorFlower
4beacf5a04 Make the chain specifically target the reshape-transpose chain where
the transposes are not identity permutations. Identity transposes
should be eliminated separately in HandleTranspose already.

PiperOrigin-RevId: 820903953
2025-10-17 18:56:47 -07:00
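4beacf5a04 separates identity transposes from the ones the reshape-transpose chain should target. A minimal Python check of that distinction follows; `is_identity_permutation` and `non_identity_transposes` are hypothetical helpers, not XLA functions, and a transpose is modeled only by its permutation.

```python
from typing import Iterable, List, Sequence, Tuple


def is_identity_permutation(permutation: Sequence[int]) -> bool:
    """True if the permutation maps every dimension to itself."""
    return all(index == dim for index, dim in enumerate(permutation))


def non_identity_transposes(
    permutations: Iterable[Sequence[int]],
) -> List[Tuple[int, ...]]:
    """Keep only the transposes that actually reorder dimensions."""
    return [tuple(p) for p in permutations if not is_identity_permutation(p)]
```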
Alexander Shaposhnikov
ce65a0ad5c [XLA:CPU] Add initial bits for YNNPACK support.
+ Do not build XLA with YNNPACK on Windows.

Co-authored-by: Penporn Koanantakool <penporn@google.com>
PiperOrigin-RevId: 820896434
2025-10-17 18:36:53 -07:00
A. Unique TensorFlower
f0057ee4b7 [XLA] Delete stale comment on ShapeUtil::PermuteDimensions usage.
PiperOrigin-RevId: 820874660
2025-10-17 16:58:56 -07:00