186478 Commits

Author SHA1 Message Date
A. Unique TensorFlower
fe344908fa Automated Code Change
PiperOrigin-RevId: 826451035
2025-10-31 05:48:15 -07:00
Kanish Anand
e7dcad735e Add equality operator for NamedSharding
PiperOrigin-RevId: 826442714
2025-10-31 05:16:43 -07:00
Aliia Khasanova
add489fd8d Use std::vector<BufferAllocation> instead of std::vector<std::unique_ptr<BufferAllocation>> in DynamicSliceThunk.
`BufferAllocation::Slice` stores a raw pointer to the corresponding `BufferAllocation`. Currently we keep the embedded thunk allocations alive by storing unique_ptrs in the wrapping DynamicSliceThunk. The current design makes it hard to reuse the existing infrastructure, specifically to serialize `DynamicSliceThunk`. To address this, I'm changing fake_allocations to be `std::vector<BufferAllocation>`.

The move constructor `std::vector::vector(std::vector&&)` is guaranteed to have constant time complexity and therefore steals the internal data buffer from the source vector. This implies that the pointers to allocations stay stable as long as:
* we preallocate the vector size
* we never copy the vector, but move

To make it safer for later usage, we can explicitly prohibit BufferAllocation from being copyable/movable. I'm going to do this in a follow-up CL.
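
The pointer-stability argument above can be illustrated with a minimal sketch. `Allocation`, `Slice`, and `Owner` are hypothetical stand-ins invented for this example; they are not the real `BufferAllocation`, `BufferAllocation::Slice`, or `DynamicSliceThunk`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for BufferAllocation; the real class has more state.
struct Allocation {
  std::size_t index;
  std::size_t size;
};

// Hypothetical stand-in for BufferAllocation::Slice: it keeps a raw pointer
// to the allocation it belongs to.
struct Slice {
  const Allocation* allocation;
  std::size_t offset;
};

// Hypothetical owner that takes the vector by move, like a wrapping thunk.
class Owner {
 public:
  explicit Owner(std::vector<Allocation> allocations)
      : allocations_(std::move(allocations)) {}
  const std::vector<Allocation>& allocations() const { return allocations_; }

 private:
  std::vector<Allocation> allocations_;
};

// Returns true if a slice created before the move still points at the first
// element now owned by Owner.
bool SlicePointerSurvivesMove() {
  std::vector<Allocation> allocations;
  allocations.reserve(2);  // Preallocate so push_back never reallocates.
  allocations.push_back(Allocation{0, 128});
  Slice slice{&allocations[0], 0};
  allocations.push_back(Allocation{1, 256});  // Capacity is 2: no reallocation.

  // Each move hands over the heap buffer; the elements themselves stay put.
  Owner owner(std::move(allocations));
  return slice.allocation == &owner.allocations()[0] &&
         slice.allocation->size == 128;
}
```

Copying the vector instead of moving it, or growing it past the reserved capacity, would leave `slice.allocation` pointing at elements the owner does not hold.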

PiperOrigin-RevId: 826440060
2025-10-31 05:05:43 -07:00
A. Unique TensorFlower
3326b0221f Automated Code Change
PiperOrigin-RevId: 826433106
2025-10-31 04:44:23 -07:00
A. Unique TensorFlower
e32304ddc5 [Autotuner] Add support for sharded autotuning in the pass.
PiperOrigin-RevId: 826417614
2025-10-31 03:50:55 -07:00
Eusebio Durán Montaña
e32f20dd91 Use factory function to create CubSortThunk
The `CubSortThunk` constructor was calling a function that returns an `absl::StatusOr`, ignoring non-ok statuses and accessing the value directly.

Presumably the status is always ok in prod, but this change makes the failure case explicit.
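
A minimal sketch of the factory-function pattern described above. `SortThunk` and `LookupSortImpl` are hypothetical names, and `std::optional` stands in for `absl::StatusOr` only to keep the example dependency-free; this is not the actual `CubSortThunk` code.

```cpp
#include <optional>
#include <string>

// Hypothetical fallible helper, standing in for a function that returns
// absl::StatusOr in the real code.
std::optional<int> LookupSortImpl(const std::string& type_name) {
  if (type_name == "f32") return 1;
  if (type_name == "s32") return 2;
  return std::nullopt;  // Unsupported type: the failure case.
}

class SortThunk {
 public:
  // The factory surfaces the failure to the caller instead of assuming the
  // lookup succeeded, which a constructor cannot do without throwing.
  static std::optional<SortThunk> Create(const std::string& type_name) {
    std::optional<int> impl = LookupSortImpl(type_name);
    if (!impl.has_value()) return std::nullopt;
    return SortThunk(*impl);
  }

  int impl_id() const { return impl_id_; }

 private:
  // Private: only Create can construct, and only with a valid implementation.
  explicit SortThunk(int impl_id) : impl_id_(impl_id) {}
  int impl_id_;
};
```

Callers then have to check the result of `SortThunk::Create(...)` before using it, so an unsupported type cannot silently produce a half-initialized object.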

PiperOrigin-RevId: 826410861
2025-10-31 03:37:19 -07:00
Kanish Anand
adfd891fde Refactor Mesh ctors
PiperOrigin-RevId: 826410314
2025-10-31 03:26:09 -07:00
A. Unique TensorFlower
8734ec41d5 Disable capturing of dot RHS operands
This is proving to be unreliable.

PiperOrigin-RevId: 826395008
2025-10-31 02:45:34 -07:00
A. Unique TensorFlower
d6d4e02248 [XLA:GPU] Add multimem setup.
PiperOrigin-RevId: 826391581
2025-10-31 02:35:40 -07:00
A. Unique TensorFlower
a6e96588e3 compat: Update forward compatibility horizon to 2025-10-31
PiperOrigin-RevId: 826388828
2025-10-31 02:29:46 -07:00
A. Unique TensorFlower
f572aeee90 Update GraphDef version to 2397.
PiperOrigin-RevId: 826388642
2025-10-31 02:16:45 -07:00
A. Unique TensorFlower
993369077a Reverts bf23bf1b32
PiperOrigin-RevId: 826380939
2025-10-31 01:51:12 -07:00
A. Unique TensorFlower
d25ccb438d Reverts cef240807a
PiperOrigin-RevId: 826374657
2025-10-31 01:32:52 -07:00
A. Unique TensorFlower
4cfaa7e25c Automated Code Change
PiperOrigin-RevId: 826372270
2025-10-31 01:19:21 -07:00
A. Unique TensorFlower
5133f83425 Automated Code Change
PiperOrigin-RevId: 826363842
2025-10-31 00:50:37 -07:00
A. Unique TensorFlower
ebacf2a211 Automated Code Change
PiperOrigin-RevId: 826342599
2025-10-30 23:59:59 -07:00
Bill Varcho
cef240807a [ReplicaGroupV3][MeshAxesReplicaGroupList][1/2] Add initial class definition for V3 replica group.
PiperOrigin-RevId: 826334561
2025-10-30 23:18:40 -07:00
Felix Wang
d9c76aafeb Adjust the collective-permute cross host type to MULTI_HOST_NON_WORLD_LEVEL only.
PiperOrigin-RevId: 826327580
2025-10-30 22:54:49 -07:00
Eugene Zhulenev
d90723f48e [xla:pjrt:cpu] Add e2e test for YnnFusion + PJRT client
PiperOrigin-RevId: 826323865
2025-10-30 22:41:49 -07:00
Eugene Zhulenev
7ad55e8818 [xla:cpu] Add an end-to-end test for ynn fusions
PiperOrigin-RevId: 826318525
2025-10-30 22:20:44 -07:00
Eugene Zhulenev
bf23bf1b32 [xla:cpu] Pass HloModule pointer to Thunk SerDes
PiperOrigin-RevId: 826312546
2025-10-30 22:11:41 -07:00
Eugene Zhulenev
56d3b19280 [xla:cpu] NFC: Rename protos for Xnn/Ynn fusion options
PiperOrigin-RevId: 826304955
2025-10-30 22:01:47 -07:00
A. Unique TensorFlower
a95c558dc4 Save compile options with the compiled IFRT IR program to be used later for serialization
PiperOrigin-RevId: 826301016
2025-10-30 21:54:24 -07:00
A. Unique TensorFlower
e61bac51b1 Automated Code Change
PiperOrigin-RevId: 826298597
2025-10-30 21:47:00 -07:00
A. Unique TensorFlower
b2334ac330 Integrate LLVM at llvm/llvm-project@22079e3f36
Updates LLVM usage to match
[22079e3f3698](https://github.com/llvm/llvm-project/commit/22079e3f3698)

PiperOrigin-RevId: 826294004
2025-10-30 20:44:41 -07:00
A. Unique TensorFlower
6d86cff5f3 Automated Code Change
PiperOrigin-RevId: 826286610
2025-10-30 20:12:04 -07:00
Eugene Zhulenev
db273660ba [xla:pjrt] Remove PjRtFuture type alias
Cleaning up BUILD files and includes will be done separately.

PiperOrigin-RevId: 826280389
2025-10-30 19:44:40 -07:00
Eugene Zhulenev
429a0cf1c7 [xla:cpu] Add target machine features to the error message
PiperOrigin-RevId: 826253599
2025-10-30 17:49:12 -07:00
Eugene Zhulenev
d9024af6d4 [xla:cpu] Do not register legacy runtime symbols with XLA:CPU custom calls
PiperOrigin-RevId: 826208548
2025-10-30 16:25:55 -07:00
Niklas Vangerow
31bb7c01ff Migrate multioutput_fusion_test to use PjRt.
PiperOrigin-RevId: 826203532
2025-10-30 15:18:22 -07:00
Parker Schuh
c3d0bf7023 Add an additional way to poison a connection (to allow testing different poisoning strategies).

PiperOrigin-RevId: 826193232
2025-10-30 14:52:58 -07:00
A. Unique TensorFlower
c40bb10b96 Add the option to dump before/after autotuned instructions in AutotunerConfig.
- This change is required to still support the functionality of xla_gpu_dump_autotuned_gemm_fusions in the new infra.

PiperOrigin-RevId: 826161466
2025-10-30 14:39:24 -07:00
A. Unique TensorFlower
8f60516a86 Refactor: Move common SymbolicMapTest setup to the fixture.
This change moves the initialization of commonly used `SymbolicExpr` and a sample `SymbolicMap` into the `SymbolicMapTest` fixture to reduce code duplication across tests.

PiperOrigin-RevId: 826161168
2025-10-30 14:19:16 -07:00
A. Unique TensorFlower
7736af79a6 Only enable YNNPACK for bf16 and int8 for now.
We plan to enable this in stages, starting with int8 and bf16, where the improvement is more significant.

PiperOrigin-RevId: 826160602
2025-10-30 14:05:02 -07:00
Karlo Basioli
f4ebf9d47d [XLA][codegen] Migrate triton operations for which shared dialect lowerings are implemented.
These were missed in previous commits.
Addresses transpose and bitcast.

PiperOrigin-RevId: 826158776
2025-10-30 13:54:31 -07:00
Niklas Vangerow
1424c4f739 Migrate slice_test to use PjRt.
PiperOrigin-RevId: 826158235
2025-10-30 13:45:44 -07:00
Karlo Basioli
5973848600 [XLA][codegen] Emit shlo reshape from the fusion emitter and lower it to triton for the triton backend.
PiperOrigin-RevId: 826147865
2025-10-30 13:32:56 -07:00
Quoc Truong
f01a7fea8c Update ML Build Docker container to use hermetic C++
PiperOrigin-RevId: 826147864
2025-10-30 13:25:44 -07:00
A. Unique TensorFlower
7e7b1a3015 Allow empty dimension list in SymbolicMap::ReplaceDimsAndSymbols
I originally assumed the caller always provided a full list of replacements, but IndexingMap has some uses where the dim_replacement list is empty, resulting in a CHECK-fail.

So, I'm allowing the caller to pass an empty list for either dims or symbols to ReplaceDimsAndSymbols. In that case, the corresponding dims/symbols won't be replaced.
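
The relaxed contract can be sketched roughly as follows. `ReplaceOrKeep` is a hypothetical helper over plain integers, written only to show the "empty list means no replacement" rule; it is not the real `SymbolicMap::ReplaceDimsAndSymbols` signature.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified model: each dim or symbol is an int id.
// An empty replacement list means "leave everything as it is"; otherwise the
// caller must supply exactly one replacement per existing entry.
std::vector<int> ReplaceOrKeep(const std::vector<int>& current,
                               const std::vector<int>& replacements) {
  if (replacements.empty()) return current;  // Previously a CHECK-fail.
  assert(replacements.size() == current.size());
  return replacements;
}
```

In this model a caller would invoke the helper once for dims and once for symbols, so either list can be left empty independently of the other.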

PiperOrigin-RevId: 826138814
2025-10-30 13:18:28 -07:00
Niklas Vangerow
175774337e Migrate params_test to use PjRt.
PiperOrigin-RevId: 826137636
2025-10-30 13:07:24 -07:00
Daniel Sosa
5dcb571931 Remove unused code from convert.py
PiperOrigin-RevId: 826137491
2025-10-30 12:59:13 -07:00
Zixuan Jiang
146c4f56b7 Clear frontend attributes for get-tuple-elements of GlobalToLocal and LocalToGlobal custom-calls.
The GlobalToLocal and LocalToGlobal custom-calls are for Shardy round trip. These get-tuple-elements will be removed when we import the Shardy dialect and thus they do not need to hold frontend attributes.

This can reduce the size of the generated HLO module text.

PiperOrigin-RevId: 826134489
2025-10-30 12:43:45 -07:00
William S. Moses
a94890b1f9 Improve CUDNN error messages
PiperOrigin-RevId: 826124080
2025-10-30 12:34:16 -07:00
Oleg Shyshkov
9eeebc9be5 [XLA:GPU] Use a single intra-host ragged-all-to-all in the decomposition.
Instead of 2 ra2a + concat, we can double the output buffer and adjust output offsets. This way we can save on latency by having only one multi-GPU synchronization.

PiperOrigin-RevId: 826122665
2025-10-30 12:24:20 -07:00
Eugene Zhulenev
9b51864c7b [xla:ffi] Add example of async custom call in XLA:GPU
PiperOrigin-RevId: 826121283
2025-10-30 12:11:20 -07:00
Niklas Vangerow
061041963e Migrate map_test to use PjRt.
PiperOrigin-RevId: 826107887
2025-10-30 11:52:36 -07:00
A. Unique TensorFlower
dd3a14ace4 [Autotuner] Add sharding support using KeyValueStore Interface.
- The logic is ported from gemm_fusion_autotuner. I have changed the key of the key-value store to be just module-fingerprint; earlier it was module-fingerprint + autotunable-fusion-set-from-the-module-fingerprint. The module fingerprint should already represent the fusion sets contained in it.
- We can improve or just remove this functionality when we design storage for offline autotuning.

PiperOrigin-RevId: 826103885
2025-10-30 11:43:43 -07:00
Karlo Basioli
4ffcba9004 [XLA][codegen] Emit stablehlo reduce op from the fusion emitter and lower it to triton for the triton backend.
PiperOrigin-RevId: 826102479
2025-10-30 11:30:55 -07:00
Niklas Vangerow
0c87bef802 Migrate reshape_test to use PjRt.
PiperOrigin-RevId: 826087067
2025-10-30 11:21:47 -07:00
Will Froom
6dd75c4e8b [XTile] Modify Stable HLO check on iota to restrict it to the 1D case.
PiperOrigin-RevId: 826085272
2025-10-30 11:01:37 -07:00