Commit Graph

186058 Commits

Author SHA1 Message Date
A. Unique TensorFlower
2edf3555cf Sort op's first operand is now generated without duplicates if the
sort is stable.

PiperOrigin-RevId: 820067242
2025-10-15 22:46:43 -07:00
A. Unique TensorFlower
7b8e21a710 Automated Code Change
PiperOrigin-RevId: 820064592
2025-10-15 22:32:39 -07:00
A. Unique TensorFlower
12afb3d25b Automated Code Change
PiperOrigin-RevId: 820058148
2025-10-15 22:22:48 -07:00
A. Unique TensorFlower
6872f47391 Automated Code Change
PiperOrigin-RevId: 820053940
2025-10-15 22:12:26 -07:00
A. Unique TensorFlower
88d1adfc68 Automated Code Change
PiperOrigin-RevId: 820049303
2025-10-15 21:55:41 -07:00
A. Unique TensorFlower
490206b00c Automated Code Change
PiperOrigin-RevId: 820049118
2025-10-15 21:45:54 -07:00
A. Unique TensorFlower
d33383d214 Introduce tsl::WithCurrentContext for capturing the current context.
PiperOrigin-RevId: 820042807
2025-10-15 21:19:52 -07:00
Subhankar Shah
4df1a3c67f [XLA:MSA] When block prefetching, finalize the original value if a sliced value is prefetched successfully and the original value is not.
We already have a pinned allocation for the original value; it should be finalized to avoid a re-allocation that would create multiple pinned allocations for the same buffer.

PiperOrigin-RevId: 820015337
2025-10-15 19:56:19 -07:00
Hyeontaek Lim
55371dfcb4 [PjRt-IFRT] ifrt::PjRtArray::pjrt_layout() uses nullptr to indicate a default layout
PjRt-IFRT now returns `nullptr` when it knows that the array layout is a default layout. User code has previously been migrated to handle this new behavior gracefully, obtaining a concrete default layout as before.

`ifrt::PjRtArray` creation now requests extra information on whether the underlying `PjRtBuffer` uses a custom layout, since IFRT tracks whether array layouts are default. This information cannot be inferred correctly from `PjRtBuffer` alone because `PjRtBuffer::layout()` only returns a concrete layout. PjRt would mostly work fine today if a default layout were reported as a custom layout, but some strict layout equality checks can fail and require the more precise information to be supplied.

A few test cases in IFRT ArrayImplTest against PjRt CPU and GPU clients
have been disabled because the output array does not track the
non-default-ness of the layout correctly when
`MakeArraysFromHostBufferShards()` is implemented using
`ClientMakeArraysFromHostBufferShards()`.

PiperOrigin-RevId: 819995407
2025-10-15 18:47:15 -07:00
Parker Schuh
0c8f3eab9a Change EnterHostCallback() and LeaveHostCallback() to use a C++ RAII object to ensure that Enter and Leave are always matched.

PiperOrigin-RevId: 819993376
2025-10-15 18:35:59 -07:00
Eugene Zhulenev
61785a4328 [xla:ffi] Add a test for automatic FFI handler signature inference from C++ function
PiperOrigin-RevId: 819988900
2025-10-15 18:14:47 -07:00
A. Unique TensorFlower
8f60f24792 Go: Update generated wrapper functions for TensorFlow ops.
PiperOrigin-RevId: 819976000
2025-10-15 17:34:48 -07:00
Eugene Zhulenev
57a6012c2e [tf2xla] Move allocator testing to allocator_test.cc
PiperOrigin-RevId: 819967180
2025-10-15 17:05:23 -07:00
Yun Peng
0ce64afaf3 Introduce HERMETIC_PYTHON_VERSION_KIND for the Bzlmod build.
Add a placeholder for `HERMETIC_PYTHON_VERSION_KIND` in the generated `py_version.bzl` file. This new variable is currently set to an empty string until we figure out how to deal with it.

PiperOrigin-RevId: 819956767
2025-10-15 16:33:03 -07:00
A. Unique TensorFlower
16f70385e8 Support the Shardy dialect in ConvertSerializedStableHloModuleToBfloat16.
PiperOrigin-RevId: 819953031
2025-10-15 16:23:57 -07:00
Parker Schuh
a6a11a6036 Implement StreamExecutorGpuClient::ScheduleRemoteSend. This allows migrating
CopyToRemoteDevice to CommonPjRtBuffer APIs.

PiperOrigin-RevId: 819949965
2025-10-15 16:12:06 -07:00
Karlo Basioli
dd90f5fa76 [XLA:GPU][codegen] Emit stablehlo for iota and implement lowering of stablehlo.iota to tt.make_range
PiperOrigin-RevId: 819934458
2025-10-15 15:31:07 -07:00
A. Unique TensorFlower
644b4a83b5 Replace stream->BlockHostUntilDone() with BlockHostUntilDoneWithHostCallback().
BlockHostUntilDone calls `cuStreamSynchronize`, which has some performance issues.

PiperOrigin-RevId: 819924678
2025-10-15 15:03:44 -07:00
Majid Dadashi
e2bc3e5167 Update tfl.transpose version inconsistency in register_ref.cc
register.cc already declares support for versions 1-7 of transpose, but this was previously missed for register_ref.cc.

PiperOrigin-RevId: 819915644
2025-10-15 14:47:18 -07:00
Frederik Gossen
2582934b0b [XLA:GPU] Add verbose tracing for BlockHostUntilDone and stream synchronization
PiperOrigin-RevId: 819914599
2025-10-15 14:35:36 -07:00
Mohammed Anany
f147bddb8d Extract launch information from the Triton compilation pipeline and use it instead of XLA's calculation. This is necessary in cases where the pipeline overrides the expected launch configuration.
This was observed when auto warp specialization was enabled. Triton requires more threads per block than expected, and this information is available in the module attributes.

PiperOrigin-RevId: 819893926
2025-10-15 13:57:39 -07:00
A. Unique TensorFlower
c265f586c6 Integrate LLVM at llvm/llvm-project@267fa8dd1e
Updates LLVM usage to match
[267fa8dd1efc](https://github.com/llvm/llvm-project/commit/267fa8dd1efc)

PiperOrigin-RevId: 819892951
2025-10-15 13:51:53 -07:00
Sean Talts
ccd875910a [XLA:CPU] Use asm to set name of intrinsic generated IR functions.
PiperOrigin-RevId: 819885948
2025-10-15 13:42:06 -07:00
Yulia Baturina
2fe50c5020 Fix MacOS nightly wheel builds by adding h5py version limit.
The new h5py releases support only macOS 14 and 15 (TF needs macOS 10 and 11).

PiperOrigin-RevId: 819884128
2025-10-15 13:30:40 -07:00
Sean Talts
3c7395e014 [XLA:CPU] Fix intrinsic library failing when passed an already vectorized call. From Will Froom.
PiperOrigin-RevId: 819867554
2025-10-15 13:18:23 -07:00
Aliia Khasanova
1108cc983b Add proto [de]serialization for CholeskyThunk.
The only non-obvious part of the thunk is `solver_context_creator`, but we can retrieve it during the deserialization from `stream_executor::Platform`, which is available during runtime.

PiperOrigin-RevId: 819863398
2025-10-15 13:00:47 -07:00
Quentin Khan
9e80aec7d8 Make file handling utilities compatible with files larger than 4GiB on 32 bit Windows.
This also switches from `_MSC_VER` to `_WIN32` to detect compilation on Windows.

PiperOrigin-RevId: 819853587
2025-10-15 12:49:39 -07:00
Sean Talts
23736ecfc6 [XLA:CPU] Add test showing exp intrinsic vectorizations.
This test will serve to illustrate an upcoming change in intrinsic_lib's vectorization logic.

PiperOrigin-RevId: 819851790
2025-10-15 12:11:45 -07:00
Karlo Basioli
948d0df409 [XLA:GPU][codegen] Emit tensor dialect for bitcast and implement lowering of bitcast from tensor dialect to triton.
PiperOrigin-RevId: 819833904
2025-10-15 11:57:19 -07:00
Eugene Zhulenev
503198fb6b [xla:cpu] Construct BufferAllocationInfo from BufferAssignment
This is a no-op change, preparing for the migration from cpu_function_runtime::BufferInfo to the new BufferAllocationInfo type.

PiperOrigin-RevId: 819827983
2025-10-15 11:36:46 -07:00
Majid Dadashi
7e4627b63f Add support for kTfLiteInt2 type export/import.
This change introduces a new `kTfLiteInt2` type to the TFLite schema and MLIR converter. It includes:
-   Adding `INT2` to the flatbuffer schema.
-   Mapping `TensorType_INT2` to `kTfLiteInt2` in flatbuffer conversions.
-   Updating `tflite_types.h` to include `kTfLiteInt2`.
-   Modifying `flatbuffer_export.cc` to handle 2-bit integer types from MLIR and pack them densely.
-   Generalizing low-bit utility functions (`PackLowBitValuesDensely`, `UnpackDenseLowBitIntoInt8`) to support both 2-bit and 4-bit values.
-   Updating type conversion utilities to recognize and handle `kTfLiteInt2`.
-   Adjusting `util.cc` to correctly report the size and byte requirements for `kTfLiteInt2` tensors, considering their dense packing.

PiperOrigin-RevId: 819821231
2025-10-15 11:22:15 -07:00
A. Unique TensorFlower
6c32106238 Integrate Triton up to [de2ba394](de2ba3946b)
https://github.com/openxla/triton/tree/triton_integrate_branch-

PiperOrigin-RevId: 819807700
2025-10-15 11:01:37 -07:00
A. Unique TensorFlower
b545b61c0d [XLA:GPU] Provide functions to setup multicast from a single process.
PiperOrigin-RevId: 819790003
2025-10-15 10:48:13 -07:00
Aliia Khasanova
0ab9f48846 Refactor SelectKThunk to accept ThunkInfo instead of HloInstruction pointer.
PiperOrigin-RevId: 819786719
2025-10-15 10:37:40 -07:00
TensorFlower Gardener
3f09c30ad3 Merge pull request #101579 from ILCSFNO:patch-5
PiperOrigin-RevId: 819784988
2025-10-15 10:18:47 -07:00
Marcin Radomski
f0ea4b75e3 [XLA:GPU] ThunkPassPipeline: pass HloModule* to Run()
This allows SDC log dumper to derive unique path for each module execution.

PiperOrigin-RevId: 819781581
2025-10-15 10:08:55 -07:00
Peter Hawkins
009d8fdbf4 Reverts 7dbc996979
PiperOrigin-RevId: 819777372
2025-10-15 09:48:29 -07:00
Alexander Shaposhnikov
4626ec956f Bump XNNPACK version for open source builds.
PiperOrigin-RevId: 819774605
2025-10-15 09:33:48 -07:00
A. Unique TensorFlower
0290b24ad8 Internal visibility change.
PiperOrigin-RevId: 819771473
2025-10-15 09:26:34 -07:00
Mohammed Anany
6969cce01e [XLA:GPU/WS] Adding xla_gpu_experimental_enable_triton_warp_specialization flag. This is currently only used to decorate the contracting dimension loop for dot fusions going through Triton with tt.warp_specialize, enabling the feature in Triton.
PiperOrigin-RevId: 819765526
2025-10-15 09:18:52 -07:00
Joshua Lang
1b2ecc8924 Disable broken se_gpu_pjrt_client_test_2gpu_b200 test
PiperOrigin-RevId: 819764723
2025-10-15 09:06:41 -07:00
Marcin Radomski
1aa192d839 [XLA:GPU] Avoid use-after-free in StreamExecutorGpuClientTest::CopyRawToHostOutOfRange
PiperOrigin-RevId: 819763300
2025-10-15 08:44:41 -07:00
Peter Hawkins
baf408c724 Reverts 5a3a4bcd44
PiperOrigin-RevId: 819762394
2025-10-15 08:23:20 -07:00
Kostiantyn Liepieshov
2b17e0e0c0 Make the SparseActivationsUnstack and SparseActivationsUnstackInterleaved custom calls always return a tuple result
PiperOrigin-RevId: 819743515
2025-10-15 07:30:48 -07:00
A. Unique TensorFlower
9567225474 [XLA:GPU] Enable chlo.asin -> kAsin HloInstruction lowering.
PiperOrigin-RevId: 819720031
2025-10-15 06:49:12 -07:00
Mohammed Anany
aa3cb5c5d8 [NFC] Moving extraction utility out of fusion_emitter to emitter_helpers. Also added a test for coverage, as I realized this function wasn't tested.
More utilities will follow as part of an upcoming change, so this refactor makes sense to land first.

PiperOrigin-RevId: 819716328
2025-10-15 06:35:52 -07:00
Eugene Zhulenev
339325c6d7 [xla:ffi] Add XLA_FFI_TypeInfo in preparation for adding it to TypeRegistry
PiperOrigin-RevId: 819715434
2025-10-15 06:22:37 -07:00
Ilia Sergachev
2408b9968e PR #32003: [GPU][NFC] Merge methods querying fusion kind.
Imported from GitHub PR https://github.com/openxla/xla/pull/32003

Copybara import of the project:

--
2a3ad034522e871edc9c7f580e86fc3980025542 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU][NFC] Merge methods querying fusion kind.

--
ebeb25599d6017d34ea92ece415a255d109af049 by Ilia Sergachev <isergachev@nvidia.com>:

Address review requests.

Merging this change closes #32003

PiperOrigin-RevId: 819692807
2025-10-15 04:57:40 -07:00
Aleksa Arsic
9a25b01c7e PR #32283: [ROCm] Change misleading method name RocmComputeCapability::has_amd_matrix_core()
Imported from GitHub PR https://github.com/openxla/xla/pull/32283

📝 Summary of Changes
Change misleading method name RocmComputeCapability::has_amd_matrix_core() to more suitable name has_amd_mat_acc_instructions() as gfx11xx do not have matrix cores, but support matrix acceleration instruction set known as WMMA.

🎯 Justification
RocmComputeCapability::has_amd_matrix_core() is misleading as gfx11xx do not have matrix cores but still support matrix acceleration instruction set - WMMA.

🚀 Kind of Contribution
♻️ Cleanup

@xla-rotation please review my changes.

Copybara import of the project:

--
23cf1ab79fdcc4ee2ee4996973dee2c103d2762a by Aleksa Arsic <aleksa.arsic@amd.com>:

Change misleading method name RocmComputeCapability::has_amd_matrix_core() to more suitable name has_amd_mat_acc_instructions() as gfx11xx do not have matrix cores, but support matrix acceleration instruction set known as WMMA.

Merging this change closes #32283

PiperOrigin-RevId: 819652238
2025-10-15 02:53:07 -07:00
Thomas Joerg
28c0be7a10 [XLA:GPU] Run GpuKernelTilingTests on default GPU platforms. Previously, this test was limited to Pascal.
PiperOrigin-RevId: 819650786
2025-10-15 02:36:10 -07:00