Commit Graph

186058 Commits

Author SHA1 Message Date
A. Unique TensorFlower
2edf3555cf Sort op's first operand is now generated without duplicates if the
sort is stable.

PiperOrigin-RevId: 820067242
2025-10-15 22:46:43 -07:00
A. Unique TensorFlower
7b8e21a710 Automated Code Change
PiperOrigin-RevId: 820064592
2025-10-15 22:32:39 -07:00
A. Unique TensorFlower
12afb3d25b Automated Code Change
PiperOrigin-RevId: 820058148
2025-10-15 22:22:48 -07:00
A. Unique TensorFlower
6872f47391 Automated Code Change
PiperOrigin-RevId: 820053940
2025-10-15 22:12:26 -07:00
A. Unique TensorFlower
88d1adfc68 Automated Code Change
PiperOrigin-RevId: 820049303
2025-10-15 21:55:41 -07:00
A. Unique TensorFlower
490206b00c Automated Code Change
PiperOrigin-RevId: 820049118
2025-10-15 21:45:54 -07:00
A. Unique TensorFlower
d33383d214 Introduce tsl::WithCurrentContext for capturing the current context.
PiperOrigin-RevId: 820042807
2025-10-15 21:19:52 -07:00
Subhankar Shah
4df1a3c67f [XLA:MSA] When block prefetching, finalize the original value if a sliced value is prefetched successfully and the original value is not.
We already have a pinned allocation for the original value; it should be finalized to avoid a re-allocation that would create multiple pinned allocations for the same buffer.

PiperOrigin-RevId: 820015337
2025-10-15 19:56:19 -07:00
Hyeontaek Lim
55371dfcb4 [PjRt-IFRT] ifrt::PjRtArray::pjrt_layout() uses nullptr to indicate a default layout
PjRt-IFRT now returns `nullptr` when it knows that the array layout is a default layout. User code has previously been migrated to handle this new behavior gracefully, obtaining a concrete default layout as before.

`ifrt::PjRtArray` creation now requests extra information on whether the underlying `PjRtBuffer` uses a custom layout, since IFRT tracks whether array layouts are default. This information cannot be inferred correctly from `PjRtBuffer` alone because `PjRtBuffer::layout()` only returns a concrete layout. PjRt would mostly work fine today if a default layout were reported as a custom layout, but some strict layout equality checks can fail and require the more precise information to be supplied.

A few test cases in IFRT ArrayImplTest against PjRt CPU and GPU clients
have been disabled because the output array does not track the
non-default-ness of the layout correctly when
`MakeArraysFromHostBufferShards()` is implemented using
`ClientMakeArraysFromHostBufferShards()`.

PiperOrigin-RevId: 819995407
2025-10-15 18:47:15 -07:00
Parker Schuh
0c8f3eab9a Change EnterHostCallback() and LeaveHostCallback() to use a C++ RAII object to ensure that Enter and Leave are always matched.

PiperOrigin-RevId: 819993376
2025-10-15 18:35:59 -07:00
Eugene Zhulenev
61785a4328 [xla:ffi] Add a test for automatic FFI handler signature inference from C++ function
PiperOrigin-RevId: 819988900
2025-10-15 18:14:47 -07:00
A. Unique TensorFlower
8f60f24792 Go: Update generated wrapper functions for TensorFlow ops.
PiperOrigin-RevId: 819976000
2025-10-15 17:34:48 -07:00
Eugene Zhulenev
57a6012c2e [tf2xla] Move allocator testing to allocator_test.cc
PiperOrigin-RevId: 819967180
2025-10-15 17:05:23 -07:00
Yun Peng
0ce64afaf3 Introduce HERMETIC_PYTHON_VERSION_KIND for the Bzlmod build.
Add a placeholder for `HERMETIC_PYTHON_VERSION_KIND` in the generated `py_version.bzl` file. This new variable is currently set to an empty string until we figure out how to deal with it.

PiperOrigin-RevId: 819956767
2025-10-15 16:33:03 -07:00
A. Unique TensorFlower
16f70385e8 Support the Shardy dialect in ConvertSerializedStableHloModuleToBfloat16.
PiperOrigin-RevId: 819953031
2025-10-15 16:23:57 -07:00
Parker Schuh
a6a11a6036 Implement StreamExecutorGpuClient::ScheduleRemoteSend. This allows migrating
CopyToRemoteDevice to CommonPjRtBuffer APIs.

PiperOrigin-RevId: 819949965
2025-10-15 16:12:06 -07:00
Karlo Basioli
dd90f5fa76 [XLA:GPU][codegen] Emit stablehlo for iota and implement lowering of stablehlo.iota to tt.make_range
PiperOrigin-RevId: 819934458
2025-10-15 15:31:07 -07:00
A. Unique TensorFlower
644b4a83b5 Replace stream->BlockHostUntilDone() with BlockHostUntilDoneWithHostCallback().
BlockHostUntilDone calls `cuStreamSynchronize`, which has some performance issues.

PiperOrigin-RevId: 819924678
2025-10-15 15:03:44 -07:00
Majid Dadashi
e2bc3e5167 Update tfl.transpose version inconsistency in register_ref.cc
register.cc already declares support for versions 1-7 of transpose, but this was previously missed for register_ref.cc.

PiperOrigin-RevId: 819915644
2025-10-15 14:47:18 -07:00
Frederik Gossen
2582934b0b [XLA:GPU] Add verbose tracing for BlockHostUntilDone and stream synchronization
PiperOrigin-RevId: 819914599
2025-10-15 14:35:36 -07:00
Mohammed Anany
f147bddb8d Extract launch information from the Triton compilation pipeline and use it instead of XLA's calculation. This is necessary in cases where the pipeline overrides the expected launch configuration.
This was observed when auto warp specialization was enabled. Triton requires more threads per block than expected, and this information is available in the module attributes.

PiperOrigin-RevId: 819893926
2025-10-15 13:57:39 -07:00
A. Unique TensorFlower
c265f586c6 Integrate LLVM at llvm/llvm-project@267fa8dd1e
Updates LLVM usage to match
[267fa8dd1efc](https://github.com/llvm/llvm-project/commit/267fa8dd1efc)

PiperOrigin-RevId: 819892951
2025-10-15 13:51:53 -07:00
Sean Talts
ccd875910a [XLA:CPU] Use asm to set name of intrinsic generated IR functions.
PiperOrigin-RevId: 819885948
2025-10-15 13:42:06 -07:00
Yulia Baturina
2fe50c5020 Fix MacOS nightly wheel builds by adding h5py version limit.
The new h5py releases support only macOS 14 and 15 (TF needs macOS 10 and 11).

PiperOrigin-RevId: 819884128
2025-10-15 13:30:40 -07:00
Sean Talts
3c7395e014 [XLA:CPU] Fix intrinsic library failing when passed an already vectorized call. From Will Froom.
PiperOrigin-RevId: 819867554
2025-10-15 13:18:23 -07:00
Aliia Khasanova
1108cc983b Add proto [de]serialization for CholeskyThunk.
The only non-obvious part of the thunk is `solver_context_creator`, but we can retrieve it during the deserialization from `stream_executor::Platform`, which is available during runtime.

PiperOrigin-RevId: 819863398
2025-10-15 13:00:47 -07:00
Quentin Khan
9e80aec7d8 Make file handling utilities compatible with files larger than 4GiB on 32 bit Windows.
This also switches from `_MSC_VER` to `_WIN32` to detect compilation on Windows.

PiperOrigin-RevId: 819853587
2025-10-15 12:49:39 -07:00
Sean Talts
23736ecfc6 [XLA:CPU] Add test showing exp intrinsic vectorizations.
This test will serve to illustrate an upcoming change in intrinsic_lib's vectorization logic.

PiperOrigin-RevId: 819851790
2025-10-15 12:11:45 -07:00
Karlo Basioli
948d0df409 [XLA:GPU][codegen] Emit tensor dialect for bitcast and implement lowering of bitcast from tensor dialect to triton.
PiperOrigin-RevId: 819833904
2025-10-15 11:57:19 -07:00
Eugene Zhulenev
503198fb6b [xla:cpu] Construct BufferAllocationInfo from BufferAssignment
This is a no-op change, preparing for the migration from cpu_function_runtime::BufferInfo to the new BufferAllocationInfo type.

PiperOrigin-RevId: 819827983
2025-10-15 11:36:46 -07:00
Majid Dadashi
7e4627b63f Add support for kTfLiteInt2 type export/import.
This change introduces a new `kTfLiteInt2` type to the TFLite schema and MLIR converter. It includes:
-   Adding `INT2` to the flatbuffer schema.
-   Mapping `TensorType_INT2` to `kTfLiteInt2` in flatbuffer conversions.
-   Updating `tflite_types.h` to include `kTfLiteInt2`.
-   Modifying `flatbuffer_export.cc` to handle 2-bit integer types from MLIR and pack them densely.
-   Generalizing low-bit utility functions (`PackLowBitValuesDensely`, `UnpackDenseLowBitIntoInt8`) to support both 2-bit and 4-bit values.
-   Updating type conversion utilities to recognize and handle `kTfLiteInt2`.
-   Adjusting `util.cc` to correctly report the size and byte requirements for `kTfLiteInt2` tensors, considering their dense packing.

PiperOrigin-RevId: 819821231
2025-10-15 11:22:15 -07:00
A. Unique TensorFlower
6c32106238 Integrate Triton up to [de2ba394](de2ba3946b)
https://github.com/openxla/triton/tree/triton_integrate_branch-

PiperOrigin-RevId: 819807700
2025-10-15 11:01:37 -07:00
A. Unique TensorFlower
b545b61c0d [XLA:GPU] Provide functions to setup multicast from a single process.
PiperOrigin-RevId: 819790003
2025-10-15 10:48:13 -07:00
Aliia Khasanova
0ab9f48846 Refactor SelectKThunk to accept ThunkInfo instead of HloInstruction pointer.
PiperOrigin-RevId: 819786719
2025-10-15 10:37:40 -07:00
TensorFlower Gardener
3f09c30ad3 Merge pull request #101579 from ILCSFNO:patch-5
PiperOrigin-RevId: 819784988
2025-10-15 10:18:47 -07:00
Marcin Radomski
f0ea4b75e3 [XLA:GPU] ThunkPassPipeline: pass HloModule* to Run()
This allows SDC log dumper to derive unique path for each module execution.

PiperOrigin-RevId: 819781581
2025-10-15 10:08:55 -07:00
Peter Hawkins
009d8fdbf4 Reverts 7dbc996979
PiperOrigin-RevId: 819777372
2025-10-15 09:48:29 -07:00
Alexander Shaposhnikov
4626ec956f Bump XNNPACK version for open source builds.
PiperOrigin-RevId: 819774605
2025-10-15 09:33:48 -07:00
A. Unique TensorFlower
0290b24ad8 Internal visibility change.
PiperOrigin-RevId: 819771473
2025-10-15 09:26:34 -07:00
Mohammed Anany
6969cce01e [XLA:GPU/WS] Adding xla_gpu_experimental_enable_triton_warp_specialization flag. This is currently only used to decorate the contracting dimension loop for dot fusions going through Triton with tt.warp_specialize, enabling the feature in Triton.
PiperOrigin-RevId: 819765526
2025-10-15 09:18:52 -07:00
Joshua Lang
1b2ecc8924 Disable broken se_gpu_pjrt_client_test_2gpu_b200 test
PiperOrigin-RevId: 819764723
2025-10-15 09:06:41 -07:00
Marcin Radomski
1aa192d839 [XLA:GPU] Avoid use-after-free in StreamExecutorGpuClientTest::CopyRawToHostOutOfRange
PiperOrigin-RevId: 819763300
2025-10-15 08:44:41 -07:00
Peter Hawkins
baf408c724 Reverts 5a3a4bcd44
PiperOrigin-RevId: 819762394
2025-10-15 08:23:20 -07:00
Kostiantyn Liepieshov
2b17e0e0c0 Make the SparseActivationsUnstack and SparseActivationsUnstackInterleaved custom calls always return a tuple result
PiperOrigin-RevId: 819743515
2025-10-15 07:30:48 -07:00
A. Unique TensorFlower
9567225474 [XLA:GPU] Enable chlo.asin -> kAsin HloInstruction lowering.
PiperOrigin-RevId: 819720031
2025-10-15 06:49:12 -07:00
Mohammed Anany
aa3cb5c5d8 [NFC] Moving extraction utility out of fusion_emitter to emitter_helpers. Also added a test for coverage, as I realized this function wasn't tested.
More utilities will follow as part of an upcoming change, so this refactor makes sense to land first.

PiperOrigin-RevId: 819716328
2025-10-15 06:35:52 -07:00
Eugene Zhulenev
339325c6d7 [xla:ffi] Add XLA_FFI_TypeInfo in preparation for adding it to TypeRegistry
PiperOrigin-RevId: 819715434
2025-10-15 06:22:37 -07:00
Ilia Sergachev
2408b9968e PR #32003: [GPU][NFC] Merge methods querying fusion kind.
Imported from GitHub PR https://github.com/openxla/xla/pull/32003

Copybara import of the project:

--
2a3ad034522e871edc9c7f580e86fc3980025542 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU][NFC] Merge methods querying fusion kind.

--
ebeb25599d6017d34ea92ece415a255d109af049 by Ilia Sergachev <isergachev@nvidia.com>:

Address review requests.

Merging this change closes #32003

PiperOrigin-RevId: 819692807
2025-10-15 04:57:40 -07:00
Aleksa Arsic
9a25b01c7e PR #32283: [ROCm] Change misleading method name RocmComputeCapability::has_amd_matrix_core()
Imported from GitHub PR https://github.com/openxla/xla/pull/32283

📝 Summary of Changes
Change misleading method name RocmComputeCapability::has_amd_matrix_core() to more suitable name has_amd_mat_acc_instructions() as gfx11xx do not have matrix cores, but support matrix acceleration instruction set known as WMMA.

🎯 Justification
RocmComputeCapability::has_amd_matrix_core() is misleading as gfx11xx do not have matrix cores but still support matrix acceleration instruction set - WMMA.

🚀 Kind of Contribution
♻️ Cleanup

@xla-rotation please review my changes.

Copybara import of the project:

--
23cf1ab79fdcc4ee2ee4996973dee2c103d2762a by Aleksa Arsic <aleksa.arsic@amd.com>:

Change misleading method name RocmComputeCapability::has_amd_matrix_core() to more suitable name has_amd_mat_acc_instructions() as gfx11xx do not have matrix cores, but support matrix acceleration instruction set known as WMMA.

Merging this change closes #32283

PiperOrigin-RevId: 819652238
2025-10-15 02:53:07 -07:00
Thomas Joerg
28c0be7a10 [XLA:GPU] Run GpuKernelTilingTests on default GPU platforms. Previously, this test was limited to Pascal.
PiperOrigin-RevId: 819650786
2025-10-15 02:36:10 -07:00