tensorflow

mirror of https://github.com/zebrajr/tensorflow.git synced 2025-12-06 00:19:58 +01:00

Author	SHA1	Message	Date
TensorFlower Gardener	3e169ee50e	Merge pull request #96902 from kr1shnasomani:patch-2 PiperOrigin-RevId: 784569798	2025-07-18 07:09:03 -07:00
Misha Gutman	f6a5e17ebe	Update dependencies to XNNPACK. PiperOrigin-RevId: 784561458	2025-07-18 06:30:19 -07:00
Alexander Lyashuk	849435a30d	[XLA:GPU] Move Dot strength reduction out of algebraic simplifier and run it only once. The plan for the follow up changes is to remove vec×matrix reduction (currently regresses some models for unrelated reasons), and only keep vec×vec. PiperOrigin-RevId: 784555311	2025-07-18 06:18:22 -07:00
Alexander Belyaev	bbd241ea32	[XLA:GPU] Remove CHECK-CSE since it is not used. PiperOrigin-RevId: 784555038	2025-07-18 06:08:37 -07:00
Bart Chrzaszcz	0b9012dc05	#sdy improve the error messaging when importing and exporting sharding custom calls. PiperOrigin-RevId: 784549550	2025-07-18 05:41:10 -07:00
Zixuan Jiang	835af341cd	Introduce new helper function that produces device lists for iota tile assignment. Apply it in xla_sharding_util.cc. PiperOrigin-RevId: 784536911	2025-07-18 04:46:40 -07:00
A. Unique TensorFlower	64185a403e	Introduce stable flags and associated deprecation policy for XLA debug options. - The policy is not enforced anywhere but changes not following the deprecation policy for flags marked [Stable] can be rolled back. PiperOrigin-RevId: 784532479	2025-07-18 04:35:09 -07:00
Adrian Kuegel	32f6d746c5	Use GetInPlaceInputOutputPairs from AliasInfo instead of HloDataflowAnalysis. PiperOrigin-RevId: 784532374	2025-07-18 04:23:24 -07:00
Henning Becker	3ee1c8d01a	Remove ifdef from ir_emitter_unnested and fix various clang-tidy warnings. Just some cleanups which I want to put out before touching the code. PiperOrigin-RevId: 784518313	2025-07-18 03:32:14 -07:00
Henning Becker	87b9a1e10a	Add TmaMetadata serialization support. This adds the missing serialization support for the `TmaMetadata` field in the `KernelThunk`. With this change we can serialize GPU programs that use NVIDIA's TMA for memory access. PiperOrigin-RevId: 784514465	2025-07-18 03:19:01 -07:00
A. Unique TensorFlower	fc5c897690	compat: Update forward compatibility horizon to 2025-07-18 PiperOrigin-RevId: 784496756	2025-07-18 02:20:13 -07:00
A. Unique TensorFlower	da8018a052	Update GraphDef version to 2292. PiperOrigin-RevId: 784496623	2025-07-18 02:10:05 -07:00
A. Unique TensorFlower	f7a61c082f	Automated Code Change PiperOrigin-RevId: 784489087	2025-07-18 01:42:06 -07:00
Adrian Kuegel	4a3f1c9c0b	Move GetInPlaceInputOutputPairs and related code to AliasInfo class (NFC). FollowTupleIndirection() is also used by InstructionFusion, so we could decide to move it to a separate header. However InstructionFusion also calls GetInPlaceInputOutputPairs, so will have to use AliasInfo in the future, anyway. PiperOrigin-RevId: 784486223	2025-07-18 01:30:10 -07:00
A. Unique TensorFlower	07c10a4c2a	Automated Code Change PiperOrigin-RevId: 784462084	2025-07-17 23:53:04 -07:00
Ezekiel Calubaquib	d801474a83	Fix tests paths and visibility issue for tflite/converter PiperOrigin-RevId: 784434492	2025-07-17 21:48:01 -07:00
Junwhan Ahn	78b2a2a14c	Remove leftover logging PiperOrigin-RevId: 784428497	2025-07-17 21:25:24 -07:00
A. Unique TensorFlower	1296be9c22	Automated Code Change PiperOrigin-RevId: 784423468	2025-07-17 21:06:47 -07:00
A. Unique TensorFlower	db8b0b6cb7	Propagate context to the waiter destruction sequence, so that all contained operations execute with the correct context. PiperOrigin-RevId: 784417306	2025-07-17 20:43:49 -07:00
Parker Schuh	dec6cae62d	Update PjRtCpuExecutable to not rely on any internals of PjRtCpuBuffer. Donation and usage holds can be constructed directly via CommonPjRtBuffer apis. PiperOrigin-RevId: 784406864	2025-07-17 19:59:55 -07:00
Zixuan Jiang	a32930a77e	Handle V2 `xla::OpSharding` in `ExtractInputsForLogicalDevices` and `ParseAndValidateOutputSharding`. PiperOrigin-RevId: 784380192	2025-07-17 18:08:43 -07:00
Ezekiel Calubaquib	45d46cd4b1	Exclude tensorflow/lite/mlir/lite protos definitions when compiling under LiteRT repo and enable LiteRT disbale_tf_lite_py by default PiperOrigin-RevId: 784374376	2025-07-17 17:42:57 -07:00
Raviteja Gorijala	19f9f4b798	Update version to 2.21.0 PiperOrigin-RevId: 784366491	2025-07-17 17:10:27 -07:00
Ryan M. Lefever	48ab1c76d4	[XLA:TPU] In MSA, when removing instructions, we need to remove their scoped allocations from PresetAssignments. PiperOrigin-RevId: 784359439	2025-07-17 16:45:35 -07:00
A. Unique TensorFlower	16a4d966ff	Modified python bindings to enable passing a probe_instrumentation_dir to support interpreter ops in eval_module. Consistent with StableHLO interpreter usage from command line PiperOrigin-RevId: 784347033	2025-07-17 16:20:28 -07:00
Karlo Basioli	9276cd8ccc	[XLA][host offloading] Return AsyncValue from HostOffloadingExecutable. PiperOrigin-RevId: 784341194	2025-07-17 16:05:27 -07:00
Tom Natan	4e841e6a90	#sdy update dump names and add index as prefix so they are clearer for users. Example of dumps without explicit collectives: ``` 00.input_module.mlir 01.before_propagation.mlir 02.after_propagation.mlir 03.after_post_propagation_optimizations.mlir 04.output_module.mlir ``` PiperOrigin-RevId: 784338779	2025-07-17 15:49:24 -07:00
Alex Pivovarov	0f16ef827d	[Autotuner] Add block level emitter backend for Triton fusion (3). This update continues the development of the Triton block-level fusion emitter backend, which enables autotuning of tile configurations for custom Triton fusions in XLA. This backend implements the following core interfaces: - GetSupportedConfigs: Enumerates all supported combinations of tile sizes for the output tensors. The generated configs can be used during autotuning to explore different performance candidates. (Implemented in a previous PR-28808) - GetDefaultConfig: Provides a default tile configuration for a given Triton fusion, used as a fallback when no tuning data is available. (Implemented in a previous PR-28515) - ApplyConfig: Applies a selected block-level fusion configuration to a Triton fusion instruction by updating its GpuBackendConfig. PiperOrigin-RevId: 784338001	2025-07-17 15:41:47 -07:00
Hyeontaek Lim	f92085acf6	[IFRT] Add `UserContextScope`. `xla::ifrt::UserContextScope` tracks the currently active `xla::ifrt::UserContext` on the current thread. It gives IFRT APIs a mechanism to take a user-provided context and associate it with IFRT runtime objects (`Array`, `LoadedExecutable`, etc.). We begin the changes with this thread-local scoping mechanism, as it allows more incremental steps. A helper function `xla::ifrt::GetUserContext()` provides a uniform way to get the current `xla::ifrt::UserContext` across IFRT implementations. The long-term plan remains to make the propagation of context objects explicit by having IFRT APIs take a `user_context` argument. This thread-local scoping mechanism can be transferred from the IFRT API to IFRT users or runtimes in case they still prefer using it. PiperOrigin-RevId: 784333019	2025-07-17 15:30:39 -07:00
Parker Schuh	8055d1e7ab	Add ReleaseDeviceMemoryOwnership implementation based on TrackedCpuDeviceBuffer::BlockForOperationsToComplete(). PiperOrigin-RevId: 784330854	2025-07-17 15:20:53 -07:00
David Dunleavy	b60c10bd96	Migrate uses of `XLA_TEST_BACKEND` macros to use utilities in `xla_test_backend_predicates.h` PiperOrigin-RevId: 784327416	2025-07-17 15:06:20 -07:00
A. Unique TensorFlower	3f8adf372c	Correctly identify async start and done ops in latency hiding scheduler. PiperOrigin-RevId: 784300662	2025-07-17 13:59:12 -07:00
Kanish Anand	ed20d76183	Close output shardings to respect `allow_spmd_sharding_propagation_to_output` flag set to default `{false}` value. Added multiple test variants to test shardy, use_compile_options_from_model. PiperOrigin-RevId: 784280731	2025-07-17 13:09:16 -07:00
A. Unique TensorFlower	026ccaa614	[NCCL] Upgrade TF NCCL version to 2.26.5 PiperOrigin-RevId: 784275018	2025-07-17 13:00:41 -07:00
Penporn Koanantakool	72eb221690	[xla:cpu] Make DotLibraryRewriter support greedy fusion mode. Greedy mode = Put all supported ops in fusion nodes. PiperOrigin-RevId: 784274467	2025-07-17 12:49:17 -07:00
A. Unique TensorFlower	716c28ce99	Internal change only PiperOrigin-RevId: 784271831	2025-07-17 12:42:19 -07:00
Shahriar Rouf	cc1e56de89	Optimize `BM_GlobalDecreasingSizeBestFitHeap` benchmark by up to 3%. - `reserve` vector in `SlicedAllocationFinder::Find`. - `std::move` vector in `ObservedPermutationManager::Insert`. PiperOrigin-RevId: 784271536	2025-07-17 12:31:49 -07:00
Majid Dadashi	6f051796d3	Relax the folding size threshold to 200 MiB. PiperOrigin-RevId: 784254600	2025-07-17 11:42:40 -07:00
Parker Schuh	6be89e0c0a	Update CommonPjRtBufferImpl to have specialized versions for both cpu->device and device->cpu which reduce the number of copies. PiperOrigin-RevId: 784251783	2025-07-17 11:32:19 -07:00
Bart Chrzaszcz	0373d252ed	#sdy define the utils that JAX jaxlib will use to allow for falling back to GSPMD when loading an old checkpoint. PiperOrigin-RevId: 784238988	2025-07-17 10:56:46 -07:00
Alex Pivovarov	f926778e7f	[Autotuner] Add block level emitter backend for Triton fusion (2). This change continues the work on the Triton block-level fusion emitter backend, which enables autotuning of tile configurations for custom Triton fusions in XLA. This backend implements the following core interfaces: - GetSupportedConfigs: Enumerates all supported combinations of tile sizes for the output tensors. The generated configs can be used during autotuning to explore different performance candidates. - GetDefaultConfig: Provides a default tile configuration for a given Triton fusion, used as a fallback when no tuning data is available. (Implemented in a previous PR-28515) - ApplyConfig: Applies a selected block-level fusion configuration to a Triton fusion instruction by updating its GpuBackendConfig. (will be added in the next PR) PiperOrigin-RevId: 784233964	2025-07-17 10:43:46 -07:00
Alex Pivovarov	e9b7af391b	Use ASSERT_THAT(..., IsOkAndHolds(true)) for consistency and correctness. This PR updates test assertions in two XLA C++ test files by replacing `EXPECT_THAT(..., IsOkAndHolds(true))` with `ASSERT_THAT(..., IsOkAndHolds(true))`. Rationale: - Consistency: Aligns with other XLA tests, which use ASSERT for pass.Run() calls when subsequent checks depend on successful execution. - Correctness: Ensures test failures are caught immediately, as ASSERT_THAT is fatal and prevents further checks from running on invalid state. PiperOrigin-RevId: 784228038	2025-07-17 10:28:15 -07:00
TensorFlower Gardener	4a9106584c	Merge pull request #96866 from zqw86713:pr96430_ut2 PiperOrigin-RevId: 784211258	2025-07-17 09:44:05 -07:00
A. Unique TensorFlower	ba097d1ad6	Reverts `812bb86d50` PiperOrigin-RevId: 784205852	2025-07-17 09:27:05 -07:00
Zixuan Jiang	87a1e57c98	Simplify ShouldSkipForSideEffect function in zero_sized_hlo_elimination. PiperOrigin-RevId: 784186726	2025-07-17 08:24:49 -07:00
Benjamin Chetioui	0e8a601cd6	[XLA:GPU] Remove unused `DotSparsityRewriter`. PiperOrigin-RevId: 784172883	2025-07-17 07:36:38 -07:00
A. Unique TensorFlower	e58a7a1ae4	Automated Code Change PiperOrigin-RevId: 784141700	2025-07-17 05:44:37 -07:00
Mikhail Goncharov	8dfa95f8b5	[XLA:GPU] additional logging in triton fusion numeric verifier PiperOrigin-RevId: 784141127	2025-07-17 05:33:26 -07:00
Christian Sigg	9a236b8dcb	[xla:gpu][triton] `triton-xla-squeeze-dims` pass improvements. - Reshape Operations: The pass now handles `tt.reshape` operations that add unit dimensions by converting them into `tt.expand_dims` operations. - Pointer Calculation: A bug in pointer offset calculation within `SqueezeMakeTensorPtr` is fixed, ensuring correct behavior with non-zero offsets. - Load/Store Operations: - The pass now correctly disables rewriting `tt.load` and `tt.store` operations that have masks. - A new safety check prevents folding a `tt.load` if one of the dimensions being squeezed is also subject to a boundary check. - Store Operation: The `squeeze-store` pattern is now enabled by default, and its controlling option has been removed. These changes are accompanied by updated and new tests to validate the improved functionality and bug fixes. With these changes, the pass is a win in all benchmarks I've looked at. I'm planning to collect more extensive data and enable the pass in a separate change. PiperOrigin-RevId: 784092242	2025-07-17 02:33:08 -07:00
A. Unique TensorFlower	ee6df269f1	compat: Update forward compatibility horizon to 2025-07-17 PiperOrigin-RevId: 784088781	2025-07-17 02:26:25 -07:00
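
The index-prefixed dump naming described in commit 4e841e6a90 can be illustrated with a minimal sketch. This is not the Shardy implementation: the `dump_names` helper and the `STAGES` list are hypothetical, and the stage names are simply copied from the example in that commit message. It only shows why a zero-padded index prefix makes the files list in pipeline order.

```python
# Hypothetical stage names, copied from the example in commit 4e841e6a90.
STAGES = [
    "input_module",
    "before_propagation",
    "after_propagation",
    "after_post_propagation_optimizations",
    "output_module",
]


def dump_names(stages, ext="mlir"):
    """Return one file name per stage, prefixed with a two-digit index."""
    return [f"{i:02d}.{name}.{ext}" for i, name in enumerate(stages)]


names = dump_names(STAGES)
for name in names:
    print(name)
```

Because the prefix is zero-padded, a plain lexicographic sort of the directory (for up to 100 stages) matches the order in which the dumps were produced.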


182462 Commits