Commit Graph

182462 Commits

Author SHA1 Message Date
TensorFlower Gardener
3e169ee50e Merge pull request #96902 from kr1shnasomani:patch-2
PiperOrigin-RevId: 784569798
2025-07-18 07:09:03 -07:00
Misha Gutman
f6a5e17ebe Update dependencies to XNNPACK.
PiperOrigin-RevId: 784561458
2025-07-18 06:30:19 -07:00
Alexander Lyashuk
849435a30d [XLA:GPU] Move Dot strength reduction out of algebraic simplifier
and run it only once.

The plan for the follow-up changes is to remove the vec×matrix reduction (which currently regresses some models for unrelated reasons) and keep only the vec×vec case.

PiperOrigin-RevId: 784555311
2025-07-18 06:18:22 -07:00
Alexander Belyaev
bbd241ea32 [XLA:GPU] Remove CHECK-CSE since it is not used.
PiperOrigin-RevId: 784555038
2025-07-18 06:08:37 -07:00
Bart Chrzaszcz
0b9012dc05 #sdy improve the error messaging when importing and exporting sharding custom calls.
PiperOrigin-RevId: 784549550
2025-07-18 05:41:10 -07:00
Zixuan Jiang
835af341cd Introduce new helper function that produces device lists for iota tile assignment. Apply it in xla_sharding_util.cc.
PiperOrigin-RevId: 784536911
2025-07-18 04:46:40 -07:00
A. Unique TensorFlower
64185a403e Introduce stable flags and associated deprecation policy for XLA debug options.
- The policy is not enforced anywhere, but changes that do not follow the deprecation policy for flags marked [Stable] can be rolled back.

PiperOrigin-RevId: 784532479
2025-07-18 04:35:09 -07:00
Adrian Kuegel
32f6d746c5 Use GetInPlaceInputOutputPairs from AliasInfo instead of HloDataflowAnalysis.
PiperOrigin-RevId: 784532374
2025-07-18 04:23:24 -07:00
Henning Becker
3ee1c8d01a Remove ifdef from ir_emitter_unnested and fix various clang-tidy warnings
Just some cleanups which I want to put out before touching the code.

PiperOrigin-RevId: 784518313
2025-07-18 03:32:14 -07:00
Henning Becker
87b9a1e10a Add TmaMetadata serialization support
This adds the missing serialization support for the `TmaMetadata` field in the `KernelThunk`. With this change, we can serialize GPU programs that use NVIDIA's TMA for memory access.

PiperOrigin-RevId: 784514465
2025-07-18 03:19:01 -07:00
A. Unique TensorFlower
fc5c897690 compat: Update forward compatibility horizon to 2025-07-18
PiperOrigin-RevId: 784496756
2025-07-18 02:20:13 -07:00
A. Unique TensorFlower
da8018a052 Update GraphDef version to 2292.
PiperOrigin-RevId: 784496623
2025-07-18 02:10:05 -07:00
A. Unique TensorFlower
f7a61c082f Automated Code Change
PiperOrigin-RevId: 784489087
2025-07-18 01:42:06 -07:00
Adrian Kuegel
4a3f1c9c0b Move GetInPlaceInputOutputPairs and related code to AliasInfo class (NFC).
FollowTupleIndirection() is also used by InstructionFusion, so we could decide
to move it to a separate header. However, InstructionFusion also calls
GetInPlaceInputOutputPairs, so it will have to use AliasInfo in the future
anyway.

PiperOrigin-RevId: 784486223
2025-07-18 01:30:10 -07:00
A. Unique TensorFlower
07c10a4c2a Automated Code Change
PiperOrigin-RevId: 784462084
2025-07-17 23:53:04 -07:00
Ezekiel Calubaquib
d801474a83 Fix test paths and a visibility issue for tflite/converter
PiperOrigin-RevId: 784434492
2025-07-17 21:48:01 -07:00
Junwhan Ahn
78b2a2a14c Remove leftover logging
PiperOrigin-RevId: 784428497
2025-07-17 21:25:24 -07:00
A. Unique TensorFlower
1296be9c22 Automated Code Change
PiperOrigin-RevId: 784423468
2025-07-17 21:06:47 -07:00
A. Unique TensorFlower
db8b0b6cb7 Propagate context to the waiter destruction sequence, so that all contained operations execute with the correct context.
PiperOrigin-RevId: 784417306
2025-07-17 20:43:49 -07:00
Parker Schuh
dec6cae62d Update PjRtCpuExecutable to not rely on any internals of PjRtCpuBuffer.
Donation and usage holds can be constructed directly via CommonPjRtBuffer APIs.

PiperOrigin-RevId: 784406864
2025-07-17 19:59:55 -07:00
Zixuan Jiang
a32930a77e Handle V2 xla::OpSharding in ExtractInputsForLogicalDevices and ParseAndValidateOutputSharding.
PiperOrigin-RevId: 784380192
2025-07-17 18:08:43 -07:00
Ezekiel Calubaquib
45d46cd4b1 Exclude tensorflow/lite/mlir/lite proto definitions when compiling under the LiteRT repo and enable LiteRT disable_tf_lite_py by default
PiperOrigin-RevId: 784374376
2025-07-17 17:42:57 -07:00
Raviteja Gorijala
19f9f4b798 Update version to 2.21.0
PiperOrigin-RevId: 784366491
2025-07-17 17:10:27 -07:00
Ryan M. Lefever
48ab1c76d4 [XLA:TPU] In MSA, when removing instructions, we need to remove their scoped allocations from PresetAssignments.
PiperOrigin-RevId: 784359439
2025-07-17 16:45:35 -07:00
A. Unique TensorFlower
16a4d966ff Modified Python bindings to enable passing a probe_instrumentation_dir to support interpreter ops in eval_module, consistent with StableHLO interpreter usage from the command line.
PiperOrigin-RevId: 784347033
2025-07-17 16:20:28 -07:00
Karlo Basioli
9276cd8ccc [XLA][host offloading] Return AsyncValue from HostOffloadingExecutable.
PiperOrigin-RevId: 784341194
2025-07-17 16:05:27 -07:00
Tom Natan
4e841e6a90 #sdy update dump names and add an index prefix so they are clearer for users
Example of dumps without explicit collectives:
```
00.input_module.mlir
01.before_propagation.mlir
02.after_propagation.mlir
03.after_post_propagation_optimizations.mlir
04.output_module.mlir
```

PiperOrigin-RevId: 784338779
2025-07-17 15:49:24 -07:00
Alex Pivovarov
0f16ef827d [Autotuner] Add block level emitter backend for Triton fusion (3).
This update continues the development of the Triton block-level fusion emitter backend, which enables autotuning of tile configurations for custom Triton fusions in XLA.

This backend implements the following core interfaces:

- GetSupportedConfigs: Enumerates all supported combinations of tile sizes for the output tensors. The generated configs can be used during autotuning to explore different performance candidates. (Implemented in a previous PR-28808)

- GetDefaultConfig: Provides a default tile configuration for a given Triton fusion, used as a fallback when no tuning data is available. (Implemented in a previous PR-28515)

- ApplyConfig: Applies a selected block-level fusion configuration to a Triton fusion instruction by updating its GpuBackendConfig.
PiperOrigin-RevId: 784338001
2025-07-17 15:41:47 -07:00
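
To make the three interfaces named in the entry above concrete, here is a minimal C++ sketch of the shape such a backend could take. The type and class names (`BlockLevelEmitterBackend`, `TileConfig`, `FusionInstruction`) are simplified stand-ins chosen for illustration, not the actual XLA autotuner API.

```
#include <cstdint>
#include <vector>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

// Simplified stand-ins; the real XLA types are richer.
struct TileConfig {
  std::vector<int64_t> output_tile_sizes;
};
class FusionInstruction;  // The Triton fusion being autotuned.

// Sketch of the three interfaces described in the commit message.
class BlockLevelEmitterBackend {
 public:
  virtual ~BlockLevelEmitterBackend() = default;

  // Enumerates candidate tile-size combinations for the fusion's outputs;
  // the autotuner benchmarks these against each other.
  virtual absl::StatusOr<std::vector<TileConfig>> GetSupportedConfigs(
      const FusionInstruction& fusion) = 0;

  // Fallback configuration used when no tuning data is available.
  virtual absl::StatusOr<TileConfig> GetDefaultConfig(
      const FusionInstruction& fusion) = 0;

  // Writes the selected configuration back into the fusion's backend config
  // (GpuBackendConfig in the entry above).
  virtual absl::Status ApplyConfig(FusionInstruction& fusion,
                                   const TileConfig& config) = 0;
};
```
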
Hyeontaek Lim
f92085acf6 [IFRT] Add UserContextScope
`xla::ifrt::UserContextScope` provides tracking of the currently active
`xla::ifrt::UserContext` on the current thread. It gives a mechanism for IFRT
APIs to take a user-provided context and associate it with IFRT runtime objects
(`Array`, `LoadedExecutable`, etc.).

We begin by using this thread-local scoping mechanism, as it allows more
incremental steps. A helper function `xla::ifrt::GetUserContext()` provides a
way to get the current `xla::ifrt::UserContext` in a uniform way across IFRT
implementations.

The long-term plan remains to make the propagation of context objects explicit
by having IFRT APIs take a `user_context` argument. This thread-local scoping
mechanism can then be transferred from the IFRT API to IFRT users or runtimes
if they still prefer using it.

PiperOrigin-RevId: 784333019
2025-07-17 15:30:39 -07:00
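
The thread-local scoping described above is a standard RAII pattern. The sketch below shows that pattern with simplified types; it is not the actual `xla::ifrt` implementation, and the member and method names are assumptions.

```
#include <memory>
#include <utility>

// Simplified stand-in for xla::ifrt::UserContext (assumption).
class UserContext {};

// RAII scope: installs a context as the thread's current one and restores
// the previous context on destruction, so scopes nest correctly.
class UserContextScope {
 public:
  explicit UserContextScope(std::shared_ptr<UserContext> context)
      : previous_(std::exchange(current_, std::move(context))) {}
  ~UserContextScope() { current_ = std::move(previous_); }

  UserContextScope(const UserContextScope&) = delete;
  UserContextScope& operator=(const UserContextScope&) = delete;

  // Analogue of the GetUserContext() helper mentioned above: returns
  // whatever context is active on the calling thread (possibly null).
  static std::shared_ptr<UserContext> Current() { return current_; }

 private:
  static inline thread_local std::shared_ptr<UserContext> current_ = nullptr;
  std::shared_ptr<UserContext> previous_;
};
```

A runtime object created while such a scope is active would capture `UserContextScope::Current()` at construction time, which is how IFRT objects can be associated with the user-provided context without an explicit argument.
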
Parker Schuh
8055d1e7ab Add ReleaseDeviceMemoryOwnership implementation based on
TrackedCpuDeviceBuffer::BlockForOperationsToComplete().

PiperOrigin-RevId: 784330854
2025-07-17 15:20:53 -07:00
David Dunleavy
b60c10bd96 Migrate uses of XLA_TEST_BACKEND macros to use utilities in xla_test_backend_predicates.h
PiperOrigin-RevId: 784327416
2025-07-17 15:06:20 -07:00
A. Unique TensorFlower
3f8adf372c Correctly identify async start and done ops in latency hiding scheduler.
PiperOrigin-RevId: 784300662
2025-07-17 13:59:12 -07:00
Kanish Anand
ed20d76183 Close output shardings to respect the allow_spmd_sharding_propagation_to_output flag when it is set to its default {false} value. Added multiple test variants covering shardy and use_compile_options_from_model.
PiperOrigin-RevId: 784280731
2025-07-17 13:09:16 -07:00
A. Unique TensorFlower
026ccaa614 [NCCL] Upgrade TF NCCL version to 2.26.5
PiperOrigin-RevId: 784275018
2025-07-17 13:00:41 -07:00
Penporn Koanantakool
72eb221690 [xla:cpu] Make DotLibraryRewriter support greedy fusion mode.
Greedy mode puts all supported ops in fusion nodes.

PiperOrigin-RevId: 784274467
2025-07-17 12:49:17 -07:00
A. Unique TensorFlower
716c28ce99 Internal change only
PiperOrigin-RevId: 784271831
2025-07-17 12:42:19 -07:00
Shahriar Rouf
cc1e56de89 Optimize BM_GlobalDecreasingSizeBestFitHeap benchmark by up to 3%.
- `reserve` vector in `SlicedAllocationFinder::Find`.
- `std::move` vector in `ObservedPermutationManager::Insert`.

PiperOrigin-RevId: 784271536
2025-07-17 12:31:49 -07:00
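
The two micro-optimizations listed above are generic C++ patterns; the sketch below (with hypothetical names, not the actual heap-simulator code) shows each in isolation.

```
#include <cstdint>
#include <utility>
#include <vector>

std::vector<int64_t> MakeChunkOffsets(int64_t n) {
  std::vector<int64_t> offsets;
  // Reserving up front avoids repeated reallocation as the vector grows,
  // which is the kind of change applied in SlicedAllocationFinder::Find.
  offsets.reserve(n);
  for (int64_t i = 0; i < n; ++i) {
    offsets.push_back(i * 2);
  }
  return offsets;
}

struct PermutationStore {
  std::vector<std::vector<int64_t>> observed;

  // Taking the argument by value and moving it into the container avoids a
  // deep copy, analogous to the std::move added in
  // ObservedPermutationManager::Insert.
  void Insert(std::vector<int64_t> permutation) {
    observed.push_back(std::move(permutation));
  }
};
```
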
Majid Dadashi
6f051796d3 Relax the folding size threshold to 200 MiB.
PiperOrigin-RevId: 784254600
2025-07-17 11:42:40 -07:00
Parker Schuh
6be89e0c0a Update CommonPjRtBufferImpl to have specialized versions for both cpu->device
and device->cpu transfers, which reduce the number of copies.

PiperOrigin-RevId: 784251783
2025-07-17 11:32:19 -07:00
Bart Chrzaszcz
0373d252ed #sdy define the utils that JAX's jaxlib will use to allow falling back to GSPMD when loading an old checkpoint.
PiperOrigin-RevId: 784238988
2025-07-17 10:56:46 -07:00
Alex Pivovarov
f926778e7f [Autotuner] Add block level emitter backend for Triton fusion (2).
This change continues the work on the Triton block-level fusion emitter backend, which enables autotuning of tile configurations for custom Triton fusions in XLA.

This backend implements the following core interfaces:

- GetSupportedConfigs: Enumerates all supported combinations of tile sizes for the output tensors. The generated configs can be used during autotuning to explore different performance candidates.

- GetDefaultConfig: Provides a default tile configuration for a given Triton fusion, used as a fallback when no tuning data is available. (Implemented in a previous PR-28515)

- ApplyConfig: Applies a selected block-level fusion configuration to a Triton fusion instruction by updating its GpuBackendConfig. (will be added in the next PR)

PiperOrigin-RevId: 784233964
2025-07-17 10:43:46 -07:00
Alex Pivovarov
e9b7af391b Use ASSERT_THAT(..., IsOkAndHolds(true)) for consistency and correctness
This PR updates test assertions in two XLA C++ test files by replacing `EXPECT_THAT(..., IsOkAndHolds(true))` with `ASSERT_THAT(..., IsOkAndHolds(true))`.

Rationale:
- Consistency: Aligns with other XLA tests, which use ASSERT for pass.Run() calls when subsequent checks depend on successful execution.
- Correctness: Ensures test failures are caught immediately, as ASSERT_THAT is fatal and prevents further checks from running on invalid state.
PiperOrigin-RevId: 784228038
2025-07-17 10:28:15 -07:00
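
To illustrate the behavioral difference, here is a minimal, self-contained sketch using GoogleTest and the status matchers; the `RunPass` helper and the matcher header path are assumptions, not the code touched by this change.

```
#include <gmock/gmock.h>
#include <gtest/gtest.h>

#include "absl/status/statusor.h"
#include "tsl/platform/status_matchers.h"  // IsOkAndHolds; path is an assumption.

using ::tsl::testing::IsOkAndHolds;

// Hypothetical stand-in for an HLO pass run that reports whether it changed
// the module.
absl::StatusOr<bool> RunPass() { return true; }

TEST(PassTest, RewritesModule) {
  // ASSERT_THAT is fatal: if RunPass() errors or returns false, the test
  // stops here, so the checks below never run against a module the pass
  // failed to rewrite. EXPECT_THAT would record the failure but keep going.
  ASSERT_THAT(RunPass(), IsOkAndHolds(true));
  // ... checks that assume the rewrite actually happened ...
}
```
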
TensorFlower Gardener
4a9106584c Merge pull request #96866 from zqw86713:pr96430_ut2
PiperOrigin-RevId: 784211258
2025-07-17 09:44:05 -07:00
A. Unique TensorFlower
ba097d1ad6 Reverts 812bb86d50
PiperOrigin-RevId: 784205852
2025-07-17 09:27:05 -07:00
Zixuan Jiang
87a1e57c98 Simplify ShouldSkipForSideEffect function in zero_sized_hlo_elimination.
PiperOrigin-RevId: 784186726
2025-07-17 08:24:49 -07:00
Benjamin Chetioui
0e8a601cd6 [XLA:GPU] Remove unused DotSparsityRewriter.
PiperOrigin-RevId: 784172883
2025-07-17 07:36:38 -07:00
A. Unique TensorFlower
e58a7a1ae4 Automated Code Change
PiperOrigin-RevId: 784141700
2025-07-17 05:44:37 -07:00
Mikhail Goncharov
8dfa95f8b5 [XLA:GPU] Additional logging in the Triton fusion numeric verifier
PiperOrigin-RevId: 784141127
2025-07-17 05:33:26 -07:00
Christian Sigg
9a236b8dcb [xla:gpu][triton] triton-xla-squeeze-dims pass improvements.
- **Reshape Operations:** The pass now handles `tt.reshape` operations that add unit dimensions by converting them into `tt.expand_dims` operations.
- **Pointer Calculation:** A bug in pointer offset calculation within `SqueezeMakeTensorPtr` is fixed, ensuring correct behavior with non-zero offsets.
- **Load/Store Operations:**
    - The pass now correctly disables rewriting `tt.load` and `tt.store` operations that have masks.
    - A new safety check prevents folding a `tt.load` if one of the dimensions being squeezed is also subject to a boundary check.
- **Store Operation:** The `squeeze-store` pattern is now enabled by default, and its controlling option has been removed.

These changes are accompanied by updated and new tests to validate the improved functionality and bug fixes.

With these changes, the pass is a win in all benchmarks I've looked at.
I'm planning to collect more extensive data and enable the pass in a separate change.

PiperOrigin-RevId: 784092242
2025-07-17 02:33:08 -07:00
A. Unique TensorFlower
ee6df269f1 compat: Update forward compatibility horizon to 2025-07-17
PiperOrigin-RevId: 784088781
2025-07-17 02:26:25 -07:00