Commit Graph

186058 Commits

Author SHA1 Message Date
Zixuan Jiang
1a142dab0a Refactor shardy_xla_pass.
Remove unused code.

PiperOrigin-RevId: 820872613
2025-10-17 16:49:40 -07:00
A. Unique TensorFlower
f2ed04aff6 Reverts 0fab8daf15
PiperOrigin-RevId: 820869543
2025-10-17 16:39:59 -07:00
A. Unique TensorFlower
206f1c1891 Update XNNPACK in XLA
PiperOrigin-RevId: 820860720
2025-10-17 16:14:15 -07:00
Haibo Huang
a619e2de08 Expose new methods to PjRtTopologyDescription.
PiperOrigin-RevId: 820837477
2025-10-17 15:04:17 -07:00
A. Unique TensorFlower
119e1f6731 https://github.com/llvm/llvm-project/pull/162120 removed some automatic namespace determinations, so we need to explicitly specify some namespaces now. This is needed
for the LLVM integrate.

PiperOrigin-RevId: 820836649
2025-10-17 14:52:43 -07:00
David Majnemer
bdb78510d0 [TSL] Clean up integral types
Let's migrate to u?int\d+_t types instead of our own bespoke stuff.
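
As a hedged illustration of the migration pattern (the bespoke aliases below are hypothetical stand-ins, not the actual TSL names):

```
#include <cstdint>

// Before: bespoke project-local aliases (hypothetical names).
// typedef unsigned int uint32;
// typedef long long int64;

// After: standard fixed-width types matching the u?int\d+_t pattern.
std::uint32_t checksum = 0;  // was: uint32
std::int64_t offset = -1;    // was: int64
```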

PiperOrigin-RevId: 820815523
2025-10-17 14:19:08 -07:00
Eugene Zhulenev
d531cdce30 [xla:ffi] Add TypeRegistry::TypeInfo to be able to register functions to manipulate user-defined types
PiperOrigin-RevId: 820811829
2025-10-17 13:41:40 -07:00
Kevin Gleason
46522b8a20 [StableHLO] Add transpose simplification
PiperOrigin-RevId: 820804015
2025-10-17 13:31:39 -07:00
Niklas Vangerow
13006913d2 Migrate sample_file_test to HloRunnerPjRt.
PiperOrigin-RevId: 820803579
2025-10-17 13:21:59 -07:00
Hyeontaek Lim
05101b9755 [PjRt-IFRT] Temporary workaround for output layout handling
PjRt-IFRT directly or indirectly fetched optimized HLO to get the output
layout mode and output layouts. This seems to introduce a regression in
some jobs that use the PJRT C API and have a serialized HLO that is too large (> 2 GiB).

As a workaround, PjRt-IFRT gracefully handles output layout mode and
layout discovery errors and falls back to concrete layouts obtained
directly from output `PjRtBuffer`s. This should give the same behavior
before and after the default layout handling change.
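
A minimal sketch of this fallback, assuming hypothetical `Layout`, `Buffer`, and `LayoutsFromOptimizedHlo` stand-ins rather than the real PjRt-IFRT types:

```
#include <utility>
#include <vector>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

struct Layout {};                                        // hypothetical
struct Buffer { Layout layout() const { return {}; } };  // hypothetical

// Stands in for layout discovery via optimized HLO, which can fail when
// the serialized HLO is too large.
absl::StatusOr<std::vector<Layout>> LayoutsFromOptimizedHlo() {
  return absl::ResourceExhaustedError("serialized HLO > 2 GiB");
}

std::vector<Layout> OutputLayoutsWithFallback(
    const std::vector<Buffer*>& outputs) {
  absl::StatusOr<std::vector<Layout>> discovered = LayoutsFromOptimizedHlo();
  if (discovered.ok()) return *std::move(discovered);
  // Graceful fallback: concrete layouts taken directly from the output
  // buffers, matching the behavior before the layout handling change.
  std::vector<Layout> layouts;
  layouts.reserve(outputs.size());
  for (const Buffer* b : outputs) layouts.push_back(b->layout());
  return layouts;
}
```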

Further changes will follow to discover default layout modes and layouts
without going through `PjRtLoadedExecutable::GetHloModules()`.

PiperOrigin-RevId: 820785277
2025-10-17 12:41:35 -07:00
Parker Schuh
b07145966f Add StatusOr to transfer server BulkTransportInterface on the bond id to
forward errors from bond connection failures to the control plane connection.
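
A hedged sketch of this error-forwarding pattern; `BondConnection` and `ConnectBond` are hypothetical stand-ins, not the real BulkTransportInterface:

```
#include "absl/status/status.h"
#include "absl/status/statusor.h"

struct BondConnection { int id; };  // hypothetical handle

// Stands in for establishing a bond connection, which may fail.
absl::StatusOr<BondConnection> ConnectBond(int bond_id) {
  if (bond_id < 0) return absl::UnavailableError("bond connection failed");
  return BondConnection{bond_id};
}

// Returning absl::Status lets a bond failure propagate back to the caller
// (the control plane connection) instead of being dropped.
absl::Status StartBulkTransfer(int bond_id) {
  absl::StatusOr<BondConnection> bond = ConnectBond(bond_id);
  if (!bond.ok()) return bond.status();
  // ... perform the transfer over *bond ...
  return absl::OkStatus();
}
```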

PiperOrigin-RevId: 820783819
2025-10-17 12:28:16 -07:00
Eugene Zhulenev
0fab8daf15 [xla:cpu] Migrate tf2xla to BufferAllocationInfo
Reverts 94fbd7554e

PiperOrigin-RevId: 820770766
2025-10-17 11:54:08 -07:00
Benjamin Chetioui
81798b5240 [XLA] Throw away TilingSpecification in the TransposedDotTiledHloSchedule.
After relaxing the constraints related to the iteration space in a recent
change, this is no longer necessary.

PiperOrigin-RevId: 820766539
2025-10-17 11:33:01 -07:00
Yulia Baturina
5f41aebf5f Increase the Linux arm64 wheel size limit to 270 MB to unblock nightly builds.
PiperOrigin-RevId: 820758886
2025-10-17 11:18:57 -07:00
A. Unique TensorFlower
94fbd7554e Reverts fb52ce8275
PiperOrigin-RevId: 820748684
2025-10-17 10:58:15 -07:00
Penporn Koanantakool
8614a97d98 [xla:cpu:ynn] Add build macros for YNNPACK integration.
We won't build XLA with YNNPACK on Windows yet.

PiperOrigin-RevId: 820744698
2025-10-17 10:40:45 -07:00
TensorFlower Gardener
812e8c2fd8 Merge pull request #102126 from nishair:security/fix-command-injection-grpc-tpu-worker
PiperOrigin-RevId: 820713421
2025-10-17 09:24:12 -07:00
Kostiantyn Liepieshov
f910c98db0 Use R"hlo(...)hlo" for HLO text in sample_text_test.cc.
This improves readability and allows for better syntax highlighting of the embedded HLO strings.
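
For illustration, a minimal example of the style; the HLO module below is made up, not taken from sample_text_test.cc:

```
// Raw string literal with a custom "hlo" delimiter: no escaping needed,
// and editors can syntax-highlight the embedded HLO.
constexpr char kHloText[] = R"hlo(
HloModule add

ENTRY main {
  x = f32[2] parameter(0)
  y = f32[2] parameter(1)
  ROOT sum = f32[2] add(x, y)
}
)hlo";
```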

PiperOrigin-RevId: 820710394
2025-10-17 09:12:53 -07:00
Eugene Zhulenev
fb52ce8275 [xla:cpu] Migrate tf2xla to BufferAllocationInfo
PiperOrigin-RevId: 820707093
2025-10-17 08:59:31 -07:00
Eugene Zhulenev
4752801386 [xla:ffi] Make TypeInfo mandatory in XLA_FFI_REGISTER_TYPE
Add placeholders for future Type serialization/deserialization. This is not an ABI-breaking change, as the placeholders are unused today, and it avoids an ABI-breaking change in the future when the FFI adds proper ser/des support for user-defined types.
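
A hedged sketch of the idea, with illustrative names rather than the actual XLA FFI declarations:

```
#include <cstdint>
#include <string>

// Per-type function table. The ser/des slots are placeholders: reserving
// them now means filling them in later does not change the struct layout,
// so the ABI stays stable.
struct TypeInfo {
  void (*destroy)(void* object);
  std::int64_t (*serialize)(const void* object, char* buffer);  // unused today
  void* (*deserialize)(const char* buffer, std::int64_t size);  // unused today
};

struct MyState { std::string payload; };  // a user-defined type

void DestroyMyState(void* p) { delete static_cast<MyState*>(p); }

// Registration with mandatory TypeInfo; ser/des left null for now.
const TypeInfo kMyStateTypeInfo = {&DestroyMyState, /*serialize=*/nullptr,
                                   /*deserialize=*/nullptr};
```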

PiperOrigin-RevId: 820676169
2025-10-17 07:20:25 -07:00
Aliia Khasanova
30d25d6d18 Add proto [de]serialization for HostExecuteStartThunk
PiperOrigin-RevId: 820645056
2025-10-17 05:32:26 -07:00
Karlo Basioli
0bb1532ddf [XLA] Enable multihost runner to load unoptimized hlo snapshots dumped without custom serialization.
PiperOrigin-RevId: 820643951
2025-10-17 05:26:10 -07:00
A. Unique TensorFlower
51fc1ac0d5 Improve logging and error messages from autotuner.
- The VLOG messages are updated to more accurately describe whether the autotuner is finding a config in cache, using a default, or actively tuning for the best config.
- The error contains the HLO instruction.

PiperOrigin-RevId: 820640768
2025-10-17 05:16:19 -07:00
Eugene Zhulenev
52749919c9 [xla:cpu] Add buffer_allocation_info to xla_cpu_runtime_hdrs
PiperOrigin-RevId: 820639686
2025-10-17 05:03:10 -07:00
Mohammed Anany
097f587e4e [XLA:GPU/WS] Adding test coverage for auto warp specialization via Triton.
PiperOrigin-RevId: 820637611
2025-10-17 04:49:39 -07:00
Nikita Putikhin
cc58fb18fd [XLA:GPU] Enable dots with block_n=8 in triton and autotuner
This change utilizes recently added Triton support for smaller block sizes.

Skipping occupancy optimization for some configs is essentially a workaround for incompatible split_k values. The impact of these configs is limited, however, because they are only present in non-exhaustive mode, so they mostly get filtered out anyway.

PiperOrigin-RevId: 820617352
2025-10-17 03:32:51 -07:00
Will Froom
abc19d2d20 [XLA:CPU] Combine optimization & lowering pass managers by using callback pass.
PiperOrigin-RevId: 820610316
2025-10-17 03:07:44 -07:00
Karlo Basioli
5da47fcdd8 [XLA:GPU][codegen] Emit shlo for broadcast_in_dim and lower to equivalent triton op.
PiperOrigin-RevId: 820598440
2025-10-17 02:33:27 -07:00
A. Unique TensorFlower
b5129aa315 Update GraphDef version to 2383.
PiperOrigin-RevId: 820596359
2025-10-17 02:25:54 -07:00
A. Unique TensorFlower
837017cee8 compat: Update forward compatibility horizon to 2025-10-17
PiperOrigin-RevId: 820596293
2025-10-17 02:15:14 -07:00
Zixuan Jiang
0ab4818f74 Use all-gather in the spmd_partitioner_test.
Before this change, we disallowed all-gather so that the partitioner generated the `all-reduce(dynamic-update-slice())` pattern. With this change, we allow all-gather for two reasons.
1. In most cases, all-gather is allowed and preferred.
2. The partitioner result is easier to read and match.
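
As a schematic illustration only (made-up shapes and replica groups, embedded as raw strings; these fragments are not actual partitioner output):

```
// Two equivalent ways to materialize the full array from per-partition
// shards. shard is f32[4,16], offset is the partition's row offset, c0 is
// a zero index, and add is a scalar f32 addition computation.
constexpr char kAllGatherPattern[] = R"hlo(
  gathered = f32[8,16] all-gather(shard), dimensions={0},
      replica_groups={{0,1}}
)hlo";

constexpr char kAllReducePattern[] = R"hlo(
  zeros    = f32[8,16] broadcast(zero), dimensions={}
  placed   = f32[8,16] dynamic-update-slice(zeros, shard, offset, c0)
  gathered = f32[8,16] all-reduce(placed), replica_groups={{0,1}},
      to_apply=add
)hlo";
```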

PiperOrigin-RevId: 820593767
2025-10-17 02:02:58 -07:00
Ilia Sergachev
4cd7465b84 PR #32388: [GPU] Sub-byte collective normalization: support collectives with non-minor-most last dimension.
Imported from GitHub PR https://github.com/openxla/xla/pull/32388

📝 Summary of Changes
Support collectives with non-minor-most last dimension in the sub-byte collective normalization pass.

🎯 Justification
Makes more collectives efficient by not requiring type conversion.

🚀 Kind of Contribution
Performance Improvement.

📊 Benchmark (for Performance Improvements)
```
Before:

## Execution time, file=u4_all_gather_1x8.hlo repeat=1 duration=68384ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=2 duration=67744ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=3 duration=66976ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=4 duration=67040ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=5 duration=66816ns

After:

## Execution time, file=u4_all_gather_1x8.hlo repeat=1 duration=41216ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=2 duration=40960ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=3 duration=40960ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=4 duration=41056ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=5 duration=40960ns
```
Measured on 8xH100 DGX.

🧪 Unit Tests:
yes

🧪 Execution Tests:
yes
Copybara import of the project:

--
a3777523ffffbcc59da285544e3fb5575d098b9c by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Sub-byte collective normalization: support collectives with non-minor-most last dimension.

Merging this change closes #32388

PiperOrigin-RevId: 820585923
2025-10-17 01:38:24 -07:00
Harsha H S
086937e138 PR #32678: [ROCm] Use working sha256 for latest ROCm 7.0 docker image and fix test scripts
Imported from GitHub PR https://github.com/openxla/xla/pull/32678

📝 Summary of Changes
- Fix sha256 of docker image to ensure CI is not broken due to malformed image
- Fix test scripts by passing ROCM_PATH to bazel sandbox via repo_env

🎯 Justification
Continued CI runs

🚀 Kind of Contribution
 🧪 Tests

Copybara import of the project:

--
3ca8114613d8e002c137f28bb6608639d08a724a by Harsha Havanur Shamsundara <harsha.havanurshamsundara@amd.com>:

[ROCm] Use working sha256 for latest ROCm 7.0 docker image

--
09ddfbdf205a6406cdd67e20671f41455fffe0f9 by Harsha HS <Harsha.HavanurShamsundara@amd.com>:

[ROCm] Add ROCM_PATH repo_env to test scripts

Merging this change closes #32678

PiperOrigin-RevId: 820582560
2025-10-17 01:25:06 -07:00
Shanbin Ke
f573329cc6 PR #32718: [XLA:GPU] add conv fusion support in cudnn fusion compiler
Imported from GitHub PR https://github.com/openxla/xla/pull/32718

📝 Summary of Changes
This PR adds conv fusion support in cudnn fusion compiler.

* add conv type in `CuDnnFusionConfig` to represent different types of conv. We are getting rid of the conv custom call target, so this info has to be preserved in the fusion config.
* add `ConvDimensionAdapter` to generate an NCHW **logical layout** for the cudnn frontend while the physical layout could be NHWC (the most preferable layout) or NCHW (for int conv). Only the NHWC layout is used in the unit tests because layout assignment currently doesn't handle conv fusion to transform other layouts to NHWC; this needs to be addressed in a separate PR. See the layout sketch after this list.
* add conv translation rule from XLA conv to cudnn frontend graph API.
* Other parts of the lowering are taken care of automatically by the current cudnn fusion compiler: workspace allocation, graph validation, graph compilation, and graph serialization.
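
A hedged sketch of the logical/physical split in the `ConvDimensionAdapter` bullet above; names and shapes are illustrative only:

```
#include <array>
#include <cstdint>

// Dimensions presented to the cudnn frontend in logical NCHW order, with
// strides describing a physically NHWC buffer.
struct LogicalDims {
  std::array<std::int64_t, 4> dims;     // N, C, H, W (logical order)
  std::array<std::int64_t, 4> strides;  // strides into the physical buffer
};

// For an NHWC buffer of shape [N, H, W, C], element (n, h, w, c) lives at
// ((n * H + h) * W + w) * C + c, which gives the strides below when read
// out in NCHW logical order.
LogicalDims NchwViewOfNhwcBuffer(std::int64_t n, std::int64_t c,
                                 std::int64_t h, std::int64_t w) {
  return LogicalDims{
      /*dims=*/{n, c, h, w},
      /*strides=*/{h * w * c, 1, w * c, c},
  };
}
```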

🎯 Justification
This is the first step toward unifying conv as a cudnn fusion in XLA. The conv custom call will be replaced with conv fusions in the future.

🚀 Kind of Contribution
 New Feature

📊 Benchmark (for Performance Improvements)
No Performance changes are expected.

🧪 Unit Tests:
Added 3 hand written NHWC conv unit tests for conv_fprop/conv_dgrad/conv_wgrad.

🧪 Execution Tests:
Added 3 hand written NHWC conv unit tests for conv_fprop/conv_dgrad/conv_wgrad.
Copybara import of the project:

--
57555cd0e3759aacb7a98135c3261f4cc3f642c2 by Cjkkkk <ske@nvidia.com>:

init

--
d6edecfa42a6371a0908e22daeb8deaf32998ece by Cjkkkk <ske@nvidia.com>:

address comments

--
17df6f8451274f070d7d332a126cfefa1ef7df83 by Cjkkkk <ske@nvidia.com>:

removed one comment

--
1b7c63b1ade7751cf8f68c7fb11cd68491440081 by Cjkkkk <ske@nvidia.com>:

add const

Merging this change closes #32718

PiperOrigin-RevId: 820574737
2025-10-17 00:58:07 -07:00
A. Unique TensorFlower
89045aa0a3 Go: Update generated wrapper functions for TensorFlow ops.
PiperOrigin-RevId: 820558317
2025-10-17 00:12:04 -07:00
TensorFlower Gardener
a3397645a1 Merge pull request #101517 from ILCSFNO:patch-1
PiperOrigin-RevId: 820555347
2025-10-16 23:59:21 -07:00
TensorFlower Gardener
cbdc771ee5 Merge pull request #101522 from ILCSFNO:patch-4
PiperOrigin-RevId: 820553874
2025-10-16 23:51:49 -07:00
A. Unique TensorFlower
f27d9d63d7 Automated Code Change
PiperOrigin-RevId: 820551714
2025-10-16 23:40:53 -07:00
Jacques Pienaar
2096501975 Remove register everything.
This should just be the IR one.

PiperOrigin-RevId: 820548236
2025-10-16 23:22:26 -07:00
A. Unique TensorFlower
1ddcd859d3 Move absl_thread_pool to XLA as YnnThreadpool
PiperOrigin-RevId: 820544939
2025-10-16 23:13:24 -07:00
Christian Sigg
c9d8d37611 [xla:gpu] Relax nested gemm fusion constraints.
This change removes dimension ordering constraints in `AcceptDotOperand`.

PiperOrigin-RevId: 820542964
2025-10-16 23:02:42 -07:00
A. Unique TensorFlower
d46c1b99a9 Automated Code Change
PiperOrigin-RevId: 820542824
2025-10-16 22:51:48 -07:00
Majid Dadashi
46f983d3ff Enable lowering from FQ Composite for 2-bit
This also adds an additional test for this lowering.

PiperOrigin-RevId: 820534395
2025-10-16 22:16:26 -07:00
Gregory Pataky
c0d9a60f83 Internal changes to project structure
PiperOrigin-RevId: 820527062
2025-10-16 21:52:14 -07:00
Penporn Koanantakool
b2f2568bcc [xla:cpu:xnn] Temporarily disable XNNPACK by default.
PiperOrigin-RevId: 820519075
2025-10-16 21:31:15 -07:00
A. Unique TensorFlower
a0e060ad78 Automated Code Change
PiperOrigin-RevId: 820517563
2025-10-16 21:18:59 -07:00
Majid Dadashi
f67cb87691 Add support for int2/int4 in tfl.cast
PiperOrigin-RevId: 820509011
2025-10-16 20:50:00 -07:00
A. Unique TensorFlower
5592d364ec Automated Code Change
PiperOrigin-RevId: 820505039
2025-10-16 20:36:41 -07:00
A. Unique TensorFlower
a8a747470e Update XNNPACK in XLA
PiperOrigin-RevId: 820502825
2025-10-16 20:24:07 -07:00
Eugene Zhulenev
ef3a678718 [xla:cpu] Fix BufferAllocationInfo::InOutParameter constructor
PiperOrigin-RevId: 820456592
2025-10-16 17:49:08 -07:00