PjRt-IFRT directly or indirectly fetched the optimized HLO to get the output
layout mode and output layouts. This appears to have introduced a regression in
some jobs that use the PJRT C API and have a serialized HLO that is too large
(> 2 GiB).
As a workaround, PjRt-IFRT now gracefully handles output layout mode and
layout discovery errors and falls back to the concrete layouts obtained
directly from the output `PjRtBuffer`s, which should give the same behavior
before and after the default layout handling change.
Further changes will follow to discover default layout modes and layouts
without going through `PjRtLoadedExecutable::GetHloModules()`.
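A minimal sketch of the fallback described above, assuming a hypothetical `GetOutputLayoutsFromHlo()` helper for the HLO-based discovery path (the actual IFRT code is structured differently):
```cpp
#include <memory>
#include <utility>
#include <vector>

#include "absl/status/statusor.h"
#include "absl/types/span.h"
#include "xla/pjrt/pjrt_client.h"
#include "xla/pjrt/pjrt_layout.h"

// Illustrative stand-in for the HLO-based discovery path (not a real API).
absl::StatusOr<std::vector<std::shared_ptr<const xla::PjRtLayout>>>
GetOutputLayoutsFromHlo(xla::PjRtLoadedExecutable& executable);

absl::StatusOr<std::vector<std::shared_ptr<const xla::PjRtLayout>>>
GetOutputLayouts(xla::PjRtLoadedExecutable& executable,
                 absl::Span<const std::unique_ptr<xla::PjRtBuffer>> outputs) {
  // Preferred path: derive layout modes/layouts from the optimized HLO.
  auto from_hlo = GetOutputLayoutsFromHlo(executable);
  if (from_hlo.ok()) {
    return *std::move(from_hlo);
  }
  // Fallback: use the concrete layouts carried by the output buffers, which
  // avoids shipping a >2 GiB serialized HLO through the PJRT C API.
  std::vector<std::shared_ptr<const xla::PjRtLayout>> layouts;
  layouts.reserve(outputs.size());
  for (const std::unique_ptr<xla::PjRtBuffer>& buffer : outputs) {
    layouts.push_back(buffer->layout());
  }
  return layouts;
}
```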
PiperOrigin-RevId: 820785277
Add placeholders for future Type serialization/deserialization. This is not an ABI-breaking change since the placeholders are unused today, and it avoids an ABI-breaking change in the future when FFI adds proper serialization/deserialization support for user-defined types.
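A generic illustration of the placeholder technique, not the actual XLA FFI headers: declaring the entry points now fixes the struct layout, so filling them in later does not break the ABI.
```cpp
#include <cstddef>

extern "C" {

// Hypothetical argument struct for a future type-serialization hook.
typedef struct Example_TypeSerialize_Args {
  size_t struct_size;  // Lets the callee detect callers built against an
                       // older, smaller version of this struct.
  void* type;          // Opaque handle to the user-defined type.
  void* data;          // Output buffer for the serialized bytes.
  size_t data_size;
} Example_TypeSerialize_Args;

// Hypothetical API table. The two placeholder slots are declared (and left
// null) today; wiring them up later only changes the pointer values, not the
// struct layout, so existing binaries remain ABI-compatible.
typedef struct Example_Api {
  size_t struct_size;
  void (*type_serialize)(Example_TypeSerialize_Args* args);    // unused today
  void (*type_deserialize)(Example_TypeSerialize_Args* args);  // unused today
} Example_Api;

}  // extern "C"
```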
PiperOrigin-RevId: 820676169
- The VLOG messages are updated to describe more accurately whether the autotuner found a config in the cache, is using a default, or is actively tuning for the best config (a schematic sketch follows this list).
- The error message now includes the HLO instruction.
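A schematic of the three cases, using placeholder types rather than XLA's actual autotuner code:
```cpp
#include "absl/log/log.h"
#include "absl/strings/string_view.h"

// Where the config came from; placeholder enum for illustration only.
enum class AutotuneSource { kCache, kDefault, kTuned };

void LogAutotuneDecision(AutotuneSource source, absl::string_view instr_name) {
  switch (source) {
    case AutotuneSource::kCache:
      VLOG(1) << "Found autotuned config in cache for " << instr_name;
      break;
    case AutotuneSource::kDefault:
      VLOG(1) << "Using default config for " << instr_name;
      break;
    case AutotuneSource::kTuned:
      VLOG(1) << "Autotuning " << instr_name << " to find the best config";
      break;
  }
}
```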
PiperOrigin-RevId: 820640768
This change utilizes recently added Triton support for smaller block sizes.
Skipping the occupancy optimization for some configs is essentially a workaround for incompatible split_k values. The impact of these configs is limited, however, because they are only present in non-exhaustive mode, so they mostly get filtered out anyway.
PiperOrigin-RevId: 820617352
Before this change, we disallowed all-gather, so the partitioner generated the `all-reduce(dynamic-update-slice())` pattern. With this change, we allow all-gather, for two reasons:
1. In most cases, all-gather is allowed and preferred.
2. The partitioner result is easier to read and match (see the sketch after this list).
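A hedged sketch (include paths approximate) of how the two output shapes could be matched with XLA's pattern matcher, e.g. in a partitioner test; it is not part of the partitioner change itself:
```cpp
#include "xla/hlo/ir/hlo_instruction.h"
#include "xla/hlo/ir/hlo_opcode.h"
#include "xla/service/pattern_matcher.h"

namespace m = xla::match;

// Old pattern: all-reduce(dynamic-update-slice(...)).
bool IsAllReduceOfDynamicUpdateSlice(xla::HloInstruction* instr) {
  return Match(
      instr, m::Op()
                 .WithOpcode(xla::HloOpcode::kAllReduce)
                 .WithOperand(0, m::Op().WithOpcode(
                                     xla::HloOpcode::kDynamicUpdateSlice)));
}

// New pattern: a plain all-gather.
bool IsAllGather(xla::HloInstruction* instr) {
  return Match(instr, m::Op().WithOpcode(xla::HloOpcode::kAllGather));
}
```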
PiperOrigin-RevId: 820593767
Imported from GitHub PR https://github.com/openxla/xla/pull/32388
📝 Summary of Changes
Support collectives with non-minor-most last dimension in the sub-byte collective normalization pass.
🎯 Justification
Makes more collectives efficient by not requiring a type conversion.
🚀 Kind of Contribution
Performance Improvement.
📊 Benchmark (for Performance Improvements)
```
Before:
## Execution time, file=u4_all_gather_1x8.hlo repeat=1 duration=68384ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=2 duration=67744ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=3 duration=66976ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=4 duration=67040ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=5 duration=66816ns
After:
## Execution time, file=u4_all_gather_1x8.hlo repeat=1 duration=41216ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=2 duration=40960ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=3 duration=40960ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=4 duration=41056ns
## Execution time, file=u4_all_gather_1x8.hlo repeat=5 duration=40960ns
```
Measured on 8xH100 DGX.
🧪 Unit Tests:
yes
🧪 Execution Tests:
yes
Copybara import of the project:
--
a3777523ffffbcc59da285544e3fb5575d098b9c by Ilia Sergachev <isergachev@nvidia.com>:
[GPU] Sub-byte collective normalization: support collectives with non-minor-most last dimension.
Merging this change closes #32388
PiperOrigin-RevId: 820585923
Imported from GitHub PR https://github.com/openxla/xla/pull/32678
📝 Summary of Changes
- Fix the sha256 of the Docker image to ensure CI is not broken by a malformed image
- Fix the test scripts by passing ROCM_PATH to the Bazel sandbox via repo_env
🎯 Justification
Continued CI runs
🚀 Kind of Contribution
🧪 Tests
Copybara import of the project:
--
3ca8114613d8e002c137f28bb6608639d08a724a by Harsha Havanur Shamsundara <harsha.havanurshamsundara@amd.com>:
[ROCm] Use working sha256 for latest ROCm 7.0 docker image
--
09ddfbdf205a6406cdd67e20671f41455fffe0f9 by Harsha HS <Harsha.HavanurShamsundara@amd.com>:
[ROCm] Add ROCM_PATH repo_env to test scripts
Merging this change closes #32678
PiperOrigin-RevId: 820582560
Imported from GitHub PR https://github.com/openxla/xla/pull/32718
📝 Summary of Changes
This PR adds conv fusion support to the cuDNN fusion compiler.
* Add a conv type to `CuDnnFusionConfig` to represent the different kinds of conv. We are getting rid of the conv custom-call target, so this information has to be preserved in the fusion config.
* Add `ConvDimensionAdapter` to generate the NCHW **logical layout** for the cuDNN frontend, while the physical layout can be NHWC (the most preferable layout) or NCHW (for int conv). Only the NHWC layout is used in the unit tests because layout assignment doesn't yet transform other layouts to NHWC for conv fusions; this needs to be addressed in a separate PR. (A simplified illustration follows this list.)
* Add a conv translation rule from XLA conv to the cuDNN frontend graph API.
* Other parts of the lowering (workspace allocation, graph validation, graph compilation, graph serialization) are handled automatically by the existing cuDNN fusion compiler.
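A rough illustration of the logical-NCHW idea using a hypothetical helper (the real `ConvDimensionAdapter` in the cuDNN fusion compiler is more involved):
```cpp
#include <cstdint>
#include <vector>

#include "xla/xla_data.pb.h"

// Lists the input dimensions of a convolution in logical N, C, spatial...
// order, independently of the physical layout (e.g. NHWC) picked by layout
// assignment. Hypothetical helper for illustration only.
std::vector<int64_t> LogicalNchwInputDims(
    const xla::ConvolutionDimensionNumbers& dnums) {
  std::vector<int64_t> dims;
  dims.push_back(dnums.input_batch_dimension());    // N
  dims.push_back(dnums.input_feature_dimension());  // C
  for (int64_t spatial : dnums.input_spatial_dimensions()) {
    dims.push_back(spatial);                        // H, W, ...
  }
  return dims;
}
```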
🎯 Justification
This is the first step toward unifying conv as a cuDNN fusion in XLA. The conv custom call will be replaced with conv fusions in the future.
🚀 Kind of Contribution
✨ New Feature
📊 Benchmark (for Performance Improvements)
No performance changes are expected.
🧪 Unit Tests:
Added 3 hand-written NHWC conv unit tests for conv_fprop/conv_dgrad/conv_wgrad.
🧪 Execution Tests:
Added 3 hand-written NHWC conv unit tests for conv_fprop/conv_dgrad/conv_wgrad.
Copybara import of the project:
--
57555cd0e3759aacb7a98135c3261f4cc3f642c2 by Cjkkkk <ske@nvidia.com>:
init
--
d6edecfa42a6371a0908e22daeb8deaf32998ece by Cjkkkk <ske@nvidia.com>:
address comments
--
17df6f8451274f070d7d332a126cfefa1ef7df83 by Cjkkkk <ske@nvidia.com>:
removed one comment
--
1b7c63b1ade7751cf8f68c7fb11cd68491440081 by Cjkkkk <ske@nvidia.com>:
add const
Merging this change closes #32718
PiperOrigin-RevId: 820574737