This now always passes reference_held=true, which is fine: the only case where it was ever passed as false was when we were already on the compute stream, and the flag is effectively ignored when the stream is the compute stream (see MaybeWaitForEventOnStream).
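A minimal, hypothetical sketch of the control flow this reasoning relies on (the names and signature below are illustrative, not the real XLA API): when the target stream is the compute stream, the reference_held flag has no effect, so hardcoding it to true is safe.

```python
# Hypothetical illustration only; names and signature are not the real XLA API.
def maybe_wait_for_event_on_stream(event, stream, compute_stream, reference_held):
    if stream == compute_stream:
        # Work on the compute stream is already ordered after the event, so
        # reference_held is never consulted on this path.
        return "no-op (flag ignored)"
    # Off the compute stream the wait is real and the flag matters.
    return f"wait on {stream}, reference_held={reference_held}"

# Hardcoding reference_held=True is therefore harmless when we are already on
# the compute stream:
print(maybe_wait_for_event_on_stream("e0", "compute", "compute", reference_held=True))
print(maybe_wait_for_event_on_stream("e0", "transfer", "compute", reference_held=True))
```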
PiperOrigin-RevId: 822758577
This CL modifies the collective pipeliner to generate unique body and condition computations for newly generated while loop instructions.
PiperOrigin-RevId: 822719229
- Prioritize replacing `broadcast_in_dim` with `reshape` over merging nested `broadcast_in_dim` ops. The new behavior matches the corresponding MHLO optimization, which proved preferable.
- Fix an issue where `pad` ops that didn't change the dimensions would be removed even if they shifted elements around within the tensor (e.g. padding by -1 on one side and +1 on the opposite side); see the sketch below.
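A small JAX sketch of the pad case from the second bullet (values are arbitrary): padding by -1 on the low edge and +1 on the high edge leaves the shape unchanged but shifts every element, so such an op must not be dropped.

```python
import jax.numpy as jnp
from jax import lax

x = jnp.array([1., 2., 3., 4.])
# Pad config (low, high, interior) = (-1, 1, 0): the shape stays (4,), but the
# elements are shifted left and a padding value enters on the right.
y = lax.pad(x, jnp.float32(0), [(-1, 1, 0)])
print(x)  # [1. 2. 3. 4.]
print(y)  # [2. 3. 4. 0.]
```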
PiperOrigin-RevId: 822701252
Imported from GitHub PR https://github.com/openxla/xla/pull/33008
📝 Summary of Changes
Add a CI-specific bazelrc that imports both `rocm.bazelrc` from `/usertools` and `rocm_xla.bazelrc`
🎯 Justification
Temporary workaround until the split logic in CI (which relies on `/usertools/rocm.bazelrc`) is removed
Copybara import of the project:
--
bb4cbf0c4fbf2c171110040c5c1470bddced203b by Milica Makevic <Milica.Makevic@amd.com>:
Add CI specific bazelrc
Merging this change closes #33008
PiperOrigin-RevId: 822700005
Instead of performing four separate AllToAll operations, the metadata tensors are reshaped, concatenated, and then a single AllToAll is executed. The result is then sliced back into the individual metadata tensors. This reduces the latency overhead of launching separate collective operations.
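A hypothetical NumPy sketch of the pack/slice pattern described above (the helper names and shapes are made up): the metadata tensors are flattened and concatenated so a single collective can run on one buffer, and the result is sliced back into the individual tensors.

```python
import numpy as np

# Hypothetical sketch of the packing pattern; not the actual XLA implementation.
def pack(tensors):
    # Flatten each tensor and record the split sizes for later unpacking.
    flat = [t.reshape(-1) for t in tensors]
    sizes = [f.size for f in flat]
    return np.concatenate(flat), sizes

def unpack(buffer, sizes, shapes):
    # Slice the combined result back into the individual metadata tensors.
    offsets = np.cumsum([0] + sizes)
    return [buffer[offsets[i]:offsets[i + 1]].reshape(shape)
            for i, shape in enumerate(shapes)]

metadata = [np.full((2, 3), 1.0), np.full((4,), 2.0),
            np.full((2, 2), 3.0), np.full((5,), 4.0)]
packed, sizes = pack(metadata)
# ... one AllToAll would run on `packed` here instead of four separate ones ...
restored = unpack(packed, sizes, [t.shape for t in metadata])
assert all(np.array_equal(a, b) for a, b in zip(metadata, restored))
```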
PiperOrigin-RevId: 822674605
Introduce `addMissingShardingToControlFlow` option in `StablehloExportPipelineOptions` to control whether `ExportStablehloShardingsPass` adds missing shardings to control flow ops. Disable this option in `mlir_to_hlo.cc` when converting MLIR to HLO.
PiperOrigin-RevId: 822542288
Imported from GitHub PR https://github.com/openxla/xla/pull/32231
📝 Summary of Changes
The changes enable native support for forward convolutions with window dilation in XLA's GPU backend. Previously, all dilated convolutions were treated as non-canonical and required explicit padding materialization. Now, forward convolutions with window dilation (but not base dilation) are preserved and handled natively by cuDNN, avoiding unnecessary padding overhead.
🎯 Justification
Performance Problem: JAX shows 15-23x slower performance than PyTorch for dilated convolutions (33.5ms vs 1.4ms at dilation rate 2). This is because XLA materializes dilated convolutions as padded convolutions instead of using cuDNN's native support.
Solution: Allow forward convolutions with window dilation to bypass padding materialization and use cuDNN's native dilated convolution kernels directly.
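A minimal JAX sketch of the distinction involved (shapes are arbitrary): `rhs_dilation` is the window (kernel) dilation that this change lets cuDNN handle natively, while `lhs_dilation` is the base (input) dilation that still takes the canonicalization/padding path.

```python
import jax
import jax.numpy as jnp

x = jnp.ones((1, 8, 32, 32), dtype=jnp.float32)   # NCHW input
w = jnp.ones((16, 8, 3, 3), dtype=jnp.float32)    # OIHW kernel
y = jax.lax.conv_general_dilated(
    x, w,
    window_strides=(1, 1),
    padding="SAME",
    lhs_dilation=(1, 1),   # base (input) dilation: unchanged by this change
    rhs_dilation=(2, 2),   # window (kernel) dilation: per the description above,
                           # now handled natively instead of via padding
)
print(y.shape)  # (1, 16, 32, 32)
```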
🚀 Kind of Contribution
Performance Improvement
📊 Benchmark (for Performance Improvements)
dilation 1:    prev 1.08 ms → now 1.07 ms
dilation 2:    prev 25.79 ms → now 0.91 ms
dilation 1024: prev 26.24 ms → now 2.34 ms
Copybara import of the project:
--
b5a38df2ed4715b43fc8ca8d652005a35290d47e by Chenhao Jiang <chenhaoj@nvidia.com>:
Support forward conv with dilation and add basic heuristic for differentiating forward/backward
Merging this change closes #32231
PiperOrigin-RevId: 822482265
Imported from GitHub PR https://github.com/openxla/xla/pull/32838
📝 Summary of Changes
The fallback logic now correctly identifies the highest known compatible architecture when given an unknown architecture as input.
🎯 Justification
Previously the logic would propose an incompatible architecture in this case.
🚀 Kind of Contribution
🐛 Bug Fix
🧪 Unit Tests:
Added a new test case showing the previously-failing case (it used to propose `sm_110`)
Copybara import of the project:
--
f060bb9837d72159343ff2d52f5f2f42b1b7e9a4 by Olli Lupton <olupton@nvidia.com>:
Fix family-conditional logic
--
fc44dcd1e76da67c0b6fe53c33d2a571c3a6ff50 by Olli Lupton <olupton@nvidia.com>:
Accept CR suggestion
Merging this change closes #32838
PiperOrigin-RevId: 822284790
Imported from GitHub PR https://github.com/openxla/xla/pull/32960
📝 Summary of Changes
Partially upstreams changes from https://github.com/ROCm/xla/pull/323, 9d358b9b26, and https://github.com/ROCm/xla/pull/385. Some asan/tsan changes are skipped for now.
🎯 Justification
These changes are ROCm-specific and help with ROCm's internal CI validation pipelines.
🚀 Kind of Contribution
🐛 Bug Fix, ♻️ Cleanup, 🧪 Tests
📊 Benchmark (for Performance Improvements)
/
🧪 Unit Tests:
/
🧪 Execution Tests:
/
Copybara import of the project:
--
804ff1b6a6fbba86a3e0a09d739179a4eb4f197d by Milica Makevic <Milica.Makevic@amd.com>:
Add missing cuda-only tag to cuda test
--
44ce7a2d56c9f0c80405447f431ae1e5a33f42e1 by Milica Makevic <Milica.Makevic@amd.com>:
Refactor test scripts
--
fb783c968e9d2ff5d92357908d99e4952235c2bc by Milica Makevic <Milica.Makevic@amd.com>:
Cover more mgpu tests
--
1f53712274f76202241bd3631dbf065826c0b960 by Milica Makevic <Milica.Makevic@amd.com>:
Switch from rocm_gcc to rocm_ci for sgpu tests
--
00e0c8ee2a763680f5a3665dab62202ab230731d by Milica Makevic <Milica.Makevic@amd.com>:
Changing file permissions
--
003c062a8900c12b73c0972e8d406f2661a27aba by Milica Makevic <Milica.Makevic@amd.com>:
Remove unnecessary import
--
214599355f40f1b65e0540daf0b9829d2c950115 by Harsha HS <Harsha.HavanurShamsundara@amd.com>:
Add license header
Merging this change closes #32960
PiperOrigin-RevId: 822245565
This change enables the Dequantize and PerChannelDequantize operations to handle 2-bit integer inputs (`kTfLiteInt2`). It includes logic to unpack the packed 2-bit integers into int8_t before performing the dequantization, and adds new test cases for both per-tensor and per-channel dequantization with `kTfLiteInt2`.
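A hypothetical NumPy sketch of the unpack-then-dequantize flow described above (the LSB-first bit order and helper names are assumptions, not the actual TFLite layout): four signed 2-bit values per byte are widened to int8 before the usual scale/zero-point dequantization is applied.

```python
import numpy as np

# Hypothetical sketch; the LSB-first packing order and helper names are
# assumptions, not the actual kTfLiteInt2 layout.
def unpack_int2(packed: np.ndarray, num_values: int) -> np.ndarray:
    out = np.empty(num_values, dtype=np.int8)
    for i in range(num_values):
        bits = int((packed[i // 4] >> (2 * (i % 4))) & 0b11)  # one 2-bit field
        out[i] = bits - 4 if bits >= 2 else bits               # sign-extend to [-2, 1]
    return out

def dequantize(values: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Standard affine dequantization: scale * (value - zero_point).
    return scale * (values.astype(np.float32) - zero_point)

packed = np.array([0b11100100], dtype=np.uint8)  # raw 2-bit fields: 0, 1, 2, 3
print(dequantize(unpack_int2(packed, 4), scale=0.5, zero_point=0))
# -> [ 0.   0.5 -1.  -0.5]
```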
PiperOrigin-RevId: 822207279