tensorflow

mirror of https://github.com/zebrajr/tensorflow.git synced 2025-12-06 00:19:58 +01:00

Author	SHA1	Message	Date
Henning Becker	26d0882419	Add proto serialization for GpuExecutable This adds `GpuExecutable::ToProto` and `GpuExecutable::FromProto`, which allow us to [de]serialize an instance of `GpuExecutable` and later reconstruct it. PiperOrigin-RevId: 826470601	2025-10-31 07:07:33 -07:00
A. Unique TensorFlower	f73a954906	Add SymbolicExpr::IsBinaryOp() method This CL introduces a new helper method SymbolicExpr::IsBinaryOp() to quickly determine if a SymbolicExpr is a binary operation (i.e., not a constant or a variable). This is used in indexing_map.cc in several places for AffineMap and it will simplify the refactor. PiperOrigin-RevId: 826468454	2025-10-31 06:54:52 -07:00
Marcin Radomski	718fe5695e	[XLA:GPU] Add flags for filtering debugged thunks Checking all buffers is way too heavy and causes timeouts, so we need the ability to focus on interesting parts of the thunk graph. `--xla_gpu_experimental_thunk_buffer_debug_filter_by_thunk_id_ranges` allows limiting thunk IDs to selected ranges or values. The IDs are assigned in the order of emitting thunks, which should (TM) be stable and allow bisecting to find culprit thunk(s). The IDs are given as a comma-separated list of integers and closed or half-open ranges (e.g. `:2,5,7:8,12:` to match <=2, 5, 7, 8 and >=12). `--xla_gpu_experimental_thunk_buffer_debug_filter_by_profile_annotation_re` allows matching by the thunk's profile annotation. This is a comma-separated list of regexes that will be matched against `ThunkInfo::profile_annotation`. The thunk's profile annotation needs to match at least one of the regexes. Both flags are meant to work with all thunk debug buffer instrumentation (currently: checksums, NaNs). If both flags are defined, the thunk will have to pass both the ID and profile annotation filters to get instrumented. Implementation of the filtering logic is not included in this CL. PiperOrigin-RevId: 826457166	2025-10-31 06:11:25 -07:00
A. Unique TensorFlower	4d78e8088a	Automated Code Change PiperOrigin-RevId: 826451988	2025-10-31 05:59:56 -07:00
A. Unique TensorFlower	fe344908fa	Automated Code Change PiperOrigin-RevId: 826451035	2025-10-31 05:48:15 -07:00
Kanish Anand	e7dcad735e	Add equality operator for `NamedSharding` PiperOrigin-RevId: 826442714	2025-10-31 05:16:43 -07:00
Aliia Khasanova	add489fd8d	Use `std::vector<BufferAllocation>` instead of `std::vector<std::unique_ptr<BufferAllocation>>` in DynamicSliceThunk. `BufferAllocation::Slice` stores a raw pointer to the corresponding `BufferAllocation`. Currently we keep the embedded thunk allocations alive by storing unique_ptrs in the wrapping DynamicSliceThunk. The current design makes it hard to reuse the existing infrastructure, specifically to serialize `DynamicSliceThunk`. To address this, I'm changing fake_allocations to be `std::vector<BufferAllocation>`. The move constructor `std::vector::vector(std::vector&&)` is guaranteed to have constant time complexity and therefore it steals the internal data buffer from the source vector. This implies that the pointers to allocations are kept stable as long as: * we preallocate the vector size * we never copy the vector, but move To make it safer for later usage, we can explicitly prohibit BufferAllocation from being copyable/movable. I'm going to do this in the following CL. PiperOrigin-RevId: 826440060	2025-10-31 05:05:43 -07:00
A. Unique TensorFlower	3326b0221f	Automated Code Change PiperOrigin-RevId: 826433106	2025-10-31 04:44:23 -07:00
A. Unique TensorFlower	e32304ddc5	[Autotuner] Add support for sharded autotuning in the pass. PiperOrigin-RevId: 826417614	2025-10-31 03:50:55 -07:00
Eusebio Durán Montaña	e32f20dd91	Use factory function to create `CubSortThunk` The `CubSortThunk` constructor was calling a function that returns an `absl::StatusOr`, ignoring non-ok statuses and just accessing the value. Presumably in prod the status is always ok, but this change makes the failure case explicit. PiperOrigin-RevId: 826410861	2025-10-31 03:37:19 -07:00
Kanish Anand	adfd891fde	Refactor Mesh ctors PiperOrigin-RevId: 826410314	2025-10-31 03:26:09 -07:00
A. Unique TensorFlower	8734ec41d5	Disable capturing of dot RHS operands This is proving to be unreliable. PiperOrigin-RevId: 826395008	2025-10-31 02:45:34 -07:00
A. Unique TensorFlower	d6d4e02248	[XLA:GPU] Add multimem setup. PiperOrigin-RevId: 826391581	2025-10-31 02:35:40 -07:00
A. Unique TensorFlower	a6e96588e3	compat: Update forward compatibility horizon to 2025-10-31 PiperOrigin-RevId: 826388828	2025-10-31 02:29:46 -07:00
A. Unique TensorFlower	f572aeee90	Update GraphDef version to 2397. PiperOrigin-RevId: 826388642	2025-10-31 02:16:45 -07:00
A. Unique TensorFlower	993369077a	Reverts `bf23bf1b32` PiperOrigin-RevId: 826380939	2025-10-31 01:51:12 -07:00
A. Unique TensorFlower	d25ccb438d	Reverts `cef240807a` PiperOrigin-RevId: 826374657	2025-10-31 01:32:52 -07:00
A. Unique TensorFlower	4cfaa7e25c	Automated Code Change PiperOrigin-RevId: 826372270	2025-10-31 01:19:21 -07:00
A. Unique TensorFlower	5133f83425	Automated Code Change PiperOrigin-RevId: 826363842	2025-10-31 00:50:37 -07:00
A. Unique TensorFlower	ebacf2a211	Automated Code Change PiperOrigin-RevId: 826342599	2025-10-30 23:59:59 -07:00
Bill Varcho	cef240807a	[ReplicaGroupV3][MeshAxesReplicaGroupList][1/2] Add initial class definition for V3 replica group. PiperOrigin-RevId: 826334561	2025-10-30 23:18:40 -07:00
Felix Wang	d9c76aafeb	Adjust the collective-permute cross host type to `MULTI_HOST_NON_WORLD_LEVEL` only. PiperOrigin-RevId: 826327580	2025-10-30 22:54:49 -07:00
Eugene Zhulenev	d90723f48e	[xla:pjrt:cpu] Add e2e test for YnnFusion + PJRT client PiperOrigin-RevId: 826323865	2025-10-30 22:41:49 -07:00
Eugene Zhulenev	7ad55e8818	[xla:cpu] Add an end-to-end test for ynn fusions PiperOrigin-RevId: 826318525	2025-10-30 22:20:44 -07:00
Eugene Zhulenev	bf23bf1b32	[xla:cpu] Pass HloModule pointer to Thunk SerDes PiperOrigin-RevId: 826312546	2025-10-30 22:11:41 -07:00
Eugene Zhulenev	56d3b19280	[xla:cpu] NFC: Rename protos for Xnn/Ynn fusion options PiperOrigin-RevId: 826304955	2025-10-30 22:01:47 -07:00
A. Unique TensorFlower	a95c558dc4	Save compile options with the compiled IFRT IR program to be used later for serialization PiperOrigin-RevId: 826301016	2025-10-30 21:54:24 -07:00
A. Unique TensorFlower	e61bac51b1	Automated Code Change PiperOrigin-RevId: 826298597	2025-10-30 21:47:00 -07:00
A. Unique TensorFlower	b2334ac330	Integrate LLVM at llvm/llvm-project@22079e3f36 Updates LLVM usage to match [22079e3f3698](https://github.com/llvm/llvm-project/commit/22079e3f3698) PiperOrigin-RevId: 826294004	2025-10-30 20:44:41 -07:00
A. Unique TensorFlower	6d86cff5f3	Automated Code Change PiperOrigin-RevId: 826286610	2025-10-30 20:12:04 -07:00
Eugene Zhulenev	db273660ba	[xla:pjrt] Remove PjRtFuture type alias Cleaning up BUILD files and includes will be done separately. PiperOrigin-RevId: 826280389	2025-10-30 19:44:40 -07:00
Eugene Zhulenev	429a0cf1c7	[xla:cpu] Add target machine features to the error message PiperOrigin-RevId: 826253599	2025-10-30 17:49:12 -07:00
Eugene Zhulenev	d9024af6d4	[xla:cpu] Do not register legacy runtime symbols with XLA:CPU custom calls PiperOrigin-RevId: 826208548	2025-10-30 16:25:55 -07:00
Niklas Vangerow	31bb7c01ff	Migrate multioutput_fusion_test to use PjRt. PiperOrigin-RevId: 826203532	2025-10-30 15:18:22 -07:00
Parker Schuh	c3d0bf7023	Add an additional way to poison a connection (to allow testing different poisoning strategies). PiperOrigin-RevId: 826193232	2025-10-30 14:52:58 -07:00
A. Unique TensorFlower	c40bb10b96	Add the option to dump before/after autotuned instructions in AutotunerConfig. - This change is required to still support the functionality of xla_gpu_dump_autotuned_gemm_fusions in the new infra. PiperOrigin-RevId: 826161466	2025-10-30 14:39:24 -07:00
A. Unique TensorFlower	8f60516a86	Refactor: Move common SymbolicMapTest setup to the fixture. This change moves the initialization of commonly used `SymbolicExpr` and a sample `SymbolicMap` into the `SymbolicMapTest` fixture to reduce code duplication across tests. PiperOrigin-RevId: 826161168	2025-10-30 14:19:16 -07:00
A. Unique TensorFlower	7736af79a6	Only enable YNNPACK for bf16 and int8 for now. We plan to enable this in stages, starting with int8 and bf16, where the improvement is more significant. PiperOrigin-RevId: 826160602	2025-10-30 14:05:02 -07:00
Karlo Basioli	f4ebf9d47d	[XLA][codegen] Migrate triton operations for which shared dialect lowerings are implemented. These were missed in previous commits. Addresses transpose and bitcast. PiperOrigin-RevId: 826158776	2025-10-30 13:54:31 -07:00
Niklas Vangerow	1424c4f739	Migrate slice_test to use PjRt. PiperOrigin-RevId: 826158235	2025-10-30 13:45:44 -07:00
Karlo Basioli	5973848600	[XLA][codegen] Emit shlo reshape from the fusion emitter and lower it to triton for the triton backend. PiperOrigin-RevId: 826147865	2025-10-30 13:32:56 -07:00
Quoc Truong	f01a7fea8c	Update ML Build Docker container to use hermetic C++ PiperOrigin-RevId: 826147864	2025-10-30 13:25:44 -07:00
A. Unique TensorFlower	7e7b1a3015	Allow empty dimension list in SymbolicMap::ReplaceDimsAndSymbols I originally assumed the caller always provided a full list of replacements, but IndexingMap has some uses where the dim_replacement list is empty, resulting in a CHECK-fail. So, I'm allowing the user to pass an empty list for either dims or symbols to ReplaceDimsAndSymbols. In that case, the corresponding dims/symbols won't be replaced. PiperOrigin-RevId: 826138814	2025-10-30 13:18:28 -07:00
Niklas Vangerow	175774337e	Migrate params_test to use PjRt. PiperOrigin-RevId: 826137636	2025-10-30 13:07:24 -07:00
Daniel Sosa	5dcb571931	Remove unused code from convert.py PiperOrigin-RevId: 826137491	2025-10-30 12:59:13 -07:00
Zixuan Jiang	146c4f56b7	Clear frontend attributes for get-tuple-elements of GlobalToLocal and LocalToGlobal custom-calls. The GlobalToLocal and LocalToGlobal custom-calls are for Shardy round trip. These get-tuple-elements will be removed when we import the Shardy dialect and thus they do not need to hold frontend attributes. This can reduce the size of the generated HLO module text. PiperOrigin-RevId: 826134489	2025-10-30 12:43:45 -07:00
William S. Moses	a94890b1f9	Improve CUDNN error messages PiperOrigin-RevId: 826124080	2025-10-30 12:34:16 -07:00
Oleg Shyshkov	9eeebc9be5	[XLA:GPU] Use a single intra-host ragged-all-to-all in the decomposition. Instead of 2 ra2a + concat, we can double the output buffer and adjust output offsets. This way we can save on latency by having only one multi-GPU synchronization. PiperOrigin-RevId: 826122665	2025-10-30 12:24:20 -07:00
Eugene Zhulenev	9b51864c7b	[xla:ffi] Add example of async custom call in XLA:GPU PiperOrigin-RevId: 826121283	2025-10-30 12:11:20 -07:00
Niklas Vangerow	061041963e	Migrate map_test to use PjRt. PiperOrigin-RevId: 826107887	2025-10-30 11:52:36 -07:00
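
Commit 718fe5695e describes the ID-range syntax accepted by `--xla_gpu_experimental_thunk_buffer_debug_filter_by_thunk_id_ranges` but states that the filtering logic is not part of that CL. The block below is only a rough sketch of how such a list could be parsed and matched, written against the example in the commit message (`:2,5,7:8,12:`). The names `IdRange`, `ParseId`, `ParseItem`, `ParseIdRanges` and `MatchesAny` are hypothetical and are not taken from the XLA sources; the rejection of empty items and of a bare `:` is likewise an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical type: an inclusive range of thunk IDs.
struct IdRange {
  int64_t lo;  // inclusive lower bound
  int64_t hi;  // inclusive upper bound
  bool Contains(int64_t id) const { return lo <= id && id <= hi; }
};

// Parses a non-empty string of decimal digits; returns nullopt otherwise.
// The length cap keeps the accumulated value within int64_t.
std::optional<int64_t> ParseId(const std::string& s) {
  if (s.empty() || s.size() > 18) return std::nullopt;
  int64_t value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + (c - '0');
  }
  return value;
}

// Parses one list item: "5", "7:8", ":2" or "12:".
std::optional<IdRange> ParseItem(const std::string& item) {
  const std::size_t colon = item.find(':');
  if (colon == std::string::npos) {
    std::optional<int64_t> id = ParseId(item);
    if (!id) return std::nullopt;
    return IdRange{*id, *id};
  }
  const std::string lo_str = item.substr(0, colon);
  const std::string hi_str = item.substr(colon + 1);
  // Assumption: a bare ":" is treated as an error rather than "match all".
  if (lo_str.empty() && hi_str.empty()) return std::nullopt;
  IdRange range{std::numeric_limits<int64_t>::min(),
                std::numeric_limits<int64_t>::max()};
  if (!lo_str.empty()) {
    std::optional<int64_t> lo = ParseId(lo_str);
    if (!lo) return std::nullopt;
    range.lo = *lo;
  }
  if (!hi_str.empty()) {
    std::optional<int64_t> hi = ParseId(hi_str);
    if (!hi) return std::nullopt;
    range.hi = *hi;
  }
  if (range.lo > range.hi) return std::nullopt;
  return range;
}

// Parses a comma-separated list of items; any malformed item fails the parse.
std::optional<std::vector<IdRange>> ParseIdRanges(const std::string& spec) {
  std::vector<IdRange> ranges;
  std::size_t start = 0;
  while (start <= spec.size()) {
    std::size_t comma = spec.find(',', start);
    if (comma == std::string::npos) comma = spec.size();
    std::optional<IdRange> range = ParseItem(spec.substr(start, comma - start));
    if (!range) return std::nullopt;
    ranges.push_back(*range);
    start = comma + 1;
  }
  return ranges;
}

// Returns true if `id` falls into at least one of the parsed ranges.
bool MatchesAny(const std::vector<IdRange>& ranges, int64_t id) {
  for (const IdRange& range : ranges) {
    if (range.Contains(id)) return true;
  }
  return false;
}
```

With the example from the commit message, `ParseIdRanges(":2,5,7:8,12:")` yields four ranges, and `MatchesAny` accepts IDs such as 2, 5, 8 and 12 while rejecting 3, 6 and 11.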

186482 Commits
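
Commit add489fd8d relies on the fact that moving a `std::vector` transfers its buffer, so raw pointers to its elements stay valid as long as the vector is preallocated and only ever moved. The block below is a minimal illustration of that argument, not the actual `DynamicSliceThunk` code: `Allocation`, `Slice`, `Wrapper`, `MakeWrapper` and `SlicesPointIntoOwnAllocations` are hypothetical stand-ins for `BufferAllocation`, `BufferAllocation::Slice` and the wrapping thunk.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for BufferAllocation.
struct Allocation {
  int64_t index;
  int64_t size;
};

// Hypothetical stand-in for BufferAllocation::Slice: holds a raw pointer
// to the allocation it refers to.
struct Slice {
  const Allocation* allocation;
  int64_t offset;
  int64_t size;
};

// Hypothetical stand-in for the wrapping thunk: owns the allocations by
// value and keeps slices that point into them.
struct Wrapper {
  std::vector<Allocation> allocations;
  std::vector<Slice> slices;
};

Wrapper MakeWrapper(const std::vector<int64_t>& sizes) {
  std::vector<Allocation> allocations;
  // Preallocate so that push_back below never reallocates; a reallocation
  // would invalidate the pointers already stored in `slices`.
  allocations.reserve(sizes.size());
  std::vector<Slice> slices;
  slices.reserve(sizes.size());
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    allocations.push_back(Allocation{static_cast<int64_t>(i), sizes[i]});
    slices.push_back(Slice{&allocations.back(), 0, sizes[i]});
  }
  // Move, never copy: the element buffer is handed over to the Wrapper,
  // so the addresses stored in `slices` still refer to live elements.
  return Wrapper{std::move(allocations), std::move(slices)};
}

// Checks that every slice still points at the allocation owned by `w`.
bool SlicesPointIntoOwnAllocations(const Wrapper& w) {
  if (w.slices.size() != w.allocations.size()) return false;
  for (std::size_t i = 0; i < w.slices.size(); ++i) {
    if (w.slices[i].allocation != &w.allocations[i]) return false;
  }
  return true;
}
```

Copying the `Wrapper` instead of moving it would leave the copied slices pointing into the original vector, which is the hazard the commit message proposes to rule out by making `BufferAllocation` non-copyable.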