Commit Graph

186508 Commits

Author SHA1 Message Date
A. Unique TensorFlower
ea390babb6 Update GraphDef version to 2398.
PiperOrigin-RevId: 826794562
2025-11-01 02:14:13 -07:00
A. Unique TensorFlower
ef28899305 Automated Code Change
PiperOrigin-RevId: 826749442
2025-10-31 22:39:01 -07:00
A. Unique TensorFlower
aa4db17b00 Automated Code Change
PiperOrigin-RevId: 826726802
2025-10-31 20:43:10 -07:00
A. Unique TensorFlower
09f85795ba Automated Code Change
PiperOrigin-RevId: 826725010
2025-10-31 20:30:14 -07:00
A. Unique TensorFlower
4618f903c4 Reverts bec8916f32
PiperOrigin-RevId: 826722506
2025-10-31 20:19:17 -07:00
Eugene Zhulenev
752a654e9e [jax:ffi] Declare ffi::TypeInfo as a struct static member
PiperOrigin-RevId: 826717163
2025-10-31 20:00:24 -07:00
Terry Sun
8134117476 PR #32836: [GPU] Dispatch S-curve model to single-partition multi-host topology
Imported from GitHub PR https://github.com/openxla/xla/pull/32836

📝 Summary of Changes
Renamed the SINGLE_HOST communication type to SINGLE_PARTITION (a fast-interconnect domain) to support multi-node NVLink (MNNVL) topologies. Piped the auto-detected partition size into the communication-type determination, and also exposed the partition size in SolGPUCostModel::Config for AOT compilation.

🎯 Justification
The S-curve model cannot handle NVLink latency; a single fast-interconnect domain, including an MNNVL topology, should use the latency-table model instead. This PR updates the dispatch mechanism so that an MNNVL domain is treated as a single partition, whereas previously a host was assumed to be equivalent to a partition.
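For context, a minimal sketch of the dispatch rule described above. The names (`Classify`, `Dispatch`, `CommunicationType`) are illustrative only and not the actual SolGPUCostModel API:

```cpp
// Hypothetical sketch of the model dispatch described in this PR; the real
// SolGPUCostModel interface differs. All names here are illustrative.
#include <iostream>

enum class CommunicationType { kSinglePartition, kMultiPartition };
enum class CostModel { kLatencyTable, kSCurve };

// Devices inside one fast-interconnect (e.g. MNNVL) domain form a partition.
CommunicationType Classify(int num_devices, int partition_size) {
  return num_devices <= partition_size ? CommunicationType::kSinglePartition
                                       : CommunicationType::kMultiPartition;
}

// The S-curve model only applies across partitions; within a single
// fast-interconnect domain the latency-table model is used instead.
CostModel Dispatch(CommunicationType type) {
  return type == CommunicationType::kSinglePartition
             ? CostModel::kLatencyTable
             : CostModel::kSCurve;
}

int main() {
  // A 2-host MNNVL domain with 8 GPUs per host: one 16-GPU partition.
  CostModel model =
      Dispatch(Classify(/*num_devices=*/16, /*partition_size=*/16));
  std::cout << (model == CostModel::kLatencyTable ? "latency table" : "s-curve")
            << "\n";
}
```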

🚀 Kind of Contribution
New Feature

📊 Benchmark (for Performance Improvements)
N/A

🧪 Unit Tests:
Added unit tests for model dispatching mechanism.

🧪 Execution Tests:
Behavior is unchanged for non-MNNVL topologies; N/A.

Copybara import of the project:

--
a9544375934873f7b888fdb5ff6c9dc6ee8b0e6c by Terry Sun <tesun@nvidia.com>:

use partition size for static model dispatching

--
e3445a5deb8da10146e90c50da5598f91cfe0a69 by Terry Sun <tesun@nvidia.com>:

expose partition size to config

--
212535ce891b8eb96ebb3c1e215a91d2b5035594 by Terry Sun <tesun@nvidia.com>:

better modularity

--
a9fe8a0f89dea9e2811d76a3570c7398df8dd756 by Terry Sun <tesun@nvidia.com>:

better code structure and doc string

--
a64a2b5ed1d45d815c6a2c47628b4d9ebb8368bd by Terry Sun <tesun@nvidia.com>:

update naming

Merging this change closes #32836

PiperOrigin-RevId: 826697791
2025-10-31 18:28:25 -07:00
Eugene Zhulenev
dad4fb74cd [xla:ffi] Remove deprecated TypeInfo constructor and replace it with XLA_FFI_TypeInfo alias
PiperOrigin-RevId: 826692285
2025-10-31 18:05:12 -07:00
Eugene Zhulenev
fbd032df67 [xla:cpu] Pass HloModule pointer to Thunk SerDes
Reverts 993369077a

PiperOrigin-RevId: 826675119
2025-10-31 17:12:35 -07:00
Penporn Koanantakool
00be2bc09e [xla:cpu:onednn] Skip failing tests on Aarch64 CPUs.
PiperOrigin-RevId: 826675056
2025-10-31 16:58:45 -07:00
Eugene Zhulenev
0b5bc94a83 [xla:ffi] Migrate to xla::ffi::MakeTypeInfo() API
PiperOrigin-RevId: 826667736
2025-10-31 16:42:38 -07:00
Jie Luo
23f7b26bc5 Remove deprecated float_format/double_format in python proto text_format.
PiperOrigin-RevId: 826666522
2025-10-31 16:30:05 -07:00
A. Unique TensorFlower
15e235f79b Allow IFRT-proxy to expand error-status payloads that are specific to Pathways.
PiperOrigin-RevId: 826656416
2025-10-31 15:58:26 -07:00
Sevin Fide Varoglu
c655468288 PR #31375: [XLA:GPU] Add NVLink domain check to CollectiveBackendAssigner
Imported from GitHub PR https://github.com/openxla/xla/pull/31375

📝 Summary of Changes
This PR updates the CollectiveBackendAssigner pass to account for NVLink domain connectivity when deciding between NVSHMEM and DEFAULT backends. It does this by adding a slice_size parameter to the compilation pipeline and introducing an IsIntraNVLinkDomain check.

🎯 Justification
The CollectiveBackendAssigner now uses NVSHMEM not only for single-host scenarios, but also when all devices are within the same NVLink domain.
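A hedged sketch of what an intra-domain check like the one described could look like. `IsIntraNVLinkDomain` and `slice_size` are named in the PR, but the body and the slice-based device grouping here are assumptions, not the actual XLA implementation:

```cpp
// Illustrative sketch only; paraphrased from the PR text, not XLA code.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

enum class CollectiveBackend { kDefault, kNvshmem };

// Assumption: devices in the same slice share an NVLink domain, so a
// collective is intra-domain if all participating device IDs fall into one
// slice of `slice_size` consecutive devices.
bool IsIntraNVLinkDomain(const std::vector<int64_t>& device_ids,
                         int64_t slice_size) {
  if (device_ids.empty()) return true;
  auto [min_it, max_it] =
      std::minmax_element(device_ids.begin(), device_ids.end());
  return *min_it / slice_size == *max_it / slice_size;
}

CollectiveBackend AssignBackend(const std::vector<int64_t>& device_ids,
                                int64_t slice_size) {
  return IsIntraNVLinkDomain(device_ids, slice_size)
             ? CollectiveBackend::kNvshmem
             : CollectiveBackend::kDefault;
}

int main() {
  // E.g. two GB200 hosts forming one NVLink domain of 16 GPUs.
  std::vector<int64_t> group = {0, 3, 8, 15};
  std::cout << (AssignBackend(group, /*slice_size=*/16) ==
                        CollectiveBackend::kNvshmem
                    ? "NVSHMEM"
                    : "DEFAULT")
            << "\n";
}
```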

🚀 Kind of Contribution
Performance Improvement, 🧪 Tests

📊 Benchmark (for Performance Improvements)
H100
| Benchmark | NVSHMEM enabled | NVSHMEM disabled |
|----------|----------|----------|
| llama31_8b_fp8_1x8    | 1095330 us   | 1093816 us    |
| llama31_8b_bf16_2x8    | 1368948 us   | 1370896 us   |
| llama31_8b_fp8_2x8    | 1096447 us   | 1092437 us   |
| llama31_70b_fp8_16x8    | 9723821 us   | 9707544 us    |

🧪 Unit Tests:
Added unit tests to xla/service/gpu/transforms/collectives/collective_backend_assigner_test.cc

🧪 Execution Tests:
Tested with llama3-8b on 2 GB200 nodes (fsdp = 8). The average step time in the NVSHMEM case was 3.69s (vs. 3.76s in the default case).

Copybara import of the project:

--
a02b77cec9622314af01ae481d0fb28b149f1b45 by Sevin Varoglu <svaroglu@nvidia.com>:

Add NVLink domain check to CollectiveBackendAssigner

Merging this change closes #31375

PiperOrigin-RevId: 826649437
2025-10-31 15:48:52 -07:00
Zixuan Jiang
bf84442f21 Refactor mesh and axis representation.
PiperOrigin-RevId: 826647907
2025-10-31 15:36:42 -07:00
Jian Cai
9c620f90b8 [XLA][Numerics][HLO Original Value] Support original values for more cases in while loop simplifier pass
This updates the original value of a while loop if its input was a nested tuple that got flattened during the pass.

PiperOrigin-RevId: 826644894
2025-10-31 15:23:52 -07:00
A. Unique TensorFlower
a3f8740bc7 Update tflite schema to allow external buffer
PiperOrigin-RevId: 826640205
2025-10-31 15:07:53 -07:00
A. Unique TensorFlower
80048022c7 Update XNNPACK in XLA
PiperOrigin-RevId: 826626298
2025-10-31 14:36:06 -07:00
Bill Varcho
261e077984 [ReplicaGroupV3][MeshAxesReplicaGroupList][2/2] Add flattened_replica_groups function for MeshAxesReplicaGroupList.
PiperOrigin-RevId: 826619318
2025-10-31 14:19:13 -07:00
Jian Cai
a6e123761d [XLA][Numerics][HLO Original Values] Handles original values of while loops in TPU reduce code motion pass
This updates the original value of a while loop whose input/output shape changes when the pass sinks qualified reduce instructions into its body.

PiperOrigin-RevId: 826618908
2025-10-31 14:05:49 -07:00
Parker Schuh
eef0661fc5 Rollforward with fixes of "Change RawSEDeviceMemory to be AsyncValueRef".
Reverts c7055c2e5b

PiperOrigin-RevId: 826608975
2025-10-31 13:39:01 -07:00
Bill Varcho
d008dc3999 Reverts d25ccb438d
PiperOrigin-RevId: 826583855
2025-10-31 12:35:26 -07:00
Haibo Huang
8572aaa4e9 Unify topology in PjRtTopologyDescription
The topology at the PjRt layer can be seen as:

(process, chip, logical device) or (process, chip, core)

For CPU, it is (1, num devices, 1).

For GPU, it is (num hosts, GPUs per host, 1).
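A minimal sketch of that tuple view, assuming a plain struct; the real PjRtTopologyDescription API is richer and the field names here are illustrative:

```cpp
// Toy rendering of the (process, chip, core) topology tuple described above.
#include <iostream>

struct Topology {
  int num_processes;      // hosts / processes
  int chips_per_process;  // physical devices per process
  int cores_per_chip;     // logical devices (cores) per chip
};

int main() {
  Topology cpu{1, /*num devices=*/8, 1};    // CPU: (1, num devices, 1)
  Topology gpu{4, /*GPUs per host=*/8, 1};  // GPU: (num hosts, GPUs per host, 1)
  std::cout << "total GPU devices: "
            << gpu.num_processes * gpu.chips_per_process * gpu.cores_per_chip
            << "\n";
}
```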

PiperOrigin-RevId: 826581627
2025-10-31 12:22:41 -07:00
A. Unique TensorFlower
e0f6a6c7f3 Integrate LLVM at llvm/llvm-project@42a8ff877d
Updates LLVM usage to match
[42a8ff877d47](https://github.com/llvm/llvm-project/commit/42a8ff877d47)

PiperOrigin-RevId: 826574010
2025-10-31 12:01:33 -07:00
Aliia Khasanova
6ff7f9c87f Add de/serialization of fake_allocations in DynamicSliceThunk.
PiperOrigin-RevId: 826541399
2025-10-31 10:36:21 -07:00
Eusebio Durán Montaña
ecc2510eb0 Use Deserializer lambda for embedded thunks in DynamicSliceThunk
PiperOrigin-RevId: 826474606
2025-10-31 07:20:46 -07:00
Henning Becker
26d0882419 Add proto serialization for GpuExecutable
This adds `GpuExecutable::ToProto` and `GpuExecutable::FromProto`, which allow us to [de]serialize an instance of `GpuExecutable` and later reconstruct it.
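A hedged usage sketch of that round trip with a stand-in type; the commit message doesn't show the real signatures, so the stand-in "proto" and the `std::optional` return here merely model a failable `FromProto`:

```cpp
// Toy round-trip pattern; real XLA code uses generated protobuf classes and
// typically returns absl::StatusOr from FromProto.
#include <iostream>
#include <optional>
#include <string>
#include <utility>

// Stand-in "proto" message.
struct ExecutableProto {
  std::string asm_text;
};

class Executable {
 public:
  explicit Executable(std::string asm_text) : asm_text_(std::move(asm_text)) {}

  // Serialize the executable's state to a proto.
  ExecutableProto ToProto() const { return ExecutableProto{asm_text_}; }

  // Reconstruct from a proto; nullopt models a failed status.
  static std::optional<Executable> FromProto(const ExecutableProto& proto) {
    if (proto.asm_text.empty()) return std::nullopt;  // invalid proto
    return Executable(proto.asm_text);
  }

 private:
  std::string asm_text_;
};

int main() {
  Executable original("ptx: ...");
  ExecutableProto proto = original.ToProto();    // serialize
  auto restored = Executable::FromProto(proto);  // later: reconstruct
  std::cout << (restored.has_value() ? "round trip ok" : "failed") << "\n";
}
```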

PiperOrigin-RevId: 826470601
2025-10-31 07:07:33 -07:00
A. Unique TensorFlower
f73a954906 Add SymbolicExpr::IsBinaryOp() method
This CL introduces a new helper method, SymbolicExpr::IsBinaryOp(), to quickly determine whether a SymbolicExpr is a binary operation (i.e., not a constant or a variable). This is used in several places in indexing_map.cc for AffineMap, and it will simplify the refactor.
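A minimal sketch of such a helper, assuming a SymbolicExpr with a kind tag; the real XLA class has many more expression kinds and a different representation:

```cpp
// Toy SymbolicExpr illustrating the IsBinaryOp() helper described above.
#include <iostream>

enum class ExprKind { kConstant, kVariable, kAdd, kMul };

class SymbolicExpr {
 public:
  explicit SymbolicExpr(ExprKind kind) : kind_(kind) {}

  // A binary op is anything that is not a leaf (constant or variable).
  bool IsBinaryOp() const {
    return kind_ != ExprKind::kConstant && kind_ != ExprKind::kVariable;
  }

 private:
  ExprKind kind_;
};

int main() {
  std::cout << SymbolicExpr(ExprKind::kAdd).IsBinaryOp() << " "         // 1
            << SymbolicExpr(ExprKind::kConstant).IsBinaryOp() << "\n";  // 0
}
```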

PiperOrigin-RevId: 826468454
2025-10-31 06:54:52 -07:00
Marcin Radomski
718fe5695e [XLA:GPU] Add flags for filtering debugged thunks
Checking all buffers is way too heavy and causes timeouts, so we need the ability to focus on interesting parts of the thunk graph.

`--xla_gpu_experimental_thunk_buffer_debug_filter_by_thunk_id_ranges` allows limiting thunk IDs to selected ranges or values. The IDs are assigned in the order of emitting thunks, which should (TM) be stable and allow bisecting to find the culprit thunk(s). The IDs are given as a comma-separated list of integers and closed or half-open ranges (e.g. `:2,5,7:8,12:` to match <=2, 5, 7, 8, and >=12).

`--xla_gpu_experimental_thunk_buffer_debug_filter_by_profile_annotation_re` allows matching by a thunk's profile annotation. It is a comma-separated list of regexes that are matched against `ThunkInfo::profile_annotation`; the thunk's profile annotation needs to match at least one of them.

Both flags are meant to work with all thunk debug-buffer instrumentation (currently: checksums, NaNs). If both are set, a thunk has to pass both the ID filter and the profile-annotation filter to get instrumented.

Implementation of the filtering logic is not included in this CL.
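Since the filtering logic itself is not in this CL, here is an assumption-laden sketch of how the ID-range syntax above could be parsed and matched, as standalone C++ rather than XLA code:

```cpp
// Sketch only: parses the ":2,5,7:8,12:" style thunk-ID filter described in
// the flag documentation above and tests IDs against it.
#include <cstdint>
#include <iostream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

struct IdRange {
  int64_t lo = std::numeric_limits<int64_t>::min();
  int64_t hi = std::numeric_limits<int64_t>::max();
};

// A bare integer N is [N, N], ":N" is (-inf, N], "N:" is [N, +inf),
// "M:N" is [M, N].
std::vector<IdRange> ParseRanges(const std::string& spec) {
  std::vector<IdRange> ranges;
  std::stringstream ss(spec);
  std::string item;
  while (std::getline(ss, item, ',')) {
    IdRange r;
    size_t colon = item.find(':');
    if (colon == std::string::npos) {
      r.lo = r.hi = std::stoll(item);
    } else {
      if (colon > 0) r.lo = std::stoll(item.substr(0, colon));
      if (colon + 1 < item.size()) r.hi = std::stoll(item.substr(colon + 1));
    }
    ranges.push_back(r);
  }
  return ranges;
}

// A thunk ID passes if it falls inside any of the ranges.
bool MatchesThunkId(int64_t id, const std::vector<IdRange>& ranges) {
  for (const IdRange& r : ranges) {
    if (id >= r.lo && id <= r.hi) return true;
  }
  return false;
}

int main() {
  auto ranges = ParseRanges(":2,5,7:8,12:");
  for (int64_t id : {0, 2, 3, 5, 7, 8, 11, 12, 20}) {
    std::cout << id << (MatchesThunkId(id, ranges) ? " matches\n" : " no\n");
  }
}
```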

PiperOrigin-RevId: 826457166
2025-10-31 06:11:25 -07:00
A. Unique TensorFlower
4d78e8088a Automated Code Change
PiperOrigin-RevId: 826451988
2025-10-31 05:59:56 -07:00
A. Unique TensorFlower
fe344908fa Automated Code Change
PiperOrigin-RevId: 826451035
2025-10-31 05:48:15 -07:00
Kanish Anand
e7dcad735e Add equality operator for NamedSharding
PiperOrigin-RevId: 826442714
2025-10-31 05:16:43 -07:00
Aliia Khasanova
add489fd8d Use std::vector<BufferAllocation> instead of std::vector<std::unique_ptr<BufferAllocation>> in DynamicSliceThunk.
`BufferAllocation::Slice` stores a raw pointer to the corresponding `BufferAllocation`. Currently we keep the embedded thunk allocations alive by storing unique_ptrs in the wrapping DynamicSliceThunk. This design makes it hard to reuse the existing infrastructure, specifically to serialize `DynamicSliceThunk`. To address this, I'm changing fake_allocations to be `std::vector<BufferAllocation>`.

The move constructor `std::vector::vector(std::vector&&)` is guaranteed to run in constant time, which means it steals the internal data buffer from the source vector. This implies that pointers to the allocations stay stable as long as:
* we preallocate the vector size
* we never copy the vector, only move it

To make later usage safer, we can explicitly prohibit BufferAllocation from being copyable/movable. I'm going to do this in a follow-up CL.
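A small standalone demonstration of that pointer-stability argument, with toy stand-ins for BufferAllocation and Slice:

```cpp
// Moving a std::vector steals its heap buffer, so raw pointers into it stay
// valid; copying would allocate fresh storage. Toy types stand in for
// BufferAllocation and BufferAllocation::Slice.
#include <iostream>
#include <utility>
#include <vector>

struct BufferAllocation {
  int index;
};

struct Slice {
  const BufferAllocation* allocation;  // raw pointer, as in XLA
};

int main() {
  std::vector<BufferAllocation> allocations;
  allocations.reserve(2);  // preallocate: no reallocation on push_back
  allocations.push_back({0});
  allocations.push_back({1});

  Slice slice{&allocations[1]};  // raw pointer into the vector's buffer

  // Move, don't copy: the destination vector takes over the same buffer,
  // so `slice.allocation` still points at live storage.
  std::vector<BufferAllocation> moved = std::move(allocations);
  std::cout << "slice still sees allocation " << slice.allocation->index
            << "\n";  // prints 1
}
```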

PiperOrigin-RevId: 826440060
2025-10-31 05:05:43 -07:00
A. Unique TensorFlower
3326b0221f Automated Code Change
PiperOrigin-RevId: 826433106
2025-10-31 04:44:23 -07:00
A. Unique TensorFlower
e32304ddc5 [Autotuner] Add support for sharded autotuning in the pass.
PiperOrigin-RevId: 826417614
2025-10-31 03:50:55 -07:00
Eusebio Durán Montaña
e32f20dd91 Use factory function to create CubSortThunk
The `CubSortThunk` constructor was calling a function that returns an `absl::StatusOr`, ignoring non-OK statuses and just accessing the value.

Presumably the status is always OK in prod, but this makes the failure case explicit.
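A hedged sketch of the factory-function pattern with a toy thunk type (the real CubSortThunk interface differs); the point is that fallible validation happens before construction instead of inside a constructor that cannot report failure:

```cpp
// Toy factory returning absl::StatusOr, illustrating the pattern described
// above; requires linking against Abseil.
#include <iostream>
#include <memory>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

class SortThunk {
 public:
  // Factory: surfaces the error instead of swallowing it in a constructor.
  static absl::StatusOr<std::unique_ptr<SortThunk>> Create(int num_items) {
    if (num_items <= 0) {
      return absl::InvalidArgumentError("num_items must be positive");
    }
    return std::unique_ptr<SortThunk>(new SortThunk(num_items));
  }

 private:
  explicit SortThunk(int num_items) : num_items_(num_items) {}
  int num_items_;
};

int main() {
  auto thunk = SortThunk::Create(-1);
  if (!thunk.ok()) std::cout << thunk.status() << "\n";  // explicit failure
}
```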

PiperOrigin-RevId: 826410861
2025-10-31 03:37:19 -07:00
Kanish Anand
adfd891fde Refactor Mesh ctors
PiperOrigin-RevId: 826410314
2025-10-31 03:26:09 -07:00
A. Unique TensorFlower
8734ec41d5 Disable capturing of dot RHS operands
This is proving to be unreliable.

PiperOrigin-RevId: 826395008
2025-10-31 02:45:34 -07:00
A. Unique TensorFlower
d6d4e02248 [XLA:GPU] Add multimem setup.
PiperOrigin-RevId: 826391581
2025-10-31 02:35:40 -07:00
A. Unique TensorFlower
a6e96588e3 compat: Update forward compatibility horizon to 2025-10-31
PiperOrigin-RevId: 826388828
2025-10-31 02:29:46 -07:00
A. Unique TensorFlower
f572aeee90 Update GraphDef version to 2397.
PiperOrigin-RevId: 826388642
2025-10-31 02:16:45 -07:00
A. Unique TensorFlower
993369077a Reverts bf23bf1b32
PiperOrigin-RevId: 826380939
2025-10-31 01:51:12 -07:00
A. Unique TensorFlower
d25ccb438d Reverts cef240807a
PiperOrigin-RevId: 826374657
2025-10-31 01:32:52 -07:00
A. Unique TensorFlower
4cfaa7e25c Automated Code Change
PiperOrigin-RevId: 826372270
2025-10-31 01:19:21 -07:00
A. Unique TensorFlower
5133f83425 Automated Code Change
PiperOrigin-RevId: 826363842
2025-10-31 00:50:37 -07:00
A. Unique TensorFlower
ebacf2a211 Automated Code Change
PiperOrigin-RevId: 826342599
2025-10-30 23:59:59 -07:00
Bill Varcho
cef240807a [ReplicaGroupV3][MeshAxesReplicaGroupList][1/2] Add initial class definition for V3 replica group.
PiperOrigin-RevId: 826334561
2025-10-30 23:18:40 -07:00
Felix Wang
d9c76aafeb Adjust the collective-permute cross-host type to MULTI_HOST_NON_WORLD_LEVEL only.
PiperOrigin-RevId: 826327580
2025-10-30 22:54:49 -07:00
Eugene Zhulenev
d90723f48e [xla:pjrt:cpu] Add e2e test for YnnFusion + PJRT client
PiperOrigin-RevId: 826323865
2025-10-30 22:41:49 -07:00
Eugene Zhulenev
7ad55e8818 [xla:cpu] Add an end-to-end test for ynn fusions
PiperOrigin-RevId: 826318525
2025-10-30 22:20:44 -07:00