tensorflow

mirror of https://github.com/zebrajr/tensorflow.git synced 2025-12-07 00:20:20 +01:00

Author	SHA1	Message	Date
Gunhyun Park	d2759cb154	Fix typo `invompatible` -> `incompatible` PiperOrigin-RevId: 821816332	2025-10-20 15:16:51 -07:00
Sean Talts	8d940b9cc7	[XLA:CPU] Refactor: intrinsic::Type into its own library, pull out naming functions. PiperOrigin-RevId: 821815511	2025-10-20 15:06:06 -07:00
Hyeontaek Lim	b915d3103a	[NanoRt] NanoRt IFRT now returns a `nullptr` if it knows that the Array layout represents a default layout. The user code previously has been migrated to handle this new behavior gracefully, obtaining a concrete default layout as before. Future note: At some point, `NanoArray` would need to distinguish between a default layout vs. a concrete layout that is equal to the default layout. If the latter is used, `NanoArray::pjrt_layout()` is expected to return the concrete layout. This is not required by IFRT API semantics yet, but it will be enforced later in the future. PiperOrigin-RevId: 821808592	2025-10-20 14:51:22 -07:00
Quentin Khan	dc12ec4556	Don't name the handle when mapping the XNNPack cache on Windows. When specifying a mapping name to `CreateFileMappingA()`, that function returns previous mappings that match the same name disregarding the newly requested mapping size. This doesn't work well with the weight cache that is built (and mapped) incrementally. By making the mapping objects anonymous, we ensure that the mapping returned will have the requested size. Note: this doesn't increase the total memory used by the process but the accounting by the Windows system is different. Compared to a fix that allocates memory instead of mapping the file, less memory is committed and private, and more is shareable. Testing `litert_llm_main` on [Gemma3-1B-IT] on Windows 11. \| Fix \| Commit (KB) \| Working Set (KB) \| Shareable (KB) \| Private (KB) \| \| ---------: \| -----------:\| ----------------:\| --------------:\| ------------:\| \| Anon. map \| 1 208 416 \| 1 678 396 \| 1 079 620 \| 599 096 \| \| Mem. alloc \| 1 705 620 \| 1 678 572 \| 582 428 \| 1 096 144 \| \| \| \| \| \| \| \| diff. \| +497 204 \| 176 \| -497 192 \| +497 048 \| [Gemma3-1B-IT]: https://huggingface.co/litert-community/Gemma3-1B-IT/blob/main/gemma3-1b-it-int4.litertlm PiperOrigin-RevId: 821807004	2025-10-20 14:36:54 -07:00
Benjamin Chetioui	88e4cd4d01	[XLA] Use `TiledHloSchedule` in `SymbolicTileAnalysis`. We can now produce arbitrary iteration patterns for output tiles, simply by parametrizing calls to `ComputeTiledHloInstructions` with different `TiledHloSchedule`s. PiperOrigin-RevId: 821796530	2025-10-20 14:12:48 -07:00
Sohaib Iftikhar	cbeeef926f	[XLA:GPU]: Remove unused method from all reduce test PiperOrigin-RevId: 821742010	2025-10-20 12:28:17 -07:00
Hyeontaek Lim	cc9fd2b254	[IFRT Proxy] `Array::pjrt_layout()` uses `nullptr` to indicate a default layout. IFRT Proxy now returns a `nullptr` if it knows that the Array layout represents a default layout. The user code previously has been migrated to handle this new behavior gracefully, obtaining a concrete default layout as before. Caveat: IFRT Proxy client infers the layout of the output arrays from `LoadedExecutable::GetOutputLayouts()`, which always returns concrete layouts today. Thus, these output arrays would use concrete layouts for default layouts, even if the arrays on the server side use `nullptr` for default layouts. This behavior is currently acceptable where all users convert the layout into a concrete one before using it, while this behavior will eventually change so that IFRT Proxy client reflects the array layouts on the server side more accurately. PiperOrigin-RevId: 821741105	2025-10-20 12:19:53 -07:00
Eugene Zhulenev	0e09f486e7	[xla:pjrt:ffi] Remove deprecated TypeID registration function PiperOrigin-RevId: 821740142	2025-10-20 12:11:20 -07:00
Michael Kuperstein	b824d4e187	[XLA] Remove verify_unique_channel_ids verifier option. The functionality has been removed previously, but the option was never cleaned up. This does not remove the xla_ignore_channel_id debug option because it also has a non-verifier use. PiperOrigin-RevId: 821737613	2025-10-20 11:59:45 -07:00
Kanish Anand	16e1567819	Clarify field name PiperOrigin-RevId: 821732575	2025-10-20 11:50:54 -07:00
Zac Mustin	903ce4c6ff	Use `GetPjRtCpuClient` directly when making a PJRT C API CPU client. Right now, we use `GetXlaPjrtCpuClient` which in turn calls `GetPjRtCpuClient`, but we will later update `GetXlaPjrtCpuClient` to use the C sandwich, in which case we must call `GetPjRtCpuClient` here in `PJRT_Client_Create`. This change is a no-op. PiperOrigin-RevId: 821732030	2025-10-20 11:44:41 -07:00
A. Unique TensorFlower	458995b35d	Set DNN version in DeviceDescription for autotuner cache. The dnn_version in device_description was not set; cl/816579045 fixed it for the old autotuner infra, and this change ports that fix to the new autotuner infra. PiperOrigin-RevId: 821728904	2025-10-20 11:34:10 -07:00
A. Unique TensorFlower	027a15b3dc	[Autotuner] Early exit if there is only one supported config. - We encounter this case very often (for cublas autotuner), so it makes sense to optimize it. - Running cuBLAS kernels as part of autotuning has some unintended side effect which changes the optimized HLO, this fix also mitigates the issue, while we look more into it. PiperOrigin-RevId: 821716593	2025-10-20 11:08:05 -07:00
Emilio Cota	591ba9b4c2	Update symbols to unbreak Windows PiperOrigin-RevId: 821712287	2025-10-20 10:56:56 -07:00
Eugene Zhulenev	fd948cba88	[xla:cpu] Migrate tf2xla to BufferAllocationInfo Reverts `f2ed04aff6` PiperOrigin-RevId: 821660240	2025-10-20 08:41:06 -07:00
Quentin Khan	3a6eef2333	Add a helper to compute `*Low/High` `DWORD` parameter couples from a 64 bit int. Example: `MapViewOfFile` takes a couple of parameters `dwFileOffset(High\|Low)` when specifying the file offset. PiperOrigin-RevId: 821635630	2025-10-20 07:32:42 -07:00
A. Unique TensorFlower	f32acff204	Automated Code Change PiperOrigin-RevId: 821628227	2025-10-20 07:15:56 -07:00
A. Unique TensorFlower	789f9c8c92	Integrate LLVM at llvm/llvm-project@d5ce81dc81 Updates LLVM usage to match [d5ce81dc8143](https://github.com/llvm/llvm-project/commit/d5ce81dc8143) PiperOrigin-RevId: 821627987	2025-10-20 06:57:20 -07:00
Will Froom	4d5ed207cd	[XLA:GPU][XTile] Handle scalar load/store in xtile lowering. PiperOrigin-RevId: 821617202	2025-10-20 06:36:16 -07:00
Alex	8245a623aa	PR #32782 : [ROCm] Fix hermetic build for rocm Imported from GitHub PR https://github.com/openxla/xla/pull/32782 📝 Summary of Changes Fix hermetic build for rocm. 🎯 Justification Introduce missing hipblaslt dependency. Fix invalid libs linking and align with the data directories. 🚀 Kind of Contribution Please remove what does not apply: 🐛 Bug Fix 📊 Benchmark (for Performance Improvements) CI, not relevant 🧪 Unit Tests: Not relevant 🧪 Execution Tests: Not relevant Copybara import of the project: -- f5cb68b0df2265b7048d0068eedd07cccf67e228 by Alexandros Theodoridis <atheodor@amd.com>: Add missing hermetic lib dependency -- fe0c9a7fdd36180fea5cf63e20d864355ed98a6c by Alexandros Theodoridis <atheodor@amd.com>: Add missing hipblaslt deps, fix the targets -- 540d79dd4287a013a3f178ef34a5b96fb8a8a92f by Alexandros Theodoridis <atheodor@amd.com>: Make hipblaslt mandatory -- 3a6f2282669a1ece4518cc69a01ad76275b603a1 by Alexandros Theodoridis <atheodor@amd.com>: Fix test -- eb21b60d34978191315a0c9775d2cb53309dc72d by Alexandros Theodoridis <atheodor@amd.com>: Ignore asnsigaltstack -- 54c8af2abd7dd682a8494caa05854d574209aa20 by Harsha Havanur Shamsundara <harsha.havanurshamsundara@amd.com>: [ROCm] Use working sha256 for latest ROCm 7.0 docker image -- 9629a9fc9201a80dba7a0beecb8ee0797960ff6f by Harsha HS <Harsha.HavanurShamsundara@amd.com>: [ROCm] Add ROCM_PATH repo_env to test scripts -- 1ef6772c6df6aeffcbcc2f27a0ede558fbc6270f by Alexandros Theodoridis <atheodor@amd.com>: Fix buildifier warning Merging this change closes #32782 PiperOrigin-RevId: 821614030	2025-10-20 06:26:00 -07:00
Kostiantyn Liepieshov	5866a4f621	Fix relayout propagation for MPMD. When a program argument with AUTO layout is used in more than one Fragment, enforce the DEFAULT layout, as we cannot allow different compiled layouts. PiperOrigin-RevId: 821612799	2025-10-20 06:11:10 -07:00
A. Unique TensorFlower	5707a02d98	[XLA:GPU] Enable `chlo.asinh` -> `kAsinh` `HloInstruction` lowering. PiperOrigin-RevId: 821610794	2025-10-20 05:57:12 -07:00
Adrian Kuegel	7144ba7d80	Migrate ListScheduler from TuplePointsToAnalysis to HloAliasAnalysis. Reverts `22032a9edb` PiperOrigin-RevId: 821605784	2025-10-20 05:44:20 -07:00
Will Froom	beb48d90e2	[XLA][XTile] Add xtile lowering passes for triton. This enables migrating the triton emitter to use emit xtile entry, insert & extract in the child PR. The main difference is the memref args in the entry function for which `MemrefToPtr` & `PtrToMemref` were introduced which closely resemble `UnrealizedConversionCastOp` with additional verification and will enable special folding of `memref::TransposeOp`. PiperOrigin-RevId: 821593545	2025-10-20 04:57:30 -07:00
Christian Sigg	ea72bd7e48	[XLA:GPU] Allow kPad in nest_gemm_fusion. PiperOrigin-RevId: 821581567	2025-10-20 04:16:25 -07:00
Nikita Putikhin	c842d810f4	[XLA:GPU] Initialize `global_split_limits` with the first config's `split_k`. PiperOrigin-RevId: 821579201	2025-10-20 04:03:13 -07:00
Quentin Khan	d8d0167ebc	Fix format specifiers in log. PiperOrigin-RevId: 821550544	2025-10-20 02:48:58 -07:00
A. Unique TensorFlower	5d49a4a177	compat: Update forward compatibility horizon to 2025-10-20 PiperOrigin-RevId: 821550460	2025-10-20 02:28:51 -07:00
A. Unique TensorFlower	990709c103	Update GraphDef version to 2386. PiperOrigin-RevId: 821550446	2025-10-20 02:14:20 -07:00
A. Unique TensorFlower	ec21abb4c3	Automated Code Change PiperOrigin-RevId: 821547310	2025-10-20 02:00:35 -07:00
A. Unique TensorFlower	5525a3f53e	Automated Code Change PiperOrigin-RevId: 821486960	2025-10-19 22:13:03 -07:00
A. Unique TensorFlower	c37b02fbbf	Automated Code Change PiperOrigin-RevId: 821479363	2025-10-19 21:43:38 -07:00
Parker Schuh	f322c0c82d	Refactor the event loop + socket integration so that it is separately testable. This gives us the two HalfClose events + HandleEvent() and SendRawFrame() as the API from the socket integration and subclasses can handle these accordingly. This also moves the responsibility to destroy in the handler logic with the contract that the event is removed from the loop on the second HalfClose event. PiperOrigin-RevId: 821445213	2025-10-19 19:02:26 -07:00
A. Unique TensorFlower	bce0886484	Automated Code Change PiperOrigin-RevId: 821380400	2025-10-19 13:08:38 -07:00
A. Unique TensorFlower	50808df60c	Add random perturbations to the xla_tpu_msa_sort_order_overrides flag. Given a user seed, will update the MSA sort order priority of a (small?) number of randomly selected instructions during compilation. This causes small perturbations on the compiler's prefetching decisions, which allows for 2 main features: 1. finding out if there is a single instruction which was given a "wrong" priority by the compiler so it can be fixed - to do this, we run some benchmark many times with different seeds until we find a seed that drastically reduces the compiled code's runtime - once we find that seed, we can use binary search to decrease the "selection range" and zero in on the one specific offending instruction 2. finding a lot of small changes that together reduce the runtime - we can do this using a "hill-climbing" method - try many perturbations until you find one slightly better than the baseline. - try many followup perturbations (perturbing the best perturbation from the previous stage) until you find one slightly better again - repeat until no more improvements are found NOTE: Right now there is no "good way" of finding which instructions had their priority adjusted (especially important in (1) to find the one offending instruction). The only way to do so is to increase the log-level of the compilation debug print and then look at the logs. PiperOrigin-RevId: 821309046	2025-10-19 06:33:49 -07:00
A. Unique TensorFlower	08cc6ff10c	Update GraphDef version to 2385. PiperOrigin-RevId: 821260487	2025-10-19 02:23:25 -07:00
A. Unique TensorFlower	3b2a32ab5f	compat: Update forward compatibility horizon to 2025-10-19 PiperOrigin-RevId: 821260464	2025-10-19 02:10:26 -07:00
David Majnemer	8cf42017ec	[TSL] Consolidate NUMA code across different platforms. No functional change is intended. PiperOrigin-RevId: 821216963	2025-10-18 22:14:29 -07:00
Subhankar Shah	a139a50e56	[XLA:MSA] Allow allocation requests with a continuous default memory requirement to fall through without resulting in a failure requiring un-commit. PiperOrigin-RevId: 821165250	2025-10-18 18:00:21 -07:00
A. Unique TensorFlower	ac5fb8fb7f	Introduce and connect an XLA:TPU shardy option to fully deduplicate functions in Shardy. This change is a no-op since both the newly introduced XLA:TPU option and the corresponding option on the ExportNamedComputation pass are false by default. PiperOrigin-RevId: 821039969	2025-10-18 05:48:48 -07:00
Aliia Khasanova	4985a1c2f3	Add proto [de]serialization for HostExecuteDoneThunk PiperOrigin-RevId: 821029998	2025-10-18 04:55:53 -07:00
A. Unique TensorFlower	083e682264	compat: Update forward compatibility horizon to 2025-10-18 PiperOrigin-RevId: 820999344	2025-10-18 02:21:41 -07:00
A. Unique TensorFlower	66f2c7c62d	Update GraphDef version to 2384. PiperOrigin-RevId: 820999336	2025-10-18 02:11:39 -07:00
A. Unique TensorFlower	17117898f9	Automated Code Change PiperOrigin-RevId: 820964120	2025-10-17 23:15:56 -07:00
A. Unique TensorFlower	34d6417ddf	Automated Code Change PiperOrigin-RevId: 820944060	2025-10-17 21:36:01 -07:00
Maxim Ermilov	4a42fca868	First step to introduce GpuComputeCapability custom class instead of std::variant PiperOrigin-RevId: 820940828	2025-10-17 21:24:11 -07:00
David Majnemer	4d358b2bac	[TSL] Remove unused `integral_types.h` from TSL platform defaults. This header provided typedefs for integral types within the `tsl` namespace, but these are no longer necessary. PiperOrigin-RevId: 820935573	2025-10-17 21:01:34 -07:00
A. Unique TensorFlower	4beacf5a04	Make the chain specifically target the reshape-transpose chain where the transposes are not identity permutations. Identity transposes should be eliminated separately in HandleTranspose already. PiperOrigin-RevId: 820903953	2025-10-17 18:56:47 -07:00
Alexander Shaposhnikov	ce65a0ad5c	[XLA:CPU] Add initial bits for YNNPACK support. + Do not build XLA with YNNPACK on Windows. Co-authored-by: Penporn Koanantakool <penporn@google.com> PiperOrigin-RevId: 820896434	2025-10-17 18:36:53 -07:00
A. Unique TensorFlower	f0057ee4b7	[XLA] Delete stale comment on ShapeUtil::PermuteDimensions usage. PiperOrigin-RevId: 820874660	2025-10-17 16:58:56 -07:00
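
Note on commit 3a6eef2333: the message describes a helper that turns a 64-bit integer into the `*High`/`*Low` `DWORD` pair that Win32 calls such as `MapViewOfFile` expect. The sketch below only illustrates that arithmetic; it is not the helper from the commit, and the names `split_dwords` and `join_dwords` are hypothetical, chosen here for illustration.

```python
# Illustrative sketch only: these function names are assumptions and do not
# come from the TensorFlow/XNNPack sources.
DWORD_MASK = 0xFFFFFFFF           # a DWORD is an unsigned 32-bit value
UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def split_dwords(value: int) -> tuple[int, int]:
    """Return the (high, low) 32-bit halves of an unsigned 64-bit integer."""
    if not 0 <= value <= UINT64_MAX:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    high = (value >> 32) & DWORD_MASK  # upper 32 bits
    low = value & DWORD_MASK           # lower 32 bits
    return high, low


def join_dwords(high: int, low: int) -> int:
    """Inverse of split_dwords: rebuild the 64-bit value from its halves."""
    return ((high & DWORD_MASK) << 32) | (low & DWORD_MASK)
```

With such a pair, a file offset of 5 GiB would be passed as a high part of 1 and a low part of 0x40000000.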

186058 Commits