tensorflow

mirror of https://github.com/zebrajr/tensorflow.git synced 2025-12-06 00:19:58 +01:00

Author	SHA1	Message	Date
Will Froom	2ef85038ed	[XLA:CPU] Measure process CPU time in reduction benchmark. PiperOrigin-RevId: 825102754	2025-10-28 10:44:13 -07:00
Henning Becker	c7dd4775b7	Add missing dependency to TSL target Without it, the layering check was failing. PiperOrigin-RevId: 825098439	2025-10-28 10:30:46 -07:00
Ilya Tikhonovskiy	14db6f6317	[XLA:GPU] follow-up fix after PR #32919 Change `gpu_version` parameter to const reference in `IntelGpuCompiler`. This aligns the parameter type in `OptimizeHloConvolutionCanonicalization` with the base class signature. PiperOrigin-RevId: 825083863	2025-10-28 10:12:54 -07:00
Eugene Zhulenev	d75ad2c4ff	[xla:cpu] Cleanup KernelSpec to use absl::string_view and absl::Span PiperOrigin-RevId: 825075908	2025-10-28 09:56:20 -07:00
A. Unique TensorFlower	10fd9cfebb	Iterate over the functions of the map in a deterministic order to create function names deterministically. When a function has multiple instances with different manual axes, and dedup-functions-fully is on, it will have different copies of the same function. For example: sdy.manual_computation(%arg0) manual_axes={"x"} (%arg1: tensor<4xf32>) { sdy.named_computation<"foo">(%arg1) (%arg2: tensor<4xf32>) {} } sdy.manual_computation(%arg0) manual_axes={"y"} (%arg1: tensor<4xf32>) { sdy.named_computation<"foo">(%arg1) (%arg2: tensor<4xf32>) {} } sdy.named_computation<"foo">(%arg0) (%arg1: tensor<8xf32>) {} -----> sdy.manual_computation(%arg0) manual_axes={"x"} (%arg1: tensor<4xf32>) { call @foo(%arg1) } sdy.manual_computation(%arg0) manual_axes={"y"} (%arg1: tensor<4xf32>) { call @foo_0(%arg1) } call @foo_1(%arg0) The order of iteration over the map/vector determines which 'foo' becomes 'foo_0' or 'foo_1', or stays as 'foo'. PiperOrigin-RevId: 825074314	2025-10-28 09:49:24 -07:00
Dimitris Vardoulakis	0ad542ea89	PR #33117 : Rename "forward compatible" capabilities to "family compatible", per NVIDIA naming. Imported from GitHub PR https://github.com/openxla/xla/pull/33117 See: https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/ Family-compatible support was introduced in `16b9d957ff`. It's not clear to me how someone can actually configure XLA to use sm_100f. Will look into that next. Copybara import of the project: -- 331d40c9c93ffb3a5c97e53e4017f604aa23d221 by Dimitris Vardoulakis <dvardoulakis@nvidia.com>: Rename "forward compatible" capabilities to "family compatible", per NVIDIA naming. See: https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/ Merging this change closes #33117 PiperOrigin-RevId: 825073858	2025-10-28 09:38:56 -07:00
Niklas Vangerow	4aabddab2d	Migrate conv_depthwise_test to use PjRt. PiperOrigin-RevId: 825064898	2025-10-28 09:32:28 -07:00
Jian Cai	7c6d13443d	[XLA] Add a member function to check if a tuple tree has any tuples The function returns true if a tuple has only a root node. PiperOrigin-RevId: 825062842	2025-10-28 09:22:15 -07:00
Shaogang Wang	d1ca03b626	PR #33149 : [XLA:GPU] add VLOG dump option that only prints out the primary command buffer graph. Imported from GitHub PR https://github.com/openxla/xla/pull/33149 📝 Summary of Changes: Add a CUDA graph dump option that only prints out the primary graph, so as not to flood the screen log with nested CUDA graphs. 🎯 Justification: Easier-to-read debug output. 🚀 Kind of Contribution: 📚 Documentation Copybara import of the project: -- 18d6939170fd5bf4fa9228d4f74ca3ff4e83ec17 by Shawn Wang <shawnw@nvidia.com>: add cuda graph dump option that only prints out primary graph Merging this change closes #33149 PiperOrigin-RevId: 825049186	2025-10-28 09:03:40 -07:00
A. Unique TensorFlower	da8b9bf004	[Autotuner] Add (de)serialization from/to string in cache. - This is required to port sharding from gemm_fusion_autotuner. PiperOrigin-RevId: 825045654	2025-10-28 08:35:56 -07:00
Ilya Tikhonovskiy	ffca28bcf8	[XLA:GPU] Ignore reductions over dimensions of size 1 in UnstableReductionDetect The UnstableReductionDetector now considers reductions where all reduced dimensions have a size of 1 to be stable, as these operations are effectively no-ops and do not introduce numerical instability. A test case is added to verify this behavior. PiperOrigin-RevId: 825045042	2025-10-28 08:27:58 -07:00
Marcin Radomski	7334d07917	[XLA:GPU] Add check_thunk_result_consistency tool for verifying checksum consistency When implementing this, it turned out that the log is currently missing some information needed to reliably distinguish input/output checksums and different thunk executions. This adds the needed fields to the proto, but emitting them in the log will be a separate change. With the extra data missing, the tool assumes that all checksums refer to outputs and that each thunk execution gives the same results every time. The tests include the extra data, so once that's implemented it should(TM) just work. PiperOrigin-RevId: 825040798	2025-10-28 08:20:28 -07:00
Sohaib Iftikhar	542ffe0410	[XLA:GPU]: Add peer-to-peer copies for cupti tracing. Before this change, peer-to-peer copies done using cuMemcpyPeerAsync were not being tracked with the driver API. PiperOrigin-RevId: 825040246	2025-10-28 08:08:56 -07:00
Benjamin Chetioui	034d750525	[XLA][NFC] Remove line saying that `cudaGetLastError` is incompatible with command buffers. It turns out it is (although the broader point still stands). PiperOrigin-RevId: 825036231	2025-10-28 07:57:07 -07:00
Will Froom	de7a63363c	[XLA:CPU][XTile] Implement vectorized reduce. PiperOrigin-RevId: 825027697	2025-10-28 07:29:37 -07:00
Quentin Khan	11c00ca2db	If `--use_xnnpack` is specified, never use the default delegate in `benchmark_tflite_model`. Without this change, the behaviour was: 1. `--use_xnnpack=true`: the default resolver (that automatically applies an XNNPack delegate) is used and an XNNPack delegate that follows the options that are given on the command line is explicitly applied. 2. `--use_xnnpack=false`: the resolver without the default XNNPack delegate is used and no delegate is explicitly applied, i.e. no delegate is applied. 3. No `--use_xnnpack` is specified: the default resolver (that automatically applies an XNNPack delegate) is used. Case 1 has issues because both the custom and the default delegates are applied, and these may interfere during initialization. - Depending on the XNNPack options, some operations may or may not be delegated. - This leads to one delegate or the other taking the ops. - This makes the benchmarking of initialization completely wrong, since two delegates are applied. - This interferes with the XNNPack weight cache, since the cache can never be enabled for the default delegate. To solve this, the new behaviour is: 1. `--use_xnnpack=true`: the resolver without the default XNNPack delegate is used and an XNNPack delegate that follows the options that are given on the command line is explicitly applied. 2. `--use_xnnpack=false`: the resolver without the default XNNPack delegate is used and no delegate is explicitly applied, i.e. no delegate is applied. 3. No `--use_xnnpack` is specified: the default resolver (that automatically applies an XNNPack delegate) is used. Cases 2 and 3 are not affected by this change. PiperOrigin-RevId: 825018995	2025-10-28 07:09:35 -07:00
dependabot[bot]	97f4e08c24	PR #33140 : Bump actions/upload-artifact from 4.6.1 to 5.0.0 Imported from GitHub PR https://github.com/openxla/xla/pull/33140 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.6.1 to 5.0.0. Release notes (sourced from actions/upload-artifact's releases): v5.0.0 What's Changed: BREAKING CHANGE: this update supports Node v24.x. This is not a breaking change per-se but we're treating it as such. - Update README.md by @GhadimiR in actions/upload-artifact#681 - Update README.md by @nebuk89 in actions/upload-artifact#712 - Readme: spell out the first use of GHES by @danwkennedy in actions/upload-artifact#727 - Update GHES guidance to include reference to Node 20 version by @patrikpolyak in actions/upload-artifact#725 - Bump @actions/artifact to v4.0.0 - Prepare v5.0.0 by @danwkennedy in actions/upload-artifact#734 New Contributors: - @GhadimiR made their first contribution in actions/upload-artifact#681 - @nebuk89 made their first contribution in actions/upload-artifact#712 - @danwkennedy made their first contribution in actions/upload-artifact#727 - @patrikpolyak made their first contribution in actions/upload-artifact#725 Full Changelog: https://github.com/actions/upload-artifact/compare/v4...v5.0.0 v4.6.2 What's Changed: - Update to use artifact 2.3.2 package & prepare for new upload-artifact release by @salmanmkc in actions/upload-artifact#685 New Contributors: - @salmanmkc made their first contribution in actions/upload-artifact#685 Full Changelog: https://github.com/actions/upload-artifact/compare/v4...v4.6.2 Commits: - 330a01c Merge pull request #734 from actions/danwkennedy/prepare-5.0.0 - 03f2824 Update github.dep.yml - 905a1ec Prepare v5.0.0 - 2d9f9cd Merge pull request #725 from patrikpolyak/patch-1 - 9687587 Merge branch 'main' into patch-1 - 2848b2c Merge pull request #727 from danwkennedy/patch-1 - 9b51177 Spell out the first use of GHES - cd231ca Update GHES guidance to include reference to Node 20 version - de65e23 Merge pull request #712 from actions/nebuk89-patch-1 - 8747d8c Update README.md - Additional commits viewable in compare view: https://github.com/actions/upload-artifact/compare/v4.6.1...330a01c490aca151604b8cf639adc76d48f6c5d4 Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Copybara import of the project: -- 5eab24c4d57708cbb45b476265bca2e841706647 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>: Bump actions/upload-artifact from 4.6.1 to 5.0.0 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.6.1 to 5.0.0.
- [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4.6.1...330a01c490aca151604b8cf639adc76d48f6c5d4) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: 5.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Merging this change closes #33140 PiperOrigin-RevId: 824999008	2025-10-28 06:14:44 -07:00
Adrian Kuegel	a12d2cfb31	[XLA:GPU] Make ReductionEmitter deterministic. So far, the output could be non-deterministic if multiple reductions are grouped together. This change makes it deterministic. PiperOrigin-RevId: 824965037	2025-10-28 04:32:11 -07:00
Adrian Kuegel	29bc205be3	Update the Shardy pin in XLA This should resolve shardy related test failures. PiperOrigin-RevId: 824944776	2025-10-28 03:26:51 -07:00
A. Unique TensorFlower	4d3b9fb509	Update GraphDef version to 2394. PiperOrigin-RevId: 824918460	2025-10-28 02:24:51 -07:00
A. Unique TensorFlower	9bd524b777	compat: Update forward compatibility horizon to 2025-10-28 PiperOrigin-RevId: 824918400	2025-10-28 02:15:51 -07:00
A. Unique TensorFlower	e937dcc97c	Automated Code Change PiperOrigin-RevId: 824904755	2025-10-28 01:35:51 -07:00
A. Unique TensorFlower	c45fcf1b1c	Automated Code Change PiperOrigin-RevId: 824851950	2025-10-27 23:03:01 -07:00
A. Unique TensorFlower	c6d737900f	Automated Code Change PiperOrigin-RevId: 824847154	2025-10-27 22:48:55 -07:00
Zixuan Jiang	85d834e07b	Add `PrintArray` and `ArrayToString` methods to `IotaTileAssignment` and `TileAssignment`. These new methods allow printing or converting to a string only the array representation of the tile assignment, without including the tile dimensions. The existing `Print` and `ToString` methods are updated to use these new array-specific printing functions. PiperOrigin-RevId: 824816702	2025-10-27 21:19:25 -07:00
Zixuan Jiang	402ead44b2	The compatible factor shardings should not have overlap between axes across different tensors. PiperOrigin-RevId: 824815687	2025-10-27 21:09:47 -07:00
Eugene Zhulenev	c09d68c588	[xla:ffi] Remove unused context decoding for C API internals PiperOrigin-RevId: 824792000	2025-10-27 19:59:55 -07:00
Michael Kuperstein	b6f66e3e01	[XLA] VLOG instruction count before each HLO pass. PiperOrigin-RevId: 824791349	2025-10-27 19:50:20 -07:00
A. Unique TensorFlower	4231383b5b	Integrate LLVM at llvm/llvm-project@d0a7411cb8 Updates LLVM usage to match [d0a7411cb840](https://github.com/llvm/llvm-project/commit/d0a7411cb840) PiperOrigin-RevId: 824767534	2025-10-27 18:28:40 -07:00
A. Unique TensorFlower	6e82d4d96b	Add call to `ynn_optimize_subgraph` This currently happens implicitly in `ynn_create_runtime`, but that will not be the case soon. (Calling it multiple times is harmless.) PiperOrigin-RevId: 824754921	2025-10-27 17:53:49 -07:00
Parker Schuh	1fc47fae8e	Transition to an error state more aggressively when the socket reports errors. PiperOrigin-RevId: 824749937	2025-10-27 17:40:30 -07:00
A. Unique TensorFlower	582aa05a79	Add TensorTypeGetSize to schema_utils. PiperOrigin-RevId: 824743105	2025-10-27 17:30:05 -07:00
Maxim Ermilov	699879f5f3	add pcie_bandwidth field to DeviceDescription PiperOrigin-RevId: 824738638	2025-10-27 17:17:00 -07:00
Eugene Zhulenev	769acdd784	[xla] Migrate TensorFlow and XLA to xla::Future Clean up BUILD files and fix header includes in preparation for pjrt_future removal. PiperOrigin-RevId: 824733023	2025-10-27 17:06:39 -07:00
A. Unique TensorFlower	633e1931cd	Add `size_t` casts to `memcpy` size calculation in `BroadcastTo`. Explicitly cast the operands of the size calculation to `size_t` to prevent a potential integer overflow before calling `memcpy` on 64-bit systems. PiperOrigin-RevId: 824732102	2025-10-27 16:54:11 -07:00
Tori Baker	d299463d26	[xla:gpu] Fix integer-to-pred convert in our Triton emitter `arith.trunci` to i1 simply keeps the least significant bit, but HLO expects a convert to i1 to produce value != 0. Emit this conversion as a compare-not-equal to 0 instead. This is already done correctly for floats. PiperOrigin-RevId: 824716165	2025-10-27 16:14:51 -07:00
Matt Hurd	ffc21f066a	Add session_id to profiler_options This will enable a follow-up PR that makes use of this proto. PiperOrigin-RevId: 824709715	2025-10-27 15:56:14 -07:00
Antonio Sanchez	4cb35bb197	Update unused visibility rules. PiperOrigin-RevId: 824698597	2025-10-27 15:24:11 -07:00
Majid Dadashi	263abaee7f	Allow tflite interpreter to receive i4 tensors The data needs to be placed in an i8 NumPy container, and it is automatically packed into i4 tensors. PiperOrigin-RevId: 824685124	2025-10-27 14:53:06 -07:00
A. Unique TensorFlower	b8ba187ff8	Refactor call_library_for_dot -> library_supports_dot This enables BF16 to be sent to YNNPACK without casting to F32. PiperOrigin-RevId: 824667930	2025-10-27 14:11:45 -07:00
A. Unique TensorFlower	e79d4ebeec	Update XNNPACK in XLA - New Windows support - Workaround for Intel AMX in GCC PiperOrigin-RevId: 824644581	2025-10-27 13:13:02 -07:00
Maxim Ermilov	6e8976e3cf	Remove no longer needed one liner methods PiperOrigin-RevId: 824631750	2025-10-27 12:50:39 -07:00
Niklas Vangerow	56a660da4b	Migrate cpu_gpu_fusion_test to use PjRt. PiperOrigin-RevId: 824627421	2025-10-27 12:39:34 -07:00
A. Unique TensorFlower	a1219cfa94	Integrate LLVM at llvm/llvm-project@d7e40f3e71 Updates LLVM usage to match [d7e40f3e7165](https://github.com/llvm/llvm-project/commit/d7e40f3e7165) PiperOrigin-RevId: 824622381	2025-10-27 12:26:58 -07:00
A. Unique TensorFlower	428f0df91b	Add a `reserve` in sampler.cc. PiperOrigin-RevId: 824620123	2025-10-27 12:14:17 -07:00
Mohammed Anany	6d8ae3a9d7	Add warp specialization to Triton autotuning. This change introduces `is_warp_specialization_allowed` to `TritonGemmConfig` and `BlockLevelFusionConfig`. The autotuner now explores configurations with warp specialization enabled, but only on Blackwell+ devices and when TMA is also enabled. The fusion emitter uses this new parameter to set the `tt.warp_specialize` attribute. PiperOrigin-RevId: 824601781	2025-10-27 11:30:32 -07:00
Niklas Vangerow	d7c638ad39	Migrate gpu transforms reduction_layout_normalizer_test to use PjRt. PiperOrigin-RevId: 824573461	2025-10-27 10:43:38 -07:00
Will Froom	17b1932f13	[XLA:GPU][XTile] Don't cast to intermediate i64 when extracting the program id. PiperOrigin-RevId: 824573427	2025-10-27 10:25:39 -07:00
Niklas Vangerow	7257c66fae	Migrate pad_test to use PjRt. PiperOrigin-RevId: 824545233	2025-10-27 09:21:20 -07:00
A. Unique TensorFlower	38ebd16aed	Remove IndexingMap::GetMutableAffineMap() This change removes the GetMutableAffineMap() method from xla::IndexingMap. The mutable access to the underlying mlir::AffineMap can't be used because we will use a different internal implementation (SymbolicMap). I also think it's cleaner to not provide this method. PiperOrigin-RevId: 824536996	2025-10-27 09:05:08 -07:00
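
The integer-to-pred fix in d299463d26 hinges on the difference between truncating to one bit and testing for nonzero. The sketch below is a minimal Python model of the two semantics, assuming plain Python integers stand in for tensor elements; the function names `trunc_to_i1` and `convert_to_pred` are hypothetical and are not XLA, Triton, or MLIR APIs.

```python
def trunc_to_i1(x: int) -> bool:
    # Models truncation to a 1-bit integer: only the least significant bit survives.
    return bool(x & 1)


def convert_to_pred(x: int) -> bool:
    # Models HLO convert-to-pred semantics: true iff the value is nonzero.
    return x != 0


if __name__ == "__main__":
    for v in [0, 1, 2, 3, -1, -2]:
        print(v, trunc_to_i1(v), convert_to_pred(v))
```

The two models agree on odd values and on 0, but any nonzero even value (such as 2 or -2) truncates to false while HLO expects true, which is why a compare-not-equal to 0 is the correct lowering.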

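The new `--use_xnnpack` handling described in 11c00ca2db amounts to a three-way decision. The following is a hypothetical Python model of that decision table, not code from `benchmark_tflite_model`; the names `Resolver`, `select_delegates`, and `xnnpack_delegates_applied` are invented for illustration, and `None` is assumed to represent "flag not given".

```python
from enum import Enum
from typing import Optional, Tuple


class Resolver(Enum):
    # Hypothetical: resolver that automatically applies a default XNNPack delegate.
    DEFAULT_WITH_XNNPACK = "default"
    # Hypothetical: resolver without the default XNNPack delegate.
    WITHOUT_DEFAULT_DELEGATE = "without_default_delegate"


def select_delegates(use_xnnpack: Optional[bool]) -> Tuple[Resolver, bool]:
    """Returns (resolver, apply_explicit_xnnpack_delegate) under the new behaviour."""
    if use_xnnpack is None:
        # Case 3: flag absent, so keep the default resolver and its implicit delegate.
        return Resolver.DEFAULT_WITH_XNNPACK, False
    # Cases 1 and 2: flag given, so never use the default delegate.
    # Only --use_xnnpack=true explicitly applies a command-line-configured delegate.
    return Resolver.WITHOUT_DEFAULT_DELEGATE, use_xnnpack


def xnnpack_delegates_applied(use_xnnpack: Optional[bool]) -> int:
    # Counts the XNNPack delegates that end up applied in this model.
    resolver, explicit = select_delegates(use_xnnpack)
    return int(resolver is Resolver.DEFAULT_WITH_XNNPACK) + int(explicit)


if __name__ == "__main__":
    for flag in (True, False, None):
        print(flag, select_delegates(flag), xnnpack_delegates_applied(flag))
```

Under this model, no case applies more than one XNNPack delegate, which is the property the change restores for case 1.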

186511 Commits