This is a purely mechanical move CL.
The goal is to use the pass for all the buffer debug checks. We have checksum and nan_counter kernels at the moment.
PiperOrigin-RevId: 825602375
Imported from GitHub PR https://github.com/openxla/xla/pull/33085
📝 Summary of Changes
Fix too strict spawn strategy for rbe builds
🎯 Justification
Remote-only execution is not possible for all tests.
🚀 Kind of Contribution
🐛 Bug Fix
📊 Benchmark (for Performance Improvements)
Not relevant
🧪 Unit Tests:
Not relevant
🧪 Execution Tests:
Not relevant
Copybara import of the project:
--
df73e6e006c47d5ada1e14ced8f2ae94c0df7dd8 by Alexandros Theodoridis <atheodor@amd.com>:
Fix too strict default spawn strategy for rbe builds
Merging this change closes #33085
PiperOrigin-RevId: 825463234
Encoding extra metadata about a debug log entry within its ID limits how much
information we can pass. To remove the limitation without the need to pass
extra data between host and device, introduce a metadata store that provides an
opaque ID -> metadata mapping.
Follow-up patches will make checksum/NaN tracing use
BufferDebugLogEntryMetadataStore shared between all thunks that operate on
BufferDebugLog:
- BuffersChecksumThunks put the metadata into the store and use the returned
entry_ids to identify the checksums from BufferDebugLog,
- xla_gpu_buffer_debug_log_dump reads the BufferDebugLog and uses the store to
resolve the entry_ids into the metadata.
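The ID -> metadata mapping described above can be sketched as follows. This is a minimal Python illustration of the idea, not the actual `BufferDebugLogEntryMetadataStore` API: the names `MetadataStore`, `assign_id`, `get`, and the fields of `EntryMetadata` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class EntryMetadata:
    # Hypothetical fields; the real metadata layout is not described here.
    thunk_id: int
    buffer_index: int
    is_input: bool


class MetadataStore:
    """Maps opaque integer entry IDs to metadata kept on the host."""

    def __init__(self) -> None:
        self._entries: Dict[int, EntryMetadata] = {}
        self._next_id = 0

    def assign_id(self, metadata: EntryMetadata) -> int:
        # Only this small ID has to travel to the device and back.
        entry_id = self._next_id
        self._next_id += 1
        self._entries[entry_id] = metadata
        return entry_id

    def get(self, entry_id: int) -> Optional[EntryMetadata]:
        # Resolve an ID read back from the log; None if it is unknown.
        return self._entries.get(entry_id)


store = MetadataStore()
first = store.assign_id(EntryMetadata(thunk_id=7, buffer_index=0, is_input=True))
second = store.assign_id(EntryMetadata(thunk_id=7, buffer_index=1, is_input=False))
```

Because the ID no longer encodes anything itself, the metadata can grow without changing what is exchanged between host and device.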
PiperOrigin-RevId: 825462635
Imported from GitHub PR https://github.com/openxla/xla/pull/33141
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.30.9 to 4.31.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/github/codeql-action/releases">github/codeql-action's releases</a>.</em></p>
<blockquote>
<h2>v4.31.0</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a href="https://github.com/github/codeql-action/releases">releases page</a> for the relevant changes to the CodeQL CLI and language packs.</p>
<h2>4.31.0 - 24 Oct 2025</h2>
<ul>
<li>Bump minimum CodeQL bundle version to 2.17.6. <a href="https://redirect.github.com/github/codeql-action/pull/3223">#3223</a></li>
<li>When SARIF files are uploaded by the <code>analyze</code> or <code>upload-sarif</code> actions, the CodeQL Action automatically performs post-processing steps to prepare the data for the upload. Previously, these post-processing steps were only performed before an upload took place. We are now changing this so that the post-processing steps will always be performed, even when the SARIF files are not uploaded. This does not change anything for the <code>upload-sarif</code> action. For <code>analyze</code>, this may affect Advanced Setup for CodeQL users who specify a value other than <code>always</code> for the <code>upload</code> input. <a href="https://redirect.github.com/github/codeql-action/pull/3222">#3222</a></li>
</ul>
<p>See the full <a href="https://github.com/github/codeql-action/blob/v4.31.0/CHANGELOG.md">CHANGELOG.md</a> for more information.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="4e94bd11f7"><code>4e94bd1</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3235">#3235</a> from github/update-v4.31.0-1d36546c1</li>
<li><a href="8f11182164"><code>8f11182</code></a> Update changelog for v4.31.0</li>
<li><a href="1d36546c14"><code>1d36546</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3234">#3234</a> from github/mbg/changelog/post-processing</li>
<li><a href="08ada26e6a"><code>08ada26</code></a> Add changelog entry for post-processing change</li>
<li><a href="b843cbeed0"><code>b843cbe</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3233">#3233</a> from github/mbg/getOptionalEnvVar</li>
<li><a href="1ecd563919"><code>1ecd563</code></a> Use <code>getOptionalEnvVar</code> in <code>writePostProcessedFiles</code></li>
<li><a href="e576807920"><code>e576807</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3223">#3223</a> from github/henrymercer/bump-minimum</li>
<li><a href="ad35676669"><code>ad35676</code></a> Add <code>getOptionalEnvVar</code> function</li>
<li><a href="d75645b13f"><code>d75645b</code></a> Merge pull request <a href="https://redirect.github.com/github/codeql-action/issues/3222">#3222</a> from github/mbg/upload-lib/post-process</li>
<li><a href="710606cc35"><code>710606c</code></a> Check that <code>outputPath</code> is non-empty</li>
<li>Additional commits viewable in <a href="16140ae1a1...4e94bd11f7">compare view</a></li>
</ul>
</details>
Copybara import of the project:
--
cbe7908eed34d441708d7360f23dad04e5b48ee1 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:
Bump github/codeql-action from 4.30.9 to 4.31.0
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.30.9 to 4.31.0.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](16140ae1a1...4e94bd11f7)
---
updated-dependencies:
- dependency-name: github/codeql-action
dependency-version: 4.31.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Merging this change closes #33141
PiperOrigin-RevId: 825449574
Imported from GitHub PR https://github.com/openxla/xla/pull/33205
📝 Summary of Changes
The simplification was unintentionally disabled in
2accf052cb.
🎯 Justification
Bug fix.
🚀 Kind of Contribution
🐛 Bug Fix, ⚡️ Performance Improvement
📊 Benchmark (for Performance Improvements)
No
🧪 Unit Tests:
Yes.
🧪 Execution Tests:
No.
Copybara import of the project:
--
2fb682c10ff49212044dd995ba97aa329e52bb71 by Ilia Sergachev <isergachev@nvidia.com>:
[GPU] Fix reduce-precision simplification.
Was unintentionally disabled in
2accf052cb.
Merging this change closes #33205
PiperOrigin-RevId: 825449067
* Extend alternate memory chunk reservations for aliased uses.
* Add pinned allocations in alternate memory for aliased uses.
* Mark all aliased allocations as colocated.
Pin all values aliased to the left of the source HLO value being block prefetched to default memory. Values aliased to the right of the source HLO value are also pinned to default memory in all cases except when scheduling a new prefetch is successful.
* For source values, if a pinned allocation exists, adding uses to the allocation is required.
* For other (aliased) values, finalizing the value suffices.
Misc:
* When creating colocated aliased allocations in alternate memory, do not rely on getting the first colocated block from the back of the list.
* Code cleanup.
PiperOrigin-RevId: 825302368
This change removes the deprecated `Shape(const ShapeProto&)` constructor and updates its call sites to use `Shape::FromProto` instead, which returns a `StatusOr<Shape>`. The call sites now explicitly handle the potential error status.
PiperOrigin-RevId: 825288664
This prints the entire instruction instead of the instruction name when the original value has a different tuple structure from the instruction shape in the HLO verifier pass. The instruction name is not reliable and could be different from its name in the HLO dump.
PiperOrigin-RevId: 825241833
This changes the predicates for calling a library for a dot to indicate whether we will actually call the library, not just whether the library supports the dot. This fixes bugs where we incorrectly claim a library will handle the dot.
PiperOrigin-RevId: 825231317
This change introduces a `session_id` option in `ProfileOptions` and `RemoteProfilerSessionManagerOptions`. When provided, this id will be used as the subdirectory for storing profile data, instead of the default timestamp.
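The directory selection can be sketched as follows. This is a minimal Python illustration under stated assumptions: `profile_subdirectory` is a hypothetical helper, and the timestamp format used for the default case is an assumption, not the profiler's actual format.

```python
import time
from pathlib import PurePosixPath
from typing import Optional


def profile_subdirectory(logdir: str, session_id: Optional[str] = None,
                         now_ns: Optional[int] = None) -> str:
    # Hypothetical helper: use session_id when provided, otherwise fall
    # back to a timestamp-based directory name.
    if session_id:
        leaf = session_id
    else:
        leaf = str(now_ns if now_ns is not None else time.time_ns())
    return str(PurePosixPath(logdir) / leaf)
```

With a caller-chosen `session_id`, repeated captures for the same session land in a predictable location instead of a directory that differs on every run.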
PiperOrigin-RevId: 825216123
Change `gpu_version` parameter to const reference in `IntelGpuCompiler`.
This aligns the parameter type in `OptimizeHloConvolutionCanonicalization` with the base class signature.
PiperOrigin-RevId: 825083863
When a function has multiple instances with different manual axes, and dedup-functions-fully is on, it will have different copies of the same function.
For example:
sdy.manual_computation(%arg0) manual_axes={"x"} (%arg1: tensor<4xf32>) {
sdy.named_computation<"foo">(%arg1) (%arg2: tensor<4xf32>) {}
}
sdy.manual_computation(%arg0) manual_axes={"y"} (%arg1: tensor<4xf32>) {
sdy.named_computation<"foo">(%arg1) (%arg2: tensor<4xf32>) {}
}
sdy.named_computation<"foo">(%arg0) (%arg1: tensor<8xf32>) {}
----->
sdy.manual_computation(%arg0) manual_axes={"x"} (%arg1: tensor<4xf32>) {
call @foo(%arg1)
}
sdy.manual_computation(%arg0) manual_axes={"y"} (%arg1: tensor<4xf32>) {
call @foo_0(%arg1)
}
call @foo_1(%arg0)
The iteration order over the map/vector determines which 'foo' will become 'foo_0' or 'foo_1', or stay as 'foo'.
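The dependence on iteration order can be illustrated with a small Python sketch. The suffixing scheme below is an assumption modeled on the example above, and `assign_function_names` is a hypothetical helper, not the pass's actual code.

```python
from typing import Dict, List, Tuple


def assign_function_names(
        instances: List[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Gives each (name, manual_axes) instance a unique function name.

    The first instance visited keeps the base name; later ones get a
    numeric suffix, so the result depends on the iteration order.
    """
    used_count: Dict[str, int] = {}
    result: Dict[Tuple[str, str], str] = {}
    for name, manual_axes in instances:
        seen = used_count.get(name, 0)
        result[(name, manual_axes)] = name if seen == 0 else f"{name}_{seen - 1}"
        used_count[name] = seen + 1
    return result


# Same three instances, visited in two different orders.
order_a = assign_function_names([("foo", "x"), ("foo", "y"), ("foo", "")])
order_b = assign_function_names([("foo", ""), ("foo", "x"), ("foo", "y")])
```

In `order_a` the instance with manual axis "x" keeps the name 'foo', while in `order_b` the instance without manual axes does.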
PiperOrigin-RevId: 825074314
Imported from GitHub PR https://github.com/openxla/xla/pull/33149
📝 Summary of Changes
Add a CUDA graph dump option that prints only the primary graph, so as not to flood the screen log with nested CUDA graphs.
🎯 Justification
Makes debug output easier to read.
🚀 Kind of Contribution
📚 Documentation
Copybara import of the project:
--
18d6939170fd5bf4fa9228d4f74ca3ff4e83ec17 by Shawn Wang <shawnw@nvidia.com>:
add cuda graph dump option that only prints out primary graph
Merging this change closes #33149
PiperOrigin-RevId: 825049186
The UnstableReductionDetector now considers reductions where all reduced dimensions have a size of 1 to be stable, as these operations are effectively no-ops and do not introduce numerical instability. A test case is added to verify this behavior.
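The size-1 criterion can be sketched as follows. This is a minimal Python illustration, not the detector's code: `reduces_only_unit_dims` is a hypothetical helper, and any other criteria the detector applies are not modeled.

```python
from typing import Sequence


def reduces_only_unit_dims(shape: Sequence[int],
                           reduced_dims: Sequence[int]) -> bool:
    # True when every reduced dimension has size 1, so each output element
    # is produced from exactly one input element.
    return all(shape[d] == 1 for d in reduced_dims)
```

For example, reducing dimensions 1 and 2 of a `[4, 1, 1]` operand satisfies the check, while reducing the same dimensions of a `[4, 8, 1]` operand does not.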
PiperOrigin-RevId: 825045042
When implementing this it turned out that the log is currently missing some information needed to reliably distinguish input/output checksums and different thunk executions. This adds the needed fields to the proto, but emitting them in the log will be a separate change.
With the extra data missing, the tool assumes that all checksums refer to outputs and that each thunk execution gives the same results each time. The tests include the extra data, so once that's implemented it should just work.
PiperOrigin-RevId: 825040798
So far, the output could be non-deterministic if multiple reductions are
grouped together. This change makes it deterministic.
PiperOrigin-RevId: 824965037
These new methods allow printing or converting to a string only the array representation of the tile assignment, without including the tile dimensions. The existing `Print` and `ToString` methods are updated to use these new array-specific printing functions.
PiperOrigin-RevId: 824816702
This currently happens implicitly in `ynn_create_runtime`, but that will not be the case soon. (Calling it multiple times is harmless.)
PiperOrigin-RevId: 824754921
`arith.trunci` to i1 simply takes the last bit, but HLO expects a convert to i1 to be value != 0. Emit this conversion as a compare not equal to 0 instead. This is already done correctly for floats.
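The difference between the two semantics shows up for any even non-zero value. A minimal Python sketch, with hypothetical helper names, modeling only the integer arithmetic and not the emitter itself:

```python
def trunc_to_i1(value: int) -> bool:
    # Truncation keeps only the least significant bit.
    return bool(value & 1)


def convert_to_i1(value: int) -> bool:
    # Convert-to-i1 semantics as described above: value != 0.
    return value != 0
```

For the input 2, truncation yields false while the compare yields true; the two agree for 0 and for odd values.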
PiperOrigin-RevId: 824716165
This change introduces `is_warp_specialization_allowed` to `TritonGemmConfig` and `BlockLevelFusionConfig`. The autotuner now explores configurations with warp specialization enabled, but only on Blackwell+ devices and when TMA is also enabled. The fusion emitter uses this new parameter to set the `tt.warp_specialize` attribute.
PiperOrigin-RevId: 824601781
This change removes the GetMutableAffineMap() method from xla::IndexingMap. The mutable access to the underlying mlir::AffineMap can't be used because we will use a different internal implementation (SymbolicMap). I also think it's cleaner to not provide this method.
PiperOrigin-RevId: 824536996
We use this field for two different buffer debug kernels that have different semantics. Technically we could have two different structures, but that does not make much sense at the moment. Let's use the one that we already have, with the generic name.
PiperOrigin-RevId: 824532743
Add an API to look up type id and info by type name. We can't rely on type ids for serialization, as they are not stable: they are assigned at run time depending on the type registration order. Type names, on the other hand, must be stable.
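Why ids are unsuitable for serialization can be seen in a small Python sketch. `TypeRegistry`, `register`, and `find_id_by_name` are hypothetical names for illustration, not the actual API.

```python
from typing import Dict, Optional


class TypeRegistry:
    """Hypothetical registry: ids follow registration order, names are stable."""

    def __init__(self) -> None:
        self._id_by_name: Dict[str, int] = {}

    def register(self, type_name: str) -> int:
        # The id is the next free slot, so it depends on registration order.
        # Registering a known name again returns its existing id.
        return self._id_by_name.setdefault(type_name, len(self._id_by_name))

    def find_id_by_name(self, type_name: str) -> Optional[int]:
        return self._id_by_name.get(type_name)


# Two "runs" that register the same types in a different order.
run_a = TypeRegistry()
run_a.register("my_state")
run_a.register("my_cache")

run_b = TypeRegistry()
run_b.register("my_cache")
run_b.register("my_state")
```

The same type name maps to id 0 in `run_a` and id 1 in `run_b`, so a serialized id would be misread in the other run, whereas looking the type up by name works in both.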
PiperOrigin-RevId: 824512487
- Renamed SymbolicToAffine to SymbolicExprToAffineExpr and made it public (needed for IndexingMap::GetConstraints)
- Renamed AffineToSymbolicExpr to AffineExprToSymbolicExpr
- Added AffineExprsToSymbolicExprs to convert a list of mlir::AffineExpr to a vector of xla::gpu::SymbolicExpr (needed for IndexingMap::ConstraintsSatisfied)
PiperOrigin-RevId: 824492246
In the follow-up CL we will need to add this thunk to the buffer debug pass.
There we will also need to infer the buffer element type.
Another refactoring would be to rename the payload, which is currently called the checksum, to something more generic like 'value' or 'result'.
One more thing we could do is reduce code duplication by merging the two thunks, the checksum one and the NaN counter one.
PiperOrigin-RevId: 824491914
The `DotDecomposer` pass runs ahead of layout assignment. Introducing non-default layouts at this stage causes complications for subsequent passes, in particular the `DotMerger` pass.
PiperOrigin-RevId: 824476578
Imported from GitHub PR https://github.com/openxla/xla/pull/32439
…inprocess_lld
📝 Summary of Changes
Enable embedded device libs and in-process lld by default.
🎯 Justification
Moves amdgpu backend to be more filesystem layout independent.
🚀 Kind of Contribution
🐛 Bug Fix
📊 Benchmark (for Performance Improvements)
N/A
🧪 Unit Tests:
None
🧪 Execution Tests:
None
Copybara import of the project:
--
46a100377d00d30dbc79e34c977b9219c54bda4b by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>:
[ROCm] Fix and enable xla_gpu_use_embeded_device_lib and xla_gpu_use_inprocess_lld
Merging this change closes #32439
PiperOrigin-RevId: 824476138
absl::Hash is not deterministic over different runs of the same program. Use
Fingerprint128 instead, and don't include the address of the computation.
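The same distinction exists in Python, where the built-in `hash()` of a string is salted per process by default. The sketch below shows a content-only 128-bit fingerprint; BLAKE2b is used here purely as a stand-in and is not the function behind `Fingerprint128`, and `stable_fingerprint128` is a hypothetical helper.

```python
import hashlib


def stable_fingerprint128(text: str) -> int:
    # 16-byte BLAKE2b digest interpreted as a 128-bit integer. It depends
    # only on the bytes of `text`, not on process state or object addresses.
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest()
    return int.from_bytes(digest, "big")
```

Feeding the fingerprint only with content (and never with an address) is what makes the result reproducible across runs.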
PiperOrigin-RevId: 824460524
Imported from GitHub PR https://github.com/openxla/xla/pull/31886
📝 Summary of Changes
This enhances the search for the CUDA libdevice path:
- Fix an invalid empty path being added when `TF_CUDA_TOOLKIT_PATH` is empty
- Fix invalid paths based on runtime folders: `runfiles_dir.substr(0, runfiles_ind + runfiles_suffix.length())` is not meaningful when `runfiles_ind` isn't valid, i.e. `std::string::npos`
- Add `$CUDA_HOME` to the search paths. This is also used in TensorFlow already
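The three fixes above can be sketched together in Python. This is a simplified illustration, not the real `CandidateCudaRoots`: the function `candidate_cuda_roots`, its parameters, the `/cuda_nvcc` sub-path, and the exact ordering are assumptions. Python's `str.rfind` returning -1 plays the role of `std::string::npos`.

```python
from typing import Dict, List


def candidate_cuda_roots(runfiles_dir: str, toolkit_path: str,
                         env: Dict[str, str],
                         runfiles_suffix: str = ".runfiles") -> List[str]:
    roots: List[str] = []
    # rfind returns -1 when the suffix is absent; only derive a path from
    # the runfiles directory when the suffix was actually found.
    index = runfiles_dir.rfind(runfiles_suffix)
    if index != -1:
        roots.append(runfiles_dir[: index + len(runfiles_suffix)] + "/cuda_nvcc")
    # Skip an empty compile-time toolkit path instead of adding "".
    if toolkit_path:
        roots.append(toolkit_path)
    # Consider $CUDA_HOME after the runfiles-based locations.
    cuda_home = env.get("CUDA_HOME", "")
    if cuda_home:
        roots.append(cuda_home)
    return roots
```

Outside a test environment with an empty toolkit path, the only candidate left is `$CUDA_HOME`, rather than a list padded with empty or truncated paths.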
🎯 Justification
Without this the libdevice file won't be found if CUDA isn't installed in a standard location or e.g. an updated version is available in a different location.
This is the case for e.g. HPC systems where multiple CUDA versions are available side-by-side.
🚀 Kind of Contribution
🐛 Bug Fix, ♻️ Cleanup
Fixes #28590
🧪 Unit Tests:
Simple test that when `CandidateCudaRoots` returns anything it contains `$CUDA_HOME`
Copybara import of the project:
--
01788b896900717ee916377a71d5c14963e0176d by Alexander Grund <alexander.grund@tu-dresden.de>:
Fix libdevice search when outside test environment
When there is no `runfiles_suffix` the `rfind` returns
`std::string::npos` which should be handled to not add meaningless paths.
--
900715a846102bacdfc7688f14713cbe6101506d by Alexander Grund <alexander.grund@tu-dresden.de>:
Use `$CUDA_HOME` when searching for libdevice.
With a CUDA installed to a non-default location XLA/TF fails with:
> gpu_backend_lib.cc:579] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
> Searched for CUDA in the following directories:
> ./cuda_sdk_lib
> /builddir/TensorFlow/TensorFlow-2.x_mnist-test.py.runfiles/cuda_nvcc
> /buildi/cuda_nvcc
>
> /usr/local/cuda
> /software/TensorFlow/lib/python3.12/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
> /software/TensorFlow/lib/python3.12/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
> /software/TensorFlow/lib/python3.12/site-packages/tensorflow/python/platform/../../cuda
Consider $CUDA_HOME as an additional location after the runfiles dirs (used for tests)
--
905d0596d199598036032f0f84b4487e9afd2bef by Alexander Grund <alexander.grund@tu-dresden.de>:
Don't add empty TF_CUDA_TOOLKIT_PATH to libdevice search
At least in some environments that define is the empty string which
doesn't make sense to add to the search paths.
Add a check for that.
--
23eb59bfabd570caabf0b9ec3515233f46a4fae7 by Alexander Grund <alexander.grund@tu-dresden.de>:
Add test for $CUDA_HOME in CandidateCudaRoots
--
a8c215bc222b4ba8581f2f44549613ebd59b9cbb by Alexander Grund <alexander.grund@tu-dresden.de>:
Add braces to loops/conditions
--
39efc67f8b1d44e131f993c8040b7eb69ff52f0c by Alexander Grund <alexander.grund@tu-dresden.de>:
Use kIsOpenSource in skip condition
Merging this change closes #31886
PiperOrigin-RevId: 824450284
This moves `Scalar`, `Array`, `Dictionary`, `FlatAttribute`, `FlatAttributeMap`, and `AttributeMap` from `CallFrameBuilder` into the `xla::ffi` namespace.
It also moves the code into `attribute_map.{cc|h}`.
All these types are currently aliases of some `std::variant` type. This change prepares for making them proper types and adding `ToProto` and `FromProto` methods.
PiperOrigin-RevId: 824435281
Also fixed the round trip test to not ignore `kInvalid` returned from proto conversion, which is why we didn't catch this bug.
PiperOrigin-RevId: 824419619
The meaning of AsyncValue::IsUnique() is fuzzy for the chain of indirect async values. Prefer simpler check for uniqueness in Future/Promise library.
Also update AsyncValue::IsUnique() documentation.
PiperOrigin-RevId: 824256830
This change invalidates the autotune cache, which is necessary because enabling the generic emitter (cl/823475406) affected autotuning results.
PiperOrigin-RevId: 823818338
Behaviorally, this is a no-op for Shardy by default: the call output and func result can mismatch only if the dedup-functions-fully option is true, and this option is false by default.
Shardy adds explicit reshards (during the Shardy partitioner) on operations that use the output of a named computation, and it does so assuming the output of the named computation is sharded as specified in its out shardings.
When the dedup-functions-fully option is true, however, the function that is actually called may end up with a different output sharding than the corresponding named computation. Users of the output should still see the sharding given in the out shardings of the named computation. Hence, if the output sharding of the named computation and the result sharding of the function mismatch, we add a reshard on the output of the call.
PiperOrigin-RevId: 823494391
Explicitly set the operand precisions to `PrecisionConfig::DEFAULT` when creating a `ScaledDot` instruction from a composite call.
PiperOrigin-RevId: 823488638
+ use `ptr` when using `AsPtr()` for consistency
+ rename `Wrap` to `AndThen` as it's more meaningful and makes profiles readable
PiperOrigin-RevId: 823476695
According to benchmarks, we have reached neutrality with the legacy emitter, so we are switching to the new emitter by default. The legacy emitter will be kept for some time but is considered deprecated and should not be used. It will be deleted in the near future.
Reverts 85c99b1ecb
PiperOrigin-RevId: 823475406
Previously, we would never allow simplification when encountering a `dot`
instruction. But this constraint was overly conservative; the only dimensions
that we shouldn't simplify are those along which we intend to perform
non-standard padding to fit to hardware restrictions, i.e. the non-contracting
and contracting dimensions.
Restricting this pattern further works around a bug whereby expanding a
non-standardly padded dimension into a `1` dim can result in propagating a
tile with the wrong size.
The underlying reason for this is a bug in the `kPreserve` behaviour of
`IndexingMap` simplification, which will need to be fixed separately (the new
tiling should avoid this issue, since it shouldn't rely on the correctness of
`IndexingMap` simplification at this level).
PiperOrigin-RevId: 823258725
Note that, in order to maintain parity with MHLO optimizations, this enables the `assume-no-undeclared-side-effects` option. This matches the default behavior for MHLO, but StableHLO is more cautious by default. Empirically, past evidence suggests it's pretty safe given that MHLO has been doing it all this time. Disabling the flag can result in significantly larger HLO after lowering, so we enable it here.
PiperOrigin-RevId: 823234079