153857 Commits

Author SHA1 Message Date
A. Unique TensorFlower
a69eea2814 Internal changes only.
PiperOrigin-RevId: 565383371
2023-09-14 09:04:21 -07:00
Bixia Zheng
c6b4dd5ef2 [xla] Rename LatencyHidingSchedulerPreparation pass to P2PSchedulePreparation
pass.

This is because the pass is needed to linearize point-to-point Send and Recv
chains for an HLO scheduler.

Modify the GPU HLO scheduler to call P2PSchedulePreparation pass regardless
of whether the latency hiding scheduler is on.

PiperOrigin-RevId: 565374605
2023-09-14 08:39:30 -07:00
Pat Notz
ad428bcfb9 Fixes to avoid deadlocks with collectives in the pipelining while loop
PiperOrigin-RevId: 565355251
2023-09-14 07:04:02 -07:00
A. Unique TensorFlower
9d833cd42c Internal Code Change
PiperOrigin-RevId: 565353457
2023-09-14 06:55:39 -07:00
Benjamin Kramer
98b0549d68 Integrate LLVM at llvm/llvm-project@bf8fd086d0
Updates LLVM usage to match
[bf8fd086d09c](https://github.com/llvm/llvm-project/commit/bf8fd086d09c)

PiperOrigin-RevId: 565351116
2023-09-14 06:43:23 -07:00
Johannes Reifferscheid
b24592bcda Const-correctness fixes for GetFusionRoots.
The output is mutable for no good reason, which causes issues when we want
to express "this fusion instruction's roots or the instruction if it's not
a fusion".

PiperOrigin-RevId: 565341699
2023-09-14 05:48:24 -07:00
Jiyoun (Jen) Ha
cd3d3c25b4 Add StableHLO Quantizer as an option in TF Quantizer.
PiperOrigin-RevId: 565339452
2023-09-14 05:36:22 -07:00
Andrew Goodbody
3da3565572 [Linaro:ARM_CI] Update to clang-17
Update compiler in use to be clang-17 to
maintain sync with other builds.
2023-09-14 12:09:14 +01:00
A. Unique TensorFlower
e585ea7203 Added placeholder for internal RPC option.
PiperOrigin-RevId: 565319212
2023-09-14 03:38:16 -07:00
Ilia Sergachev
2736ef7a65 [XLA:GPU] Handle kTranspose in optimized HLO in Triton emitters.
PiperOrigin-RevId: 565307989
2023-09-14 02:44:21 -07:00
Alan Kelly
ac9aa167ff Select correct FC 8x16 path
PiperOrigin-RevId: 565302587
2023-09-14 02:23:29 -07:00
A. Unique TensorFlower
f17c9f5ef5 Update GraphDef version to 1619.
PiperOrigin-RevId: 565301847
2023-09-14 02:16:28 -07:00
A. Unique TensorFlower
f3b30f5a0b compat: Update forward compatibility horizon to 2023-09-14
PiperOrigin-RevId: 565301845
2023-09-14 02:08:52 -07:00
Zichuan Wei
ab7c193a3d lite: enable group conv -> conv2d conversion
PiperOrigin-RevId: 565268295
2023-09-13 23:28:05 -07:00
Hye Soo Yang
20196d5398 Add highway c++ library as a dep to TensorFlow
PiperOrigin-RevId: 565237467
2023-09-13 20:22:29 -07:00
Ryan M. Lefever
d8a17da3f0 Change 3/6 for making MSA repacking slice aware.
Changed SlicedAllocationFinder to
- accept a method to determine if allocations are permitted to begin at a given offset
- expose a method to test if a sliced allocation can fit at a specific offset

PiperOrigin-RevId: 565231151
2023-09-13 19:45:35 -07:00
Jake Harmon
ebe4f498df Fix comment bug in tsl's clean_dep
PiperOrigin-RevId: 565223624
2023-09-13 18:57:41 -07:00
Ryan M. Lefever
0d333c8ae0 Change 2/6 for making MSA repacking slice aware.
Fix a bug in which we over-allocate space for slices, when they are colocated with larger buffers.

The interaction causing this behavior is as follows:
A) GlobalDecreasingSizeBestFitHeap::FindChunkCandidates() adds additional space to the last chunk in a sliced allocation, to account for max_colocation_size.
B) When AlternateMemoryBestFitHeap::CheckPrefetchFit() computes slices_for_pending_chunks, it recomputes the size of the sliced allocation as the sum of the sizes of the chunks returned from A. Note, we do not recompute the size for the allocation in a non-sliced world.
C) Before committing a chunk, GlobalDecreasingSizeBestFitHeap::CommitChunk() changes the chunk's size to fit the size from B. Thus, in the sliced case we keep the extra max_colocation_size space, since we recalculated the allocation size with it. In the non-sliced case, we adjust the chunk size back to what is needed for the request.

So, this change is a no-op for non-slices.

PiperOrigin-RevId: 565217603
2023-09-13 18:23:50 -07:00
James Mullenbach
4404175d0d Add configurable retries for SetServerDef, for fault tolerance amidst preemptions.
There's a short period during ParameterServerStrategy initialization / cluster connection in which worker preemptions will lead to UnavailableErrors from CreateContext calls. This adds configurable retries to SetServerDef so that a single connection failure does not stop the whole job. Retries will be enabled as the default behavior for PSS in a followup change.

PiperOrigin-RevId: 565214961
2023-09-13 18:13:05 -07:00
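The retry behavior described in 4404175d0d can be illustrated with a minimal sketch. This is not the SetServerDef implementation: `UnavailableError`, `call_with_retries`, `max_retries`, and `backoff_s` are hypothetical names chosen for illustration, and the commit does not state what backoff policy, if any, is used.

```python
import time


class UnavailableError(Exception):
    """Hypothetical stand-in for a transient connection failure."""


def call_with_retries(fn, max_retries=3, backoff_s=0.0, sleep=time.sleep):
    """Call fn(), retrying up to max_retries times on UnavailableError.

    With max_retries=0 the first failure propagates, which mirrors the
    behavior when retries are not enabled.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except UnavailableError:
            if attempt >= max_retries:
                raise  # Retry budget exhausted: surface the error.
            attempt += 1
            sleep(backoff_s)
```

Making the retry count a parameter lets a caller opt in gradually, which fits the commit's note that retries become the default only in a follow-up change.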
A. Unique TensorFlower
75fb8c8e8c #tf-data Provide autotune with fresh values for cpu/ram budget
The loop that runs Autotune will fetch current values for available CPU and RAM on each iteration. This helps in situations where the hardware resources available to tf.data may be vertically scaled up or down based on usage during the process' lifetime.

PiperOrigin-RevId: 565197940
2023-09-13 16:50:56 -07:00
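The idea in 75fb8c8e8c, reading the resource budget on every iteration instead of once up front, can be sketched as follows. The function and parameter names are hypothetical and do not correspond to tf.data internals.

```python
def run_autotune(read_cpu_budget, read_ram_budget, optimize, iterations):
    """Run an optimization loop that re-reads budgets on each iteration.

    read_cpu_budget and read_ram_budget are callables rather than fixed
    numbers, so a change in available resources during the process
    lifetime is picked up by the next iteration.
    """
    results = []
    for _ in range(iterations):
        cpu = read_cpu_budget()  # Fresh value each time around the loop.
        ram = read_ram_budget()
        results.append(optimize(cpu, ram))
    return results
```

Passing callables instead of values is the design choice that matters here: a value captured before the loop would go stale if the host were scaled up or down.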
Fergus Henderson
7cf3460cd1 Add missing backquotes in a couple of places in the release notes.
PiperOrigin-RevId: 565191065
2023-09-13 16:27:18 -07:00
Yu Feng
96d793172d Open source mesh_util_test.py
PiperOrigin-RevId: 565189870
2023-09-13 16:19:34 -07:00
Clive Verghese
caac4ac308 Add 5c9f72faadaca7250b341b99da358e855a8d902e from abseil-cpp.
PiperOrigin-RevId: 565187417
2023-09-13 16:12:21 -07:00
Fergus Henderson
310715d91f Add a test using the FlatBuffer C API (rather than the FlatBuffer C++ API)
to construct the TFLiteSettings FlatBuffer.

PiperOrigin-RevId: 565184361
2023-09-13 15:56:42 -07:00
Son Tuan Vu
e54cae4089 [XLA:GPU] Limit unroll factor for column reductions
Vectorized column reductions might exceed shmem budget. Limit the unroll factors to avoid this.

PiperOrigin-RevId: 565170403
2023-09-13 15:25:44 -07:00
TensorFlower Gardener
adcfd3f69c Merge pull request #61809 from terryheo:use-ndk-r26
PiperOrigin-RevId: 565170354
2023-09-13 15:19:18 -07:00
Swachhand Lokhande
f80b7460db Use PjRtFuture returned by ExecutePortable to extend the lifetime of PjRtBuffers.
The owned PjRtBuffers in `owned_executable_args` need to live until execution is complete. Currently this is achieved by blocking until all the executable outputs are ready. However, this seemed to cause performance overheads, see b/299683272 and b/300102691.

With this change, we don't block until execution is complete. The ownership of `owned_executable_args` is moved to a lambda which is executed as a callback when the PjRtFuture returned by ExecutePortable is ready (which happens when the execution is complete).

PiperOrigin-RevId: 565169152
2023-09-13 15:10:57 -07:00
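The ownership pattern described in f80b7460db, moving owned arguments into a callback that runs when the returned future is ready, can be sketched in Python with `concurrent.futures.Future`. This is an analogy only, not the PjRt API: `Buffer` and `execute_async` are hypothetical, and the future is completed manually to stand in for the end of execution.

```python
from concurrent.futures import Future


class Buffer:
    """Hypothetical stand-in for an owned device buffer."""

    def __init__(self, name):
        self.name = name


def execute_async(owned_args):
    """Return a Future for pretend work without blocking on completion.

    The list of owned buffers is bound into the callback, so the buffers
    stay alive until the future completes, after which the callback
    drops them.
    """
    done = Future()

    def release(_future, _keepalive=owned_args):
        # Runs once the future is done: execution no longer needs the args.
        _keepalive.clear()

    done.add_done_callback(release)
    return done
```

The caller can drop its own references right after `execute_async` returns. The buffers remain reachable through the callback until the future is marked done.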
Hye Soo Yang
92691ae0ac Open source op for GetMinibatchesInCsrWithPhysicalReplica for SparseCore.
PiperOrigin-RevId: 565168980
2023-09-13 15:06:49 -07:00
Yu Feng
4f75b24b0f UnimplementedError prints the type name.
This lets users act on the classes by adding the override methods there.

PiperOrigin-RevId: 565168942
2023-09-13 15:00:50 -07:00
A. Unique TensorFlower
59e2bcf692 [XLA] Add WithReplicaGroups in pattern matcher and modify tests to conform to the new pattern matching format
-Add WithReplicaGroups implementation for HloInstructionPattern to match with the collective instruction's replica groups.

PiperOrigin-RevId: 565160403
2023-09-13 14:36:25 -07:00
Hye Soo Yang
9331ee5476 Adding sparse_core_ops_stats_handler for recording metrics
PiperOrigin-RevId: 565158613
2023-09-13 14:29:32 -07:00
Yishuang Pang
b090b29c0a Update CocoaPods specs for TFLite 2.13.0
PiperOrigin-RevId: 565157773
2023-09-13 14:21:30 -07:00
Hye Soo Yang
b99dffb460 Open source kernel for GetMinibatchesInCsrWithPhysicalReplicaOp for SparseCore.
Open source sparse_core_ops_utils*

PiperOrigin-RevId: 565156105
2023-09-13 14:13:45 -07:00
Son Tuan Vu
67c0625e50 [XLA:GPU] Fix theoretical bug where shmem_usage > shmem_budget
PiperOrigin-RevId: 565152217
2023-09-13 14:02:44 -07:00
A. Unique TensorFlower
6ed6cf087d Remove PluginConfig class. It was always set to PluginConfig::kDefault.
PiperOrigin-RevId: 565140913
2023-09-13 13:23:44 -07:00
A. Unique TensorFlower
646de0c120 Defines an interface for makespan evaluation.
PiperOrigin-RevId: 565134913
2023-09-13 13:03:26 -07:00
Matt Callanan
96591d0419 #tf-data Scale up "inject_io_prefetch" experiment to 50% job level.
PiperOrigin-RevId: 565132363
2023-09-13 12:53:27 -07:00
A. Unique TensorFlower
5c004b4b94 Adds a new cost component for makespan.
PiperOrigin-RevId: 565121864
2023-09-13 12:14:13 -07:00
A. Unique TensorFlower
65e5b764a4 [XLA:GPU] Add missing tolerance for BF16 tests in Triton Softmax tests.
PiperOrigin-RevId: 565119660
2023-09-13 12:06:12 -07:00
Grant Jensen
74a206527e [tflite-gpu] Push select_v2 dim check up from inference to parser.
PiperOrigin-RevId: 565113938
2023-09-13 11:47:13 -07:00
A. Unique TensorFlower
c91917d7dc Use tf2xla implementation of SliceOp instead of MLIR.
PiperOrigin-RevId: 565111478
2023-09-13 11:39:29 -07:00
TensorFlower Gardener
56edb7fc3b Merge pull request #58400 from SaoirseARM:toupstream/int6x8_32_accum
PiperOrigin-RevId: 565106502
2023-09-13 11:27:45 -07:00
Hye Soo Yang
66dbd1d599 Open source global_iter_id.cc for SparseCore.
PiperOrigin-RevId: 565097655
2023-09-13 10:54:59 -07:00
TensorFlower Gardener
3ff7a3ba05 Merge pull request #61634 from 0o001:master
PiperOrigin-RevId: 565094585
2023-09-13 10:47:25 -07:00
Jieying Luo
26aa3c84c6 Excludes building pjrt_c_api_gpu_plugin.so (GPU only target) on MAC.
PiperOrigin-RevId: 565092976
2023-09-13 10:40:49 -07:00
Marcello Maggioni
be0d1151a6 [XLA] Rework dot() sharding propagation to lookahead instructions sharding to choose a sharding for dot() that agrees with the users if possible.
PiperOrigin-RevId: 565086052
2023-09-13 10:19:33 -07:00
Oleg Shyshkov
d11423a4e9 [mhlo] Remove unused HloLegalizeToLhlo pass.
PiperOrigin-RevId: 565083641
2023-09-13 10:11:38 -07:00
Marcello Maggioni
f28e73538a [XLA] Add support to CollectivePipeliner to sink collectives.
Small collectives might be better off when sunk, and there are other potential use cases.
Also fix a bug where we were accepting reuse of the data that we were storing, and change the tests that used that pattern to match the fix.

PiperOrigin-RevId: 565080772
2023-09-13 10:02:14 -07:00
Bixia Zheng
65bd69126a [xla] Change the collective-permute-decomposer to not chain the Send and Recv
instructions through control dependence.

This is because the generated HLO program is correct even without the control
dependence chaining. The purpose of the control dependence chaining is to
support a scheduler, such as the latency hiding scheduler, and thus will be
added to the latency hiding scheduler preparation pass. Not producing the
control dependence chaining while decomposing collective-permute can also
simplify the implementation of collective-pipeliner in pipelining Send and
Recv instructions.

PiperOrigin-RevId: 565073772
2023-09-13 09:38:54 -07:00
Benjamin Kramer
74d7ec3be9 Integrate LLVM at llvm/llvm-project@8ebe1d1cc1
Updates LLVM usage to match
[8ebe1d1cc1e4](https://github.com/llvm/llvm-project/commit/8ebe1d1cc1e4)

PiperOrigin-RevId: 565069541
2023-09-13 09:25:33 -07:00