Commit Graph

153857 Commits

Ilia Sergachev
0a03fa6be5 [XLA:GPU] Triton GEMM rewriter: fix wrong check.
Fix forward for cl/566274725: IsEqualAt is templated and was not written correctly. The check was unrelated to the CL anyway.

PiperOrigin-RevId: 566298256
2023-09-18 07:57:29 -07:00
Michael Hudgins
191cf82bd6 Integrate the Linaro fork of the TF Docker 2023-09-18 14:52:45 +00:00
kushanam
4bc874a3ac PR #5300: A new pass to optimize the AllGather->BinaryOp sequence
Imported from GitHub PR https://github.com/openxla/xla/pull/5300

This is a new GPU SPMD optimization pass for the following pattern:
binary-op(all-gather(a), all-gather(b))
to
all-gather(binary-op(a, b))
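The rewrite is valid because an elementwise binary op distributes over the concatenation an all-gather performs, and it lets the op run on the smaller, still-sharded operands while saving one all-gather. A minimal NumPy sketch of the equivalence (the shard values and the `all_gather` model are illustrative, not taken from the PR):

```python
import numpy as np

def all_gather(shards):
    """Model all-gather as concatenating the per-device shards."""
    return np.concatenate(shards)

# Hypothetical per-device shards of two operands a and b.
a_shards = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
b_shards = [np.array([10.0, 20.0]), np.array([30.0, 40.0])]

# Original pattern: binary-op(all-gather(a), all-gather(b)).
before = all_gather(a_shards) + all_gather(b_shards)

# Rewritten pattern: all-gather(binary-op(a, b)) — one all-gather is saved
# and the add runs on shard-sized inputs.
after = all_gather([sa + sb for sa, sb in zip(a_shards, b_shards)])

assert np.array_equal(before, after)
```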

Copybara import of the project:

--
77aafc0686fb98a6e13b6664ee537ed3cde5e24f by kushanam <kahmadian@nvidia.com>:

adding a new pass to optimize reduce_scatter->all_gather->binary_op sequence

--
0b1e8eb599f8a7334b7c9826746db67e0923f2f7 by kushanam <kahmadian@nvidia.com>:

applying review refactors

--
9b181ec7487e7ded4610a779f8929d2e2a199e0d by kushanam <kahmadian@nvidia.com>:

removing reduce-scatter from the all-gather optimization

--
a8c49eb58f3b370627cd57c62f456696567ba60a by kushanam <kahmadian@nvidia.com>:

remove traversal all-gather search and rely on immediate parent

--
d90f5a148bc099455724450b84f1af8fb83ffc66 by kushanam <kahmadian@nvidia.com>:

remove extra gpu word from the directive

Merging this change closes #5300

PiperOrigin-RevId: 566298114
2023-09-18 07:49:21 -07:00
Chao
7805d33bf5 PR #5670: [ROCm] revert 48cf922
Imported from GitHub PR https://github.com/openxla/xla/pull/5670

Revert the ROCm side, same as 5eb7734505.

@anlunx Thanks in advance!
Copybara import of the project:

--
9dedb1ce2a620bae69c0fbaa8e5822ababfd52bc by Chao Chen <cchen104@amd.com>:

ROCm revert 48cf922

Merging this change closes #5670

PiperOrigin-RevId: 566284041
2023-09-18 06:39:58 -07:00
Sergey Kozub
bc6d7a4843 Fix atomic max-reduce edge case (negative zero)
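The commit message gives no details, but the −0.0 edge case is a classic hazard when a floating-point atomic max is emulated with integer compare-and-swap on the raw bits: −0.0 and +0.0 compare equal as floats, yet their bit patterns differ, and the sign bit makes −0.0 look like a huge unsigned integer. A small sketch of the mismatch (illustrative only; not the actual kernel code):

```python
import struct

def bits(x):
    """Reinterpret a float32 as its raw 32-bit pattern."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

# -0.0 and +0.0 compare equal as floats...
assert -0.0 == 0.0

# ...but their bit patterns differ, so ordering the raw bits as
# unsigned integers inverts the expected result for the zero pair.
assert bits(0.0) == 0x00000000
assert bits(-0.0) == 0x80000000
assert bits(-0.0) > bits(0.0)  # unsigned integer view disagrees with float max
```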
PiperOrigin-RevId: 566278326
2023-09-18 06:09:56 -07:00
Ilia Sergachev
ac60f49d69 [XLA:GPU] Triton GEMM: support slicing of inputs.
With this logic, slices will be fused quite rarely (only as inputs: either slices of direct computation parameters or tiny ones), because slices are generally better fused into their producers to reduce DRAM traffic.

PiperOrigin-RevId: 566274725
2023-09-18 05:57:59 -07:00
A. Unique TensorFlower
f6e9b54378 Integrate LLVM at llvm/llvm-project@79e96b2457
Updates LLVM usage to match
[79e96b2457fe](https://github.com/llvm/llvm-project/commit/79e96b2457fe)

PiperOrigin-RevId: 566274600
2023-09-18 05:49:38 -07:00
Alan Kelly
edd09bcc36 Check error code after quantizing
PiperOrigin-RevId: 566270089
2023-09-18 05:22:32 -07:00
Ilia Sergachev
7d381bd7c6 [XLA:GPU] Fix build of HLO op profiler tool.
Add missing return statement, fix access to device name.

PiperOrigin-RevId: 566253614
2023-09-18 03:53:58 -07:00
A. Unique TensorFlower
505c9f544c compat: Update forward compatibility horizon to 2023-09-18
PiperOrigin-RevId: 566233520
2023-09-18 02:17:55 -07:00
A. Unique TensorFlower
d9b1d26061 Update GraphDef version to 1623.
PiperOrigin-RevId: 566233510
2023-09-18 02:11:23 -07:00
Tamás Danyluk
25f87b72ba [XLA:GPU][NFC] Add CreateMlirModuleOp as a leak-safe alternative of mlir::ModuleOp::create
mlir::ModuleOp::create returns a non-owning reference which is almost never the intended usage. We may leak memory if we don't assign it manually to an mlir::OwningOpRef. We actually had an error like this a few weeks ago.

CreateMlirModuleOp returns an owning reference by default.

I added a check to our internal presubmit, which will fail for mlir::ModuleOp::create calls.

We can opt out of the check by adding /*ALLOW_MLIR_MODULE_OP_CREATE*/ to the same line as the mlir::ModuleOp::create call. I recommend doing this only if really needed, and in a utility function rather than in general code.

PiperOrigin-RevId: 566231363
2023-09-18 01:57:43 -07:00
Son Tuan Vu
5b93ae202b [XLA:GPU][NFC] Remove unused allocator argument from conv autotuning
PiperOrigin-RevId: 566228813
2023-09-18 01:41:44 -07:00
Tamás Danyluk
424006ba48 [XLA:GPU] Do not keep TargetMachine alive after finishing executable compilation
This saves host RAM during compilation.

PiperOrigin-RevId: 566225133
2023-09-18 01:22:57 -07:00
Jiyoun (Jen) Ha
5fd22c15c2 (1/N) Refactor stablehlo pass namespaces to mlir::quant::stablehlo.
PiperOrigin-RevId: 566217901
2023-09-18 00:43:59 -07:00
Jiyoun (Jen) Ha
da646f1b1b lite:stablehlo:transforms: Add comments for better readability.
PiperOrigin-RevId: 566194710
2023-09-17 22:49:44 -07:00
Ziyin Huang
e4a6720f42 clean up the sparse core preprocess ops kernel
PiperOrigin-RevId: 566172603
2023-09-17 20:37:06 -07:00
David Majnemer
b2ec3bbcf9 [pjrt] Add support for ARM Neon to transpose
The recursive transpose algorithm is pretty fundamental. We can implement it on Neon by just implementing some primitives.

While we are here, reduce code bloat by skipping instantiation of unspecialized micro-kernels.
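The recursive (cache-oblivious) transpose the message refers to halves the larger dimension until a tile is small enough for a base-case kernel; on a SIMD target such as Neon, only that base case needs target-specific primitives. A self-contained sketch of the recursion (names and block size are illustrative, not from the pjrt code):

```python
def transpose(src, dst, r0, r1, c0, c1, block=4):
    """Cache-oblivious recursive transpose: split the larger dimension
    until the tile is at most `block` on each side, then copy the tile
    directly. On SIMD targets the base case becomes a register-level
    micro-kernel."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= block and cols <= block:
        for r in range(r0, r1):
            for c in range(c0, c1):
                dst[c][r] = src[r][c]
    elif rows >= cols:
        mid = r0 + rows // 2
        transpose(src, dst, r0, mid, c0, c1, block)
        transpose(src, dst, mid, r1, c0, c1, block)
    else:
        mid = c0 + cols // 2
        transpose(src, dst, r0, r1, c0, mid, block)
        transpose(src, dst, r0, r1, mid, c1, block)

src = [[r * 8 + c for c in range(8)] for r in range(5)]
dst = [[0] * 5 for _ in range(8)]
transpose(src, dst, 0, 5, 0, 8)
assert all(dst[c][r] == src[r][c] for r in range(5) for c in range(8))
```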

PiperOrigin-RevId: 566137150
2023-09-17 15:23:11 -07:00
A. Unique TensorFlower
7a7ecd49df Fix deadlock in AbstractAsyncHostToHostMemoryTransferManager
PiperOrigin-RevId: 566119193
2023-09-17 12:18:00 -07:00
A. Unique TensorFlower
a2fe2d61a5 Update GraphDef version to 1622.
PiperOrigin-RevId: 566056331
2023-09-17 02:14:30 -07:00
A. Unique TensorFlower
78a02bf38a compat: Update forward compatibility horizon to 2023-09-17
PiperOrigin-RevId: 566056330
2023-09-17 02:08:48 -07:00
David Majnemer
1dc1e3cce0 [XLA] Make sure the dynamic sizes in literal are well aligned
This ensures that we do not trigger undefined behavior when accessing them.

PiperOrigin-RevId: 565980525
2023-09-16 15:17:02 -07:00
A. Unique TensorFlower
fbe41b7246 compat: Update forward compatibility horizon to 2023-09-16
PiperOrigin-RevId: 565896351
2023-09-16 02:18:55 -07:00
A. Unique TensorFlower
9297aa04f1 Update GraphDef version to 1621.
PiperOrigin-RevId: 565896297
2023-09-16 02:11:39 -07:00
Hye Soo Yang
4aac3064f4 Open source op & kernel for GetMinibatchSplitsWithPhysicalReplica
PiperOrigin-RevId: 565837005
2023-09-15 19:49:19 -07:00
Michael Delorimier
162463fd8c Replicate small constants so they don't need to be sent to their successors. A small constant is replicated to each of its successors' devices. The maximum size of a constant to be replicated is 16 elements.
This pass is disabled by default and can be enabled with the flag replicate_small_constants.

PiperOrigin-RevId: 565820680
2023-09-15 17:59:10 -07:00
A. Unique TensorFlower
c04d03db88 Use sharding propagation when possible to obtain a default solution to compare with the auto-sharding solution.
PiperOrigin-RevId: 565817073
2023-09-15 17:33:27 -07:00
Parker Schuh
13baa8e7e5 Allow serializing env_options_overrides separately from the rest of
CompileOptions.

PiperOrigin-RevId: 565813066
2023-09-15 17:10:03 -07:00
A. Unique TensorFlower
18ead88de7 Update HloBufferDonorConfig::Verify and HloVerifier. Overlap between buffer_donor_config and input_output_alias_config is not allowed.
PiperOrigin-RevId: 565803412
2023-09-15 16:24:02 -07:00
Victor Stone
498397da44 Temporarily disable host memory offload of tensors which have a tuple as a direct user.
PiperOrigin-RevId: 565802826
2023-09-15 16:16:53 -07:00
A. Unique TensorFlower
91d784d8eb Refactor TfrtCpuAsyncHostToDeviceTransferManager by adding a new intermediate class, AbstractAsyncHostToHostMemoryTransferManager, between PjRtClient::AsyncHostToDeviceTransferManager and TfrtCpuAsyncHostToDeviceTransferManager. Other clients may use this new class as a shared implementation to create their own async host-to-host/CPU memory transfer managers.
PiperOrigin-RevId: 565798541
2023-09-15 15:57:21 -07:00
A. Unique TensorFlower
3f0f23492b Delete deprecated gpu tf_runtime backend
PiperOrigin-RevId: 565792209
2023-09-15 15:28:54 -07:00
Austin Anderson
723c419362 Deduplicate envs by sourcing common settings
PiperOrigin-RevId: 565790027
2023-09-15 15:18:54 -07:00
Fiona Lang
9b52021eca Internal changes only.
PiperOrigin-RevId: 565786333
2023-09-15 15:03:38 -07:00
TensorFlower Gardener
d1533803bc Merge pull request #61300 from jamwar01:fix_tosa_rsqrt_table_diff
PiperOrigin-RevId: 565772533
2023-09-15 14:11:09 -07:00
Eugene Zhulenev
6f41a94e65 [xla:gpu] Add runtime3 folder in preparation for runtime consolidation
PiperOrigin-RevId: 565768928
2023-09-15 13:55:42 -07:00
A. Unique TensorFlower
5205391a12 Fixed a bug where an allocation block has 0 collocations when allocate_reserved_scoped_memory_at_same_offset is set to false. This caused the minimalloc and telamalloc repackers to fail.
PiperOrigin-RevId: 565763337
2023-09-15 13:33:31 -07:00
Yang Chen
100b23067d #tf-data-service Add log for snapshot timing.
PiperOrigin-RevId: 565754578
2023-09-15 12:57:09 -07:00
Yang Chen
6107baae3a #tf-data-service Instrument dispatcher RPCs.
PiperOrigin-RevId: 565751908
2023-09-15 12:46:35 -07:00
A. Unique TensorFlower
15363a2393 [mlir][sparse][xla] Legalize sparse_tensor::Pack/UnpackOp to custom calls before the translation to HLO.
PiperOrigin-RevId: 565749097
2023-09-15 12:36:56 -07:00
Yang Chen
4905187e03 #tf-data-service Do not acquire locks when writing splits.
PiperOrigin-RevId: 565744967
2023-09-15 12:19:47 -07:00
Yishuang Pang
de701b17dc Includes cstdint in rng_util header.
PiperOrigin-RevId: 565741755
2023-09-15 12:06:34 -07:00
Anlun Xu
b7fc45f143 [xla:gpu] GraphExecUpdateResultInfo should be initialized to 0
So that it won't contain garbage values.

PiperOrigin-RevId: 565738255
2023-09-15 11:58:22 -07:00
A. Unique TensorFlower
9e2c09a295 Forward experimental_attributes to wrapped function creation calls.
PiperOrigin-RevId: 565736950
2023-09-15 11:51:35 -07:00
Chao
e54c8969d1 PR #5634: [ROCm] Fixed plugin config error
Imported from GitHub PR https://github.com/openxla/xla/pull/5634

Fixed a ROCm build error: 7be97ae6ea forgot to include the ROCm change.

Thanks in advance! @tdanyluk @cheshire
Copybara import of the project:

--
9ea7cbda5746cab11348246ebe5b343a80a0f373 by Chao Chen <cchen104@amd.com>:

rocm updated graph api and fixed hlo_op_profiler_test

--
d5576d44459bed0424fb9c1dad57285562889354 by Chao Chen <cchen104@amd.com>:

fixed PluginConfig error

Merging this change closes #5634

PiperOrigin-RevId: 565732648
2023-09-15 11:38:03 -07:00
Ce Zheng
57c7009bd7 [XLA] Fallback to V1 sharding when it's not possible to preserve V2 in PartialTile.
PiperOrigin-RevId: 565723524
2023-09-15 11:14:52 -07:00
Derek Murray
2ab6129162 Optimize the performance of ConvertToCooTensorOp.
This CL combines several optimizations:

1. If the combiner is "sum", we avoid all computation and allocation related to gain-rescaling.
2. If the weights are a scalar, we broadcast the same weight to all tokens. This will avoid the need to execute a `Shape`->`Fill` to generate a uniform-weight vector.
3. We use an array instead of a `Tensor` to store the temporary vector of rows.
4. A minor improvement to the code for extracting the row IDs from a `SparseTensor`.
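Optimization 2 above can be sketched in NumPy: a scalar weight is left as a scalar and broadcast by the consumer, instead of being materialized into a per-token vector (the `Shape` -> `Fill` the CL avoids). Function names and shapes here are hypothetical, not from the op's implementation:

```python
import numpy as np

def gains_naive(weights, num_tokens):
    """Unoptimized path: always materialize a per-token weight vector."""
    w = np.asarray(weights)
    if w.ndim == 0:
        w = np.full(num_tokens, float(w))
    return w

def gains_optimized(weights, num_tokens):
    """Optimized path (sketch): a scalar weight stays scalar; NumPy-style
    broadcasting in the consumer makes the fill unnecessary."""
    w = np.asarray(weights)
    return float(w) if w.ndim == 0 else w

values = np.array([1.0, 2.0, 3.0])
assert np.array_equal(values * gains_naive(2.0, 3),
                      values * gains_optimized(2.0, 3))
```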

PiperOrigin-RevId: 565721609
2023-09-15 11:05:53 -07:00
Yishuang Pang
03dd812404 Undo the MHLO::BroadcastInDimOp folding pattern on splat tensors. This helps reduce model size. This pass slightly changes the patterns for average pooling:
```
1. div(reduce_window(add), const_divisor) -> div(reduce_window(add), broadcast_in_dim(const_divisor))
2. div(reduce_window(add), reduce_window(const_1, init_value_0)) -> div(reduce_window(add), reduce_window(broadcast_in_dim(const_1), init_value_0))
```
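Both forms of pattern 1 are numerically identical: dividing by a splat constant and dividing by a broadcast of that constant give the same quotient, but the unfolded form keeps only the small constant in the serialized model. A NumPy sketch of the equivalence (the window sums and divisor are made-up values):

```python
import numpy as np

# reduce_window(add) over the pooling windows, sketched as precomputed sums.
window_sums = np.array([[4.0, 8.0], [12.0, 16.0]])

# Folded form: divide by the splat constant divisor directly.
folded = window_sums / 4.0

# Unfolded form after the pass: divide by broadcast_in_dim(const_divisor),
# modeled here with np.broadcast_to over a scalar constant.
unfolded = window_sums / np.broadcast_to(np.array(4.0), window_sums.shape)

assert np.array_equal(folded, unfolded)
```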

PiperOrigin-RevId: 565721299
2023-09-15 10:59:52 -07:00
A. Unique TensorFlower
a074d94639 Internal Code Change
PiperOrigin-RevId: 565719155
2023-09-15 10:52:13 -07:00
A. Unique TensorFlower
a0943fb1d9 [XLA] Add a smoke test for the sort comparator.
The comparator needs to satisfy the strict weak ordering requirement, otherwise std::sort() may crash.

We can't robustly verify this without considering all triples, but we can at least smoke-test, on the first element, that the comparator is irreflexive.
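The irreflexivity check described above can be sketched as follows; the helper name is hypothetical, and a `<=`-style comparator stands in for the kind of invalid comparator that violates strict weak ordering:

```python
def smoke_test_comparator(comparator, x):
    """Minimal strict-weak-ordering smoke test (sketch): a valid
    comparator must be irreflexive, i.e. comparator(x, x) is False.
    std::sort may read out of bounds or crash when this is violated."""
    return comparator(x, x) is False

less = lambda a, b: a < b          # valid: irreflexive
less_equal = lambda a, b: a <= b   # invalid: less_equal(x, x) is True

assert smoke_test_comparator(less, 42)
assert not smoke_test_comparator(less_equal, 42)
```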

PiperOrigin-RevId: 565716297
2023-09-15 10:40:31 -07:00