Fix forward for cl/566274725: IsEqualAt is templated and was not written correctly. The check was unrelated to that CL anyway.
PiperOrigin-RevId: 566298256
Imported from GitHub PR https://github.com/openxla/xla/pull/5300
This is a new GPU SPMD optimization pass that rewrites the pattern
binary-op(all-gather(a), all-gather(b))
into
all-gather(binary-op(a, b))
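A minimal sketch of the idea, assuming XLA's HloInstruction API; the function name and the elided legality checks are illustrative, not the pass's actual code:

```c++
#include "absl/status/statusor.h"
#include "xla/hlo/ir/hlo_computation.h"
#include "xla/hlo/ir/hlo_instruction.h"
#include "xla/hlo/ir/hlo_opcode.h"
#include "tsl/platform/errors.h"

namespace xla {

// Hypothetical helper: rewrite binary-op(all-gather(a), all-gather(b))
// into all-gather(binary-op(a, b)). Returns true if a rewrite happened.
absl::StatusOr<bool> MaybeSinkBinaryOp(HloInstruction* binary_op) {
  HloInstruction* lhs = binary_op->mutable_operand(0);
  HloInstruction* rhs = binary_op->mutable_operand(1);
  if (lhs->opcode() != HloOpcode::kAllGather ||
      rhs->opcode() != HloOpcode::kAllGather) {
    return false;
  }
  // The two all-gathers must be compatible (same gather dimension,
  // replica groups, etc.); those legality checks are elided here.
  HloComputation* computation = binary_op->parent();
  // Apply the binary op to the un-gathered operands.
  HloInstruction* local_op = computation->AddInstruction(
      HloInstruction::CreateBinary(lhs->operand(0)->shape(),
                                   binary_op->opcode(),
                                   lhs->mutable_operand(0),
                                   rhs->mutable_operand(0)));
  // Re-create a single all-gather over the combined result.
  HloInstruction* gathered = computation->AddInstruction(
      lhs->CloneWithNewOperands(binary_op->shape(), {local_op}));
  TF_RETURN_IF_ERROR(computation->ReplaceInstruction(binary_op, gathered));
  return true;
}

}  // namespace xla
```

This replaces two collectives with one and shrinks the binary op to the per-shard shape.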
Copybara import of the project:
--
77aafc0686fb98a6e13b6664ee537ed3cde5e24f by kushanam <kahmadian@nvidia.com>:
adding a new pass to optimize reduce_scatter->all_gather->binary_op sequence
--
0b1e8eb599f8a7334b7c9826746db67e0923f2f7 by kushanam <kahmadian@nvidia.com>:
applying review refactors
--
9b181ec7487e7ded4610a779f8929d2e2a199e0d by kushanam <kahmadian@nvidia.com>:
removing reduce-scatter from the all-gather optimization
--
a8c49eb58f3b370627cd57c62f456696567ba60a by kushanam <kahmadian@nvidia.com>:
remove traversal all-gather search and rely on immediate parent
--
d90f5a148bc099455724450b84f1af8fb83ffc66 by kushanam <kahmadian@nvidia.com>:
remove extra gpu word from the directive
Merging this change closes #5300
PiperOrigin-RevId: 566298114
Imported from GitHub PR https://github.com/openxla/xla/pull/5670
Revert the ROCm side, in the same way as 5eb7734505.
@anlunx Thanks in advance!
Copybara import of the project:
--
9dedb1ce2a620bae69c0fbaa8e5822ababfd52bc by Chao Chen <cchen104@amd.com>:
ROCm revert 48cf922
Merging this change closes #5670
PiperOrigin-RevId: 566284041
With this logic, slices will be fused quite rarely (only as inputs, and only slices of direct computation parameters or tiny ones), because slices are generally better fused into their producers to reduce DRAM traffic.
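As a toy illustration of the heuristic (plain C++, not actual fusion code): fusing the slice toward the consumer still materializes the full producer output, while fusing it into the producer computes and writes only the sliced region.

```c++
#include <cstddef>
#include <vector>

// Slice kept separate from the producer: the full intermediate `full`
// is computed and written to memory, then mostly thrown away.
std::vector<float> ProducerThenSlice(const std::vector<float>& a,
                                     size_t lo, size_t hi) {
  std::vector<float> full(a.size());
  for (size_t i = 0; i < a.size(); ++i) full[i] = a[i] * 2.0f;
  return std::vector<float>(full.begin() + lo, full.begin() + hi);
}

// Slice fused into the producer: only the sliced region is ever
// computed or written, so the large intermediate never hits DRAM.
std::vector<float> SliceFusedIntoProducer(const std::vector<float>& a,
                                          size_t lo, size_t hi) {
  std::vector<float> out(hi - lo);
  for (size_t i = lo; i < hi; ++i) out[i - lo] = a[i] * 2.0f;
  return out;
}
```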
PiperOrigin-RevId: 566274725
mlir::ModuleOp::create returns a non-owning reference, which is almost never the intended usage. We may leak memory if we don't manually assign the result to an mlir::OwningOpRef. We actually hit an error like this a few weeks ago.
CreateMlirModuleOp returns an owning reference by default.
I added a check to our internal presubmit, which will fail for mlir::ModuleOp::create calls.
We can opt out of the check by adding /*ALLOW_MLIR_MODULE_OP_CREATE*/ on the same line as the mlir::ModuleOp::create call. I recommend doing this only if really needed, and only in a utility function rather than in general code.
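A minimal sketch of the difference, using standard MLIR APIs (the leaking variant is shown commented out):

```c++
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Location.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/IR/OwningOpRef.h"

void Example(mlir::MLIRContext* context) {
  // Leaks: the returned mlir::ModuleOp is a non-owning handle, and
  // nothing ever erases the underlying operation.
  // mlir::ModuleOp leaked =
  //     mlir::ModuleOp::create(mlir::UnknownLoc::get(context));

  // Correct: OwningOpRef erases the module when it goes out of scope.
  mlir::OwningOpRef<mlir::ModuleOp> module =
      mlir::ModuleOp::create(mlir::UnknownLoc::get(context)); /*ALLOW_MLIR_MODULE_OP_CREATE*/
}
```

In XLA code, prefer the CreateMlirModuleOp helper mentioned above, which returns an owning reference by default.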
PiperOrigin-RevId: 566231363
The recursive transpose algorithm is pretty fundamental. We can implement it on Neon by implementing just a few primitives.
While we are here, we also reduce code bloat by skipping instantiation of unspecialized micro-kernels.
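For illustration, a sketch of the recursive scheme in plain C++ (not the actual micro-kernel code): split along the larger dimension, recurse, and let only the small base-case tile require architecture-specific primitives (on Neon, e.g., zip/transpose instructions over 4x4 tiles).

```c++
// Transpose the rows x cols matrix at `in` into `out`.
// Strides are in elements; out[c * out_stride + r] = in[r * in_stride + c].
void TransposeRec(const float* in, float* out, int rows, int cols,
                  int in_stride, int out_stride) {
  if (rows <= 4 && cols <= 4) {
    // Base case: plain scalar code here; on Neon this tile would be
    // handled by a primitive built from vld1/vtrn/vzip instructions.
    for (int r = 0; r < rows; ++r)
      for (int c = 0; c < cols; ++c)
        out[c * out_stride + r] = in[r * in_stride + c];
    return;
  }
  if (rows >= cols) {
    // Split the rows in half; the halves land side by side in `out`.
    const int half = rows / 2;
    TransposeRec(in, out, half, cols, in_stride, out_stride);
    TransposeRec(in + half * in_stride, out + half, rows - half, cols,
                 in_stride, out_stride);
  } else {
    // Split the columns in half; the halves stack vertically in `out`.
    const int half = cols / 2;
    TransposeRec(in, out, rows, half, in_stride, out_stride);
    TransposeRec(in + half, out + half * out_stride, rows, cols - half,
                 in_stride, out_stride);
  }
}
```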
PiperOrigin-RevId: 566137150
Imported from GitHub PR https://github.com/openxla/xla/pull/5634
Fixed a ROCm build error caused by 7be97ae6ea, which forgot to include the corresponding ROCm change.
Thanks in advance! @tdanyluk @cheshire
Copybara import of the project:
--
9ea7cbda5746cab11348246ebe5b343a80a0f373 by Chao Chen <cchen104@amd.com>:
rocm updated graph api and fixed hlo_op_profiler_test
--
d5576d44459bed0424fb9c1dad57285562889354 by Chao Chen <cchen104@amd.com>:
fixed PluginConfig error
Merging this change closes #5634
PiperOrigin-RevId: 565732648
This CL combines several optimizations:
1. If the combiner is "sum", we avoid all computation and allocation related to gain-rescaling.
2. If the weights are a scalar, we broadcast the same weight to all tokens. This avoids executing a `Shape`->`Fill` to generate a uniform-weight vector (see the sketch after this list).
3. We use an array instead of a `Tensor` to store the temporary vector of rows.
4. A minor improvement to the code for extracting the row IDs from a `SparseTensor`.
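To illustrate (2), a toy sketch with hypothetical names (not the actual kernel code): when the weight is a scalar, read it once per token instead of first materializing a uniform-weight vector.

```c++
#include <cstddef>
#include "absl/types/span.h"

// If `weights` holds a single scalar, it applies to every token, so
// there is no need to Fill a uniform-weight vector up front.
float WeightedSum(absl::Span<const float> values,
                  absl::Span<const float> weights) {
  const bool scalar_weight = (weights.size() == 1);
  float sum = 0.0f;
  for (size_t i = 0; i < values.size(); ++i) {
    sum += (scalar_weight ? weights[0] : weights[i]) * values[i];
  }
  return sum;
}
```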
PiperOrigin-RevId: 565721609
The comparator needs to satisfy the strict weak ordering requirement; otherwise std::sort() may crash.
We can't really verify this robustly without considering all triples, but we can at least smoke-test irreflexivity on the first element: cmp(x, x) must be false.
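A sketch of what such a smoke test can look like (illustrative, not the actual check):

```c++
#include <algorithm>
#include <vector>
#include "absl/log/check.h"

template <typename T, typename Cmp>
void SortWithSmokeTest(std::vector<T>& elems, Cmp cmp) {
  if (!elems.empty()) {
    // A strict weak ordering must be irreflexive: cmp(x, x) == false.
    // Checking just the first element is cheap but already catches
    // common bugs such as comparing with <= instead of <.
    CHECK(!cmp(elems.front(), elems.front()))
        << "comparator violates strict weak ordering";
  }
  std::sort(elems.begin(), elems.end(), cmp);
}
```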
PiperOrigin-RevId: 565716297