Layouts are displayed as e.g. "f32[100,200]{0,1}". But constants used
to be displayed as e.g. "f32[]{42}". To avoid ambiguity, constants are
now displayed as e.g. "42 (f32[])".
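The old form was ambiguous because `{...}` denoted both a layout (minor-to-major order) and, for constants, a value. A minimal sketch of the two display conventions (the helper names here are hypothetical, not the actual XLA rendering code):

```python
def format_shape_with_layout(dtype, dims, layout):
    # e.g. "f32[100,200]{0,1}" -- the {0,1} is the minor-to-major layout.
    dims_s = ",".join(str(d) for d in dims)
    layout_s = ",".join(str(l) for l in layout)
    return f"{dtype}[{dims_s}]{{{layout_s}}}"

def format_constant(value, dtype, dims):
    # New convention: value first, shape in parentheses -- e.g. "42 (f32[])".
    dims_s = ",".join(str(d) for d in dims)
    return f"{value} ({dtype}[{dims_s}])"

print(format_shape_with_layout("f32", [100, 200], [0, 1]))  # f32[100,200]{0,1}
print(format_constant(42, "f32", []))                       # 42 (f32[])
```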
Also gets rid of the xla_hlo_graph_layout flag, which is no longer
necessary since we're now showing layouts unconditionally.
PiperOrigin-RevId: 163753637
TF_Tensors are backed by a contiguous memory region for all
but TF_RESOURCE tensors. The memory management of TF_RESOURCE
tensors required keeping a backing tensorflow::ResourceHandle*
object alive for the lifetime of the TF_Tensor object.
This change removes that discrepancy, making the memory backing
TF_RESOURCE tensors self-contained. This simplifies use of TF_RESOURCE
tensors in the C API (as users of the C API do not need to worry about
a tensorflow::ResourceHandle object and its lifetime). In doing so, this
moves the string memory copy in the TF_Tensor <-> NumPy conversion
from the Python session helper into the C API.
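The idea can be sketched in Python (the class names here are stand-ins, not the actual C API or TensorFlow types): instead of the tensor keeping an external handle object alive, the handle is serialized into the tensor's own buffer, so the tensor owns everything it needs:

```python
import pickle

class ResourceHandle:
    """Stand-in for tensorflow::ResourceHandle (hypothetical)."""
    def __init__(self, device, name):
        self.device = device
        self.name = name

class SelfContainedTensor:
    """The tensor's backing memory holds the serialized handle itself,
    so no external ResourceHandle object must outlive the tensor."""
    def __init__(self, handle):
        self.data = pickle.dumps((handle.device, handle.name))  # owned bytes

    def handle(self):
        device, name = pickle.loads(self.data)
        return ResourceHandle(device, name)

h = ResourceHandle("/device:CPU:0", "my_var")
t = SelfContainedTensor(h)
del h  # the original handle may die; the tensor remains self-contained
restored = t.handle()
print(restored.device, restored.name)
```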
Unfortunately, I couldn't figure out how to add a simple unit test in
c_api_test.cc, but the more comprehensive
tensorflow/python/kernel_tests/session_ops_test.py does cover the changed lines.
Additionally, avoid an unnecessary copy when creating TF_STRING or TF_RESOURCE
tensors (as Eigen alignment is not a requirement for them).
PiperOrigin-RevId: 163751880
It now takes about 400ms rather than 800ms, if the file system cache is warm.
Most of the latency was due to parsing text_format OpList protocol buffers in
our generated sources. We now use a binary representation, while preserving the
text proto as a comment for readability.
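The pattern, roughly, sketched with Python's pickle as a stand-in for protocol buffers: the generated source embeds a binary blob that is cheap to decode, while the human-readable text form survives only as a comment:

```python
import base64
import pickle

op_list = [{"name": "Add", "inputs": ["x", "y"]},
           {"name": "MatMul", "inputs": ["a", "b"]}]

# What a generated source might contain:
# --- human-readable text form, kept only as a comment ---
#   op { name: "Add"    input: "x" input: "y" }
#   op { name: "MatMul" input: "a" input: "b" }
# --- binary form, the one actually parsed at import time ---
BLOB = base64.b64encode(pickle.dumps(op_list)).decode("ascii")

# At import time we decode the binary blob instead of running a
# slow text-format parser over the comment above.
loaded = pickle.loads(base64.b64decode(BLOB))
print(loaded[0]["name"])
```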
Note: This change does not improve the latency of dereferencing tf.contrib,
which takes about 340ms.
PiperOrigin-RevId: 163739355
* Upgrade pip version used in virtualenv created by the test-on-install to latest (9.0.1).
* Highlight step titles of pip builds with bold font.
PiperOrigin-RevId: 163732825
Improves the precision of double-precision numerical gradients by using a smaller step size delta. The optimal step for the symmetric difference approximation, when the function is computed with O(epsilon) error, is epsilon^(1/3); for float64 that is ~1e-5.
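A quick numerical check of that claim (assuming float64, machine epsilon ~2.2e-16): the symmetric-difference error behaves roughly like eps/delta + delta^2, which is minimized near delta = eps^(1/3):

```python
import math

def symmetric_diff(f, x, delta):
    # Symmetric (central) difference approximation of f'(x).
    return (f(x + delta) - f(x - delta)) / (2.0 * delta)

eps = 2.220446049250313e-16  # float64 machine epsilon
x = 1.0
exact = math.cos(x)          # d/dx sin(x) = cos(x)

delta_opt = eps ** (1.0 / 3.0)  # ~6e-6, near-optimal step
delta_tiny = 1e-10              # too small: rounding error dominates

err_opt = abs(symmetric_diff(math.sin, x, delta_opt) - exact)
err_tiny = abs(symmetric_diff(math.sin, x, delta_tiny) - exact)
print(err_opt, err_tiny)  # the ~6e-6 step gives a far smaller error
```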
PiperOrigin-RevId: 163706297
This implementation expands the depthwise convolution kernels into a regular convolution kernel, which may not scale to large feature depths.
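The expansion can be sketched in plain Python (shapes only, no framework): a depthwise kernel of shape [kh, kw, in_channels, multiplier] becomes a regular kernel of shape [kh, kw, in_channels, in_channels * multiplier] that is zero except where an output channel belongs to its own input channel, hence the quadratic blow-up in the channel dimension at large depths:

```python
def expand_depthwise_kernel(dw):
    """dw: nested lists of shape [kh][kw][in_c][mult].
    Returns a regular conv kernel of shape [kh][kw][in_c][in_c * mult]
    that is zero everywhere except on the channel 'diagonal'."""
    kh, kw = len(dw), len(dw[0])
    in_c, mult = len(dw[0][0]), len(dw[0][0][0])
    out_c = in_c * mult
    full = [[[[0.0] * out_c for _ in range(in_c)] for _ in range(kw)]
            for _ in range(kh)]
    for i in range(kh):
        for j in range(kw):
            for c in range(in_c):
                for m in range(mult):
                    # Output channel c*mult + m reads only input channel c.
                    full[i][j][c][c * mult + m] = dw[i][j][c][m]
    return full

# 1x1 kernel, 3 input channels, multiplier 2 -> mostly-zero 3x6 kernel
dw = [[[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]]
full = expand_depthwise_kernel(dw)
print(full[0][0][0])  # [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]
print(full[0][0][1])  # [0.0, 0.0, 3.0, 4.0, 0.0, 0.0]
```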
PiperOrigin-RevId: 163705408
This is a simplification, but also: ExecuteAsync makes it easy to do
the wrong thing.
ExecuteAsync lets you easily start a computation and then infeed/outfeed
to it, all without starting any threads yourself.
But in practice, if you're using infeed or outfeed, you're probably
using both. For good performance, you should overlap infeeds and
outfeeds, which means you need to run them on separate threads. But
then you might as well run Execute on its own thread too.
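The recommended pattern, sketched in Python with stand-in functions (none of these names are the actual XLA client API): run infeed, outfeed, and the execute call each on their own thread so the transfers overlap with the computation:

```python
import queue
import threading

infeed_q, outfeed_q = queue.Queue(), queue.Queue()

def infeed_thread():
    for step in range(3):
        infeed_q.put(step)          # stand-in for TransferToInfeed

def execute_thread():
    for _ in range(3):
        x = infeed_q.get()
        outfeed_q.put(x * x)        # stand-in for the device computation

def outfeed_thread(results):
    for _ in range(3):
        results.append(outfeed_q.get())  # stand-in for TransferFromOutfeed

results = []
threads = [threading.Thread(target=infeed_thread),
           threading.Thread(target=execute_thread),
           threading.Thread(target=outfeed_thread, args=(results,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 4]
```

Since all three loops must run anyway, a plain synchronous Execute on its own thread is just as easy as ExecuteAsync, and harder to misuse.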
PiperOrigin-RevId: 163509892