Python 3.6 can now be installed from apt, and that build ships with all of the standard-library submodules. If instead we compile Python from source, the build reports:
```
The necessary bits to build these optional modules were not found:
_bz2 _dbm _gdbm
_lzma _sqlite3 _tkinter
readline
```
which then results in
```
==================== Test output for //bazel_pip/tensorflow/contrib/summary:summary_ops_test:
Running test /tmpfs/src/github/tensorflow/bazel-ci_build-cache/.cache/bazel/_bazel_kbuilder/eab0d61a99b6696edb3d2aff87b585e8/execroot/org_tensorflow/bazel-out/k8-opt/bin/bazel_pip/tensorflow/contrib/summary/summary_ops_test.runfiles/org_tensorflow/bazel_pip/tensorflow/contrib/summary/summary_ops_test on GPU 0
Traceback (most recent call last):
File "/tmpfs/src/github/tensorflow/bazel-ci_build-cache/.cache/bazel/_bazel_kbuilder/eab0d61a99b6696edb3d2aff87b585e8/execroot/org_tensorflow/bazel-out/k8-opt/bin/bazel_pip/tensorflow/contrib/summary/summary_ops_test.runfiles/org_tensorflow/bazel_pip/tensorflow/contrib/summary/summary_ops_test.py", line 23, in <module>
import sqlite3
File "/usr/local/lib/python3.6/sqlite3/__init__.py", line 23, in <module>
from sqlite3.dbapi2 import *
File "/usr/local/lib/python3.6/sqlite3/dbapi2.py", line 27, in <module>
from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
================================================================================
```
and similar failures, which then block releasing the patch version.
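On Debian/Ubuntu these modules are silently skipped when the corresponding development headers (e.g. `libbz2-dev`, `liblzma-dev`, `libsqlite3-dev`, `libgdbm-dev`, `libreadline-dev`, `tk-dev`) are absent at `./configure` time; installing them and rebuilding is the usual fix. A quick way to verify a build afterwards (a sketch of a workflow, not taken from the log above):

```python
# Run with the freshly built interpreter to see which optional
# C-extension modules actually got compiled in.
import importlib

for mod in ("bz2", "lzma", "sqlite3"):
    try:
        importlib.import_module(mod)
        print(f"{mod}: OK")
    except ImportError as exc:
        print(f"{mod}: MISSING ({exc})")
```

A missing module here reproduces exactly the `ModuleNotFoundError: No module named '_sqlite3'` seen in the test log.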
* [XLA] Update Tf2Xla bridge to use Scatter HLO.
PiperOrigin-RevId: 215687800
* [XLA:GPU] Add an implementation of scatter for GPU
This simply has a kernel that runs on every element of the updates tensor, figures out the right indices to perform the update, and applies it with an atomic operation.
Currently we emit a CAS for plain (i.e. non-add) updates, which is inefficient.
Also TuplePointsToAnalysis doesn't know that it should alias the operand and
output buffers of a scatter, which would avoid a copy.
PiperOrigin-RevId: 216412467
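The per-update-element strategy described above can be modeled in a few lines of Python (a sketch of the semantics only, not the actual XLA GPU kernel; `scatter_add` is an illustrative name, and a real scatter also supports windows and other combiners):

```python
# Toy model of the GPU scatter strategy: one logical "thread" per element
# of the updates tensor, each resolving its target index and applying the
# combiner. On the GPU each loop iteration is an independent kernel thread
# using atomicAdd; non-add combiners fall back to a CAS loop.

def scatter_add(operand, scatter_indices, updates):
    result = list(operand)          # scatter writes into a copy of the operand
    for i, idx in enumerate(scatter_indices):
        result[idx] += updates[i]   # atomic on the GPU; plain add here
    return result

print(scatter_add([0, 0, 0, 0], [1, 3, 1], [10, 20, 30]))  # → [0, 40, 0, 20]
```

The `result = list(operand)` copy is exactly what the buffer-aliasing commit below avoids: when the operand buffer can be shared with the output, the kernel updates it in place.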
* [XLA] Allow scatter to share the operand buffer with the output
This avoids a copy.
PiperOrigin-RevId: 216437329
* [XLA:GPU] Elide the SequentialThunk when emitting scatter with no copy
We have a 1-element thunk sequence if we're not copying. That's still two thunks, and HLO profiling gets confused if it sees two thunks for the same instruction and one of them claims to be the whole instruction.
PiperOrigin-RevId: 216448063
* [XLA:GPU] Allow input fusion into scatter
We fuse everything into the scatter now, and emit two kernels. The first kernel
fills the output buffer with the computation fused into the scatter operand.
The second kernel is a regular scatter, which also contains the fused
operations from the updates and scatter_indices inputs.
PiperOrigin-RevId: 216624225
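The two-kernel split described above can be sketched in plain Python (a toy model under the assumption that the fused operand/updates/indices computations are element-wise functions; none of these names are XLA APIs):

```python
# Toy model of input fusion into scatter: "kernel" 1 materializes the fused
# operand computation into the output buffer; "kernel" 2 is a regular
# scatter that evaluates the fused updates/indices computations on the fly.

def fused_scatter(operand_fn, n, indices_fn, m, updates_fn):
    # Kernel 1: fill the output with the fused operand computation.
    out = [operand_fn(i) for i in range(n)]
    # Kernel 2: scatter; fused updates/indices are computed inline per thread.
    for j in range(m):
        out[indices_fn(j)] += updates_fn(j)
    return out

# e.g. operand = iota(4), indices = [0, 2], updates = [5, 5]
print(fused_scatter(lambda i: i, 4, lambda j: 2 * j, 2, lambda j: 5))
# → [5, 1, 7, 3]
```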
* [XLA:GPU] Add a test case for Scatter where the GPU implementation fails.
PiperOrigin-RevId: 216798034
* [XLA:GPU] Fix scatter oob check computation
This compared the index after adding the window offset to it against the window
dimension, which means the bounds check was only correct for the first element
of a window. Instead, compare the scatter index, which is the same for all
elements of a window.
PiperOrigin-RevId: 216921512
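A minimal sketch of the fix (illustrative Python for the 1-D case, not the XLA emitter; all names are hypothetical):

```python
def in_bounds_buggy(scatter_index, window_offset, window_dim):
    # Old check: the window-adjusted index against the window dimension.
    # Gives a different answer for different elements of the same window.
    return 0 <= scatter_index + window_offset < window_dim

def in_bounds_fixed(scatter_index, operand_dim, window_dim):
    # New check: the scatter index itself, shared by every element of the
    # window, must leave room for the whole window inside the operand.
    return 0 <= scatter_index <= operand_dim - window_dim

# Same window (scatter_index=1), two different window elements: the old
# check disagrees with itself; the new check gives one answer per window.
print(in_bounds_buggy(1, 0, 3), in_bounds_buggy(1, 2, 3))  # True False
print(in_bounds_fixed(1, 5, 3))                            # True
```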
* [XLA:GPU] Elide tuple roots of the entry computation
The tuple buffer is never read, so stop emitting code to fill it. A typical
root tuple consists of an H2D memcpy and a host callback, both of which are
somewhat slow.
This helps tiny models and inference benchmarks, where the host/device syncs
can be a significant part of the runtime of the entire computation.
PiperOrigin-RevId: 216968475