A race condition may occur between the execution of transaction commit,
and an execution of a KILL statement that would attempt to abort that
transaction.
MDEV-17092 worked around this race condition by modifying InnoDB code.
After that issue was closed, Sergey Vojtovich pointed out that this
race condition would better be fixed above the storage engine layer:
If you look carefully into the above, you can conclude that
thd->free_connection() can be called concurrently with
KILL/thd->awake(). Which is the bug. And it is partially fixed in
THD::~THD(), that is destructor waits for KILL completion:
Fix: Add necessary mutex operations to THD::free_connection()
and move WSREP specific code also there. This ensures that no
one is using THD while we do free_connection(). These mutexes
will also ensures that there can't be concurrent KILL/THD::awake().
innobase_kill_query
We can now remove usage of trx_sys_mutex introduced on MDEV-17092.
trx_t::free()
Poison trx->state and trx->mysql_thd
This patch is validated with an RQG run similar to the one that
reproduced MDEV-17092.
This bug could cause a crash when executing queries that used mutually
recursive CTEs with system variable big_tables set to 1. It happened due
to several bugs in the code that handled recursive table references
referred mutually recursive CTEs. For each recursive table reference a
temporary table is created that contains all rows generated for the
corresponding recursive CTE table on the previous step of recursion.
This temporary table should be created in the same way as the temporary
table created for a regular materialized derived table using the
method select_union::create_result_table(). In this case when the
temporary table is created it uses the select_union::TMP_TABLE_PARAM
structure as the parameter for the table construction. However the
code created the temporary table using just the function create_tmp_table()
and passed pointers to certain fields of the TMP_TABLE_PARAM structure
used for accumulation of rows of the recursive CTE table as parameters
for update. This was a mistake because now different temporary tables
cannot share some TMP_TABLE_PARAM fields in a general case. Besides,
depending on how mutually recursive CTE tables were defined and which
of them were referred in the executed query the select_union object
allocated for a recursive table reference could be allocated again after
the the temporary table had been created. In this case the TMP_TABLE_PARAM
object associated with the temporary table created for the recursive
table reference contained unassigned fields needed for execution when
Aria engine is employed as the engine for temporary tables.
This patch ensures that
- select_union object is created only once for any recursive table
reference
- any temporary table created for recursive CTEs uses its own
TMP_TABLE_PARAM structure
The patch also fixes a problem caused by incomplete cleanup of join tables
associated with recursive table references.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
The reason for the failure is that
thd->mdl_context.release_transactional_locks()
was called after commit & rollback even in cases where the current
transaction is still active.
For 10.2, 10.3 and 10.4 the fix is simple:
- Replace all calls to thd->mdl_context.release_transactional_locks() with
thd->release_transactional_locks(). The thd function will only call
the mdl_context function if there are no active transactional locks.
In 10.6 we will better fix where we will change the return value for
some trans_xxx() functions to indicate if transaction did close the
transaction or not. This will avoid the need of the indirect call.
Other things:
- trans_xa_commit() and trans_xa_rollback() will automatically
call release_transactional_locks() if the transaction is closed.
- We can't do that for the other functions as the caller of many of these
are doing additional work (like close_thread_tables) before calling
release_transactional_locks().
- Added missing abort_result_set() and missing DBUG_RETURN in
select_create::send_eof()
- Fixed wrong indentation in injector::transaction::commit()
Do not resend metadata, if metadata does not change between prepare and
execute of prepared statement, or between executes.
Currently, metadata of *every* prepared statement will be checksummed,
and change is detected once checksum changes.
This is not from ideal, performance-wise. The code for
better/faster detection of unchanged metadata, is already in place, but
currently disabled due to PS bugs, such as MDEV-23913.
During graceful shutdowns, client connections are closed and
eventually and THD::awake() acquires LOCK_thd_data mutex which is
required later on in wsrep_thd_is_aborting(). Make sure LOCK_thd_data
is acquired, even if global wsrep_on is disabled.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
The issue here was the system variable max_sort_length was being applied
to decimals and it was truncating the value for decimals to the number
of bytes set by max_sort_length.
This was leading to a buffer overflow as the values were written
to the buffer without truncation and then we moved the offset to
the number of bytes(set by max_sort_length), that are needed for comparison.
The fix is to not apply max_sort_length for fixed size types like INT,
DECIMALS and only apply max_sort_length for CHAR, VARCHARS, TEXT and
BLOBS.
MDEV-20945: BACKUP UNLOCK + FTWRL assertion failure | SIGSEGV in I_P_List
from MDL_context::release_lock on INSERT w/ BACKUP LOCK (on optimized
builds) | Assertion `ticket->m_duration == MDL_EXPLICIT' failed
BACKUP LOCK behavior is modified so it won't be used wrong:
- BACKUP LOCK should commit any active transactions.
- BACKUP LOCK should not be allowed in stored procedures.
- When BACKUP LOCK is active, don't allow any DDL's for that connection.
- FTWRL is forbidden on the same connection while BACKUP LOCK is active.
Reviewed-by: monty@mariadb.com
This commit fixed the problems with S3 after the "DROP TABLE FORCE" changes.
It also fixes all failing replication S3 tests.
A slave is delayed if it is trying to execute replicated queries on a
table that is already converted to S3 by the master later in the binlog.
Fixes for replication events on S3 tables for delayed slaves:
- INSERT and INSERT ... SELECT and CREATE TABLE are ignored but written
to the binary log. UPDATE & DELETE will be fixed in a future commit.
Other things:
- On slaves with --s3-slave-ignore-updates set, allow S3 tables to be
opened in read-write mode. This was done to be able to
ignore-but-replicate queries like insert. Without this change any
open of an S3 table failed with 'Table is read only' which is too
early to be able to replicate the original query.
- Errors are now printed if handler::extra() call fails in
wait_while_tables_are_used().
- Error message for row changes are changed from HA_ERR_WRONG_COMMAND
to HA_ERR_TABLE_READONLY.
- Disable some maria_extra() calls for S3 tables. This could cause
S3 tables to fail in some cases.
- Added missing thr_lock_delete() to ma_open() in case of failure.
- Removed from mysql_prepare_insert() the not needed argument 'table'.
- Remove row_start/row_end from keys in fix_create_like();
- Disable manual adding of implicit row_start/row_end to indexes on
CREATE TABLE. INVISIBLE_SYSTEM fields are unoperable by user;
- Fix memory leak on allocation of Key_part_spec.
MDEV-21953 deadlock between BACKUP STAGE BLOCK_COMMIT and parallel
replication
Fixed by partly reverting MDEV-21953 to put back MDL_BACKUP_COMMIT locking
before log_and_order.
The original problem for MDEV-21953 was that while a thread was waiting in
for another threads to commit in 'log_and_order', it had the
MDL_BACKUP_COMMIT lock. The backup thread was waiting to get the
MDL_BACKUP_WAIT_COMMIT lock, which blocks all new MDL_BACKUP_COMMIT locks.
This causes a deadlock as the waited-for thread can never get past the
MDL_BACKUP_COMMIT lock in ha_commit_trans.
The main part of the bug fix is to release the MDL_BACKUP_COMMIT lock while
a thread is waiting for other 'previous' threads to commit. This ensures
that no transactional thread keeps MDL_BACKUP_COMMIT while waiting, which
ensures that there are no deadlocks anymore.
Protocol_local fixed so it can be used now.
Some Protocol:: methods made virtual so they can adapt.
as well as net_ok and net_send_error functions.
execute_sql_string function is exported to the plugins.
To be changed with the mysql_use_result.
and
MDEV-23414 Assertion `res->charset() == item->collation.collation' failed in Type_handler_string_result::make_packed_sort_key_part
pack_sort_string() *must* take a collation from the Item, not from the
String value. Because when casting a string to _binary the original
String is not copied for performance reasons, it's reused but its
collation does not match Item's collation anymore.
Note, that String's collation cannot be simply changed to _binary,
because for an Item_string literal the original String must stay
unchanged for the duration of the query.
this partially reverts 61c15ebe32
- Adding optional qualifiers to data types:
CREATE TABLE t1 (a schema.DATE);
Qualifiers now work only for three pre-defined schemas:
mariadb_schema
oracle_schema
maxdb_schema
These schemas are virtual (hard-coded) for now, but may turn into real
databases on disk in the future.
- mariadb_schema.TYPE now always resolves to a true MariaDB data
type TYPE without sql_mode specific translations.
- oracle_schema.DATE translates to MariaDB DATETIME.
- maxdb_schema.TIMESTAMP translates to MariaDB DATETIME.
- Fixing SHOW CREATE TABLE to use a qualifier for a data type TYPE
if the current sql_mode translates TYPE to something else.
The above changes fix the reported problem, so this script:
SET sql_mode=ORACLE;
CREATE TABLE t2 AS SELECT mariadb_date_column FROM t1;
is now replicated as:
SET sql_mode=ORACLE;
CREATE TABLE t2 (mariadb_date_column mariadb_schema.DATE);
and the slave can unambiguously treat DATE as the true MariaDB DATE
without ORACLE specific translation to DATETIME.
Similar,
SET sql_mode=MAXDB;
CREATE TABLE t2 AS SELECT mariadb_timestamp_column FROM t1;
is now replicated as:
SET sql_mode=MAXDB;
CREATE TABLE t2 (mariadb_timestamp_column mariadb_schema.TIMESTAMP);
so the slave treats TIMESTAMP as the true MariaDB TIMESTAMP
without MAXDB specific translation to DATETIME.
An overflow was happening with LONGTEXT columns, when the length was converted to the length
in the strxfrm form (mem-comparable keys).
Introduced a function to truncate the length to the max_sort_length before calculating
the length of the strxfrm form.
- Better to use 'String *' directly.
- Added String::get_value(LEX_STRING*) for the few cases where we want to
convert a String to LEX_CSTRING.
Other things:
- Use StringBuffer for some functions to avoid mallocs
When high priority replication slave applier encounters lock conflict in innodb,
it will force the conflicting lock holder transaction (victim) to rollback.
This is a must in multi-master sychronous replication model to avoid cluster lock-up.
This high priority victim abort (aka "brute force" (BF) abort), is started
from innodb lock manager while holding the victim's transaction's (trx) mutex.
Depending on the execution state of the victim transaction, it may happen that the
BF abort will call for THD::awake() to wake up the victim transaction for the rollback.
Now, if BF abort requires THD::awake() to be called, then the applier thread executed
locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data
If, at the same time another DBMS super user issues KILL command to abort the same victim,
it will execute locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex.
These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking
deadlock may occur.
The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim
This flag is set both when BF is called for from innodb and by KILL command.
Either path of victim killing will bail out if victim's wsrep_killed is already
set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter
records the aborter THD's ID. This is needed to preserve the right to kill
the victim from different locations for the same aborter thread.
It is also good error logging, to see who is reponsible for the abort.
A new test case was added in galera.galera_bf_kill_debug.test for scenario where
wsrep applier thread and manual KILL command try to kill same idle victim
* Allocate items on thd->mem_root while refixing vcol exprs
* Make vcol tree changes register and roll them back after the statement is executed.
Explanation:
Due to collation implementation specifics an Item tree could change while fixing.
The tricky thing here is to make it on a proper arena.
It's usually not a problem when a field is deterministic, however, makes a pain vice-versa, during allocation allocating.
A non-deterministic field should be refixed on each statement, since it depends on the environment state.
Changing the tree will be temporary and therefore it should be reverted after the statement execution.
The COM_MULTI did not take off. No connector is using it.
Remove related code from server, and client.
If anything it is a step simplification of already-bloated
dispatch_command(), and related code.
When high priority replication slave applier encounters lock conflict in innodb,
it will force the conflicting lock holder transaction (victim) to rollback.
This is a must in multi-master sychronous replication model to avoid cluster lock-up.
This high priority victim abort (aka "brute force" (BF) abort), is started
from innodb lock manager while holding the victim's transaction's (trx) mutex.
Depending on the execution state of the victim transaction, it may happen that the
BF abort will call for THD::awake() to wake up the victim transaction for the rollback.
Now, if BF abort requires THD::awake() to be called, then the applier thread executed
locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data
If, at the same time another DBMS super user issues KILL command to abort the same victim,
it will execute locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex.
These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking
deadlock may occur.
The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim
This flag is set both when BF is called for from innodb and by KILL command.
Either path of victim killing will bail out if victim's wsrep_killed is already
set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter
records the aborter THD's ID. This is needed to preserve the right to kill
the victim from different locations for the same aborter thread.
It is also good error logging, to see who is reponsible for the abort.
A new test case was added in galera.galera_bf_kill_debug.test for scenario where
wsrep applier thread and manual KILL command try to kill same idle victim
For low sort_buffer_size, in the cost calculation of using the Unique object the elements in the tree were evaluated to 0, make sure to have atleast 1 element in the Unique tree.
Also for the function Unique::get allocate memory for atleast MERGEBUFF2+1 keys.
cannot use the current THD::mem_root, because it can be temporarily
reassigned to something with a very different life time
(e.g. to TABLE::mem_root or range optimizer mem_root).
MDEV-21398 Deadlock (server hang) or assertion failure in
Diagnostics_area::set_error_status upon ALTER under lock
This failure could only happen if one locked the same table
multiple times and then did an ALTER TABLE on the table.
Major change is to change all instances of
table->m_needs_reopen= true;
to
table->mark_table_for_reopen();
The main fix for the problem was to ensure that we mark all
instances of the table in the locked_table_list and when we
reopen the tables, we first close all tables before reopening
and locking them.
Other things:
- Don't call thd->locked_tables_list.reopen_tables if there
are no tables marked for reopen. (performance)
MDEV-22531 Remove maria::implicit_commit()
MDEV-22607 Assertion `ha_info->ht() != binlog_hton' failed in
MYSQL_BIN_LOG::unlog_xa_prepare
From the handler point of view, Aria now looks like a transactional
engine. One effect of this is that we don't need to call
maria::implicit_commit() anymore.
This change also forces the server to call trans_commit_stmt() after doing
any read or writes to system tables. This work will also make it easier
to later allow users to have system tables in other engines than Aria.
To handle the case that Aria doesn't support rollback, a new
handlerton flag, HTON_NO_ROLLBACK, was added to engines that has
transactions without rollback (for the moment only binlog and Aria).
Other things
- Moved freeing of MARIA_SHARE to a separate function as the MARIA_SHARE
can be still part of a transaction even if the table has closed.
- Changed Aria checkpoint to use the new MARIA_SHARE free function. This
fixes a possible memory leak when using S3 tables
- Changed testing of binlog_hton to instead test for HTON_NO_ROLLBACK
- Removed checking of has_transaction_manager() in handler.cc as we can
assume that as the transaction was started by the engine, it does
support transactions.
- Added new class 'start_new_trans' that can be used to start indepdendent
sub transactions, for example while reading mysql.proc, using help or
status tables etc.
- open_system_tables...() and open_proc_table_for_Read() doesn't anymore
take a Open_tables_backup list. This is now handled by 'start_new_trans'.
- Split thd::has_transactions() to thd::has_transactions() and
thd::has_transactions_and_rollback()
- Added handlerton code to free cached transactions objects.
Needed by InnoDB.
squash! 2ed35999f2a2d84f1c786a21ade5db716b6f1bbc
All changes (except one) is of type
thd->transaction. -> thd->transaction->
thd->transaction points by default to 'thd->default_transaction'
This allows us to 'easily' have multiple active transactions for a
THD object, like when reading data from the mysql.proc table
MDEV-22088 S3 partitioning support
All ALTER PARTITION commands should now work on S3 tables except
REBUILD PARTITION
TRUNCATE PARTITION
REORGANIZE PARTITION
In addition, PARTIONED S3 TABLES can also be replicated.
This is achived by storing the partition tables .frm and .par file on S3
for partitioned shared (S3) tables.
The discovery methods are enchanced by allowing engines that supports
discovery to also support of the partitioned tables .frm and .par file
Things in more detail
- The .frm and .par files of partitioned tables are stored in S3 and kept
in sync.
- Added hton callback create_partitioning_metadata to inform handler
that metadata for a partitoned file has changed
- Added back handler::discover_check_version() to be able to check if
a table's or a part table's definition has changed.
- Added handler::check_if_updates_are_ignored(). Needed for partitioning.
- Renamed rebind() -> rebind_psi(), as it was before.
- Changed CHF_xxx hadnler flags to an enum
- Changed some checks from using table->file->ht to use
table->file->partition_ht() to get discovery to work with partitioning.
- If TABLE_SHARE::init_from_binary_frm_image() fails, ensure that we
don't leave any .frm or .par files around.
- Fixed that writefrm() doesn't leave unusable .frm files around
- Appended extension to path for writefrm() to be able to reuse to function
for creating .par files.
- Added DBUG_PUSH("") to a a few functions that caused a lot of not
critical tracing.
* The overlaps check is implemented on a handler level per row command.
It creates a separate cursor (actually, another handler instance) and
caches it inside the original handler, when ha_update_row or
ha_insert_row is issued. Cursor closes on unlocking the handler.
* Containing the same key in index means unique constraint violation
even in usual terms. So we fetch left and right neighbours and check
that they have same key prefix, excluding from the key only the period part.
If it doesnt match, then there's no such neighbour, and the check passes.
Otherwise, we check if this neighbour intersects with the considered key.
* The check does not introduce new error and fails with ER_DUPP_KEY error.
This might break REPLACE workflow and should be fixed separately
MDEV-21605 Clean up and speed up interfaces for binary row logging
MDEV-21617 Bug fix for previous version of this code
The intention is to have as few 'if' as possible in ha_write() and
related functions. This is done by pre-calculating once per statement the
row_logging state for all tables.
Benefits are simpler and faster code both when binary logging is disabled
and when it's enabled.
Changes:
- Added handler->row_logging to make it easy to check it table should be
row logged. This also made it easier to disabling row logging for system,
internal and temporary tables.
- The tables row_logging capabilities are checked once per "statements
that updates tables" in THD::binlog_prepare_for_row_logging() which
is called when needed from THD::decide_logging_format().
- Removed most usage of tmp_disable_binlog(), reenable_binlog() and
temporary saving and setting of thd->variables.option_bits.
- Moved checks that can't change during a statement from
check_table_binlog_row_based() to check_table_binlog_row_based_internal()
- Removed flag row_already_logged (used by sequence engine)
- Moved binlog_log_row() to a handler::
- Moved write_locked_table_maps() to THD::binlog_write_table_maps() as
most other related binlog functions are in THD.
- Removed binlog_write_table_map() and binlog_log_row_internal() as
they are now obsolete as 'has_transactions()' is pre-calculated in
prepare_for_row_logging().
- Remove 'is_transactional' argument from binlog_write_table_map() as this
can now be read from handler.
- Changed order of 'if's in handler::external_lock() and wsrep_mysqld.h
to first evaluate fast and likely cases before more complex ones.
- Added error checking in ha_write_row() and related functions if
binlog_log_row() failed.
- Don't clear check_table_binlog_row_based_result in
clear_cached_table_binlog_row_based_flag() as it's not needed.
- THD::clear_binlog_table_maps() has been replaced with
THD::reset_binlog_for_next_statement()
- Added 'MYSQL_OPEN_IGNORE_LOGGING_FORMAT' flag to open_and_lock_tables()
to avoid calculating of binary log format for internal opens. This flag
is also used to avoid reading statistics tables for internal tables.
- Added OPTION_BINLOG_LOG_OFF as a simple way to turn of binlog temporary
for create (instead of using THD::sql_log_bin_off.
- Removed flag THD::sql_log_bin_off (not needed anymore)
- Speed up THD::decide_logging_format() by remembering if blackhole engine
is used and avoid a loop over all tables if it's not used
(the common case).
- THD::decide_logging_format() is not called anymore if no tables are used
for the statement. This will speed up pure stored procedure code with
about 5%+ according to some simple tests.
- We now get annotated events on slave if a CREATE ... SELECT statement
is transformed on the slave from statement to row logging.
- In the original code, the master could come into a state where row
logging is enforced for all future events if statement could be used.
This is now partly fixed.
Other changes:
- Ensure that all tables used by a statement has query_id set.
- Had to restore the row_logging flag for not used tables in
THD::binlog_write_table_maps (not normal scenario)
- Removed injector::transaction::use_table(server_id_type sid, table tbl)
as it's not used.
- Cleaned up set_slave_thread_options()
- Some more DBUG_ENTER/DBUG_RETURN, code comments and minor indentation
changes.
- Ensure we only call THD::decide_logging_format_low() once in
mysql_insert() (inefficiency).
- Don't annotate INSERT DELAYED
- Removed zeroing pos_in_table_list in THD::open_temporary_table() as it's
already 0
MDEV-21606 Improve update handler (long unique keys on blobs)
MDEV-21470 MyISAM and Aria start_bulk_insert doesn't work with long unique
MDEV-21606 Bug fix for previous version of this code
MDEV-21819 2 Assertion `inited == NONE || update_handler != this'
- Move update_handler from TABLE to handler
- Move out initialization of update handler from ha_write_row() to
prepare_for_insert()
- Fixed that INSERT DELAYED works with update handler
- Give an error if using long unique with an autoincrement column
- Added handler function to check if table has long unique hash indexes
- Disable write cache in MyISAM and Aria when using update_handler as
if cache is used, the row will not be inserted until end of statement
and update_handler would not find conflicting rows.
- Removed not used handler argument from
check_duplicate_long_entries_update()
- Syntax cleanups
- Indentation fixes
- Don't use single character indentifiers for arguments
MDEV-19964 S3 replication support
Added new configure options:
s3_slave_ignore_updates
"If the slave has shares same S3 storage as the master"
s3_replicate_alter_as_create_select
"When converting S3 table to local table, log all rows in binary log"
This allows on to configure slaves to have the S3 storage shared or
independent from the master.
Other thing:
Added new session variable '@@sql_if_exists' to force IF_EXIST to DDL's.
e.g.
- dont -> don't
- occurence -> occurrence
- succesfully -> successfully
- easyly -> easily
Also remove trailing space in selected files.
These changes span:
- server core
- Connect and Innobase storage engine code
- OQgraph, Sphinx and TokuDB storage engines
Related to MDEV-21769.
Some .c and .cc files are compiled as part of Mariabackup.
Enabling -Wconversion for InnoDB would also enable it for
Mariabackup. The .h files are being included during
InnoDB or Mariabackup compilation.
Notably, GCC 5 (but not GCC 4 or 6 or later versions)
would report -Wconversion for x|=y when the type is
unsigned char. So, we will either write x=(uchar)(x|y)
or disable the -Wconversion warning for GCC 5.
bitmap_set_bit(), bitmap_flip_bit(), bitmap_clear_bit(), bitmap_is_set():
Always implement as inline functions.
This task deals with packing the sort key inside the sort buffer, which would
lead to efficient usage of the memory allocated for the sort buffer.
The changes brought by this feature are
1) Sort buffers would have sort keys of variable length
2) The format for sort keys inside the sort buffer would look like
|<sort_length><null_byte><key_part1><null_byte><key_part2>.......|
sort_length is the extra bytes that are required to store the variable
length of a sort key.
3) When packing of sort key is done we store the ORIGINAL VALUES inside
the sort buffer and not the STRXFRM form (mem-comparable sort keys).
4) Special comparison function packed_keys_comparison() is introduced
to compare 2 sort keys.
This patch also contains contributions from Sergei Petrunia.
- Added unlikely() to optimize for not having optimizer trace enabled
- Made THD::trace_started() inline
- Added 'if (trace_enabled())' around some potentially expensive code
(not many found)
- Added ASSERT's to ensure we don't call expensive optimizer trace calls
if optimizer trace is not enabled
- Added length to Json_writer functions to speed up buffer writes
when optimizer trace is enabled.
- Changed LEX_CSTRING argument handling to not send full struct to writer
function on_add_str() functions now trusts length arguments
This patch adds support of RENAME INDEX operation to the ALTER TABLE
statement. Code which determines if ALTER TABLE can be done in-place
for "simple" storage engines like MyISAM, Heap and etc. was updated to
handle ALTER TABLE ... RENAME INDEX as an in-place operation. Support
for in-place ALTER TABLE ... RENAME INDEX for InnoDB was covered by
MDEV-13301.
Syntax changes
==============
A new type of <alter_specification> is added:
<rename index clause> ::= RENAME ( INDEX | KEY ) <oldname> TO <newname>
Where <oldname> and <newname> are identifiers for old name and new
name of the index.
Semantic changes
================
The result of "ALTER TABLE t1 RENAME INDEX a TO b" is a table which
contents and structure are identical to the old version of 't1' with
the only exception index 'a' being called 'b'.
Neither <oldname> nor <newname> can be "primary". The index being
renamed should exist and its new name should not be occupied
by another index on the same table.
Related to: WL#6555, MDEV-13301
The existing syntax for renaming a column uses "ALTER TABLE ... CHANGE"
command. This requires full column specification to rename the column.
This patch adds new syntax "ALTER TABLE ... RENAME COLUMN", which do not
expect users to provide full column specification. It means that the new
syntax would pick in-place or copy algorithm in the same way as that of
existing "ALTER TABLE ... CHANGE" command. The existing syntax
"ALTER TABLE ... CHANGE" will continue to work.
Syntax changes
==============
ALTER TABLE tbl_name
[alter_specification [, alter_specification] ...]
[partition_options]
Following is a new <alter_specification> added:
| RENAME COLUMN <oldname> TO <newname>
Where <oldname> and <newname> are identifiers for old name and new
name of the column.
Related to: WL#10761
Support for galera GTID consistency thru cluster. All nodes in cluster
should have same GTID for replicated events which are originating from cluster.
Cluster originating commands need to contain sequential WSREP GTID seqno
Ignore manual setting of gtid_seq_no=X.
In master-slave scenario where master is non galera node replicated GTID is
replicated and is preserved in all nodes.
To have this - domain_id, server_id and seqnos should be same on all nodes.
Node which bootstraps the cluster, to achieve this, sends domain_id and
server_id to other nodes and this combination is used to write GTID for events
that are replicated inside cluster.
Cluster nodes that are executing non replicated events are going to have different
GTID than replicated ones, difference will be visible in domain part of gtid.
With wsrep_gtid_domain_id you can set domain_id for WSREP cluster.
Functions WSREP_LAST_WRITTEN_GTID, WSREP_LAST_SEEN_GTID and
WSREP_SYNC_WAIT_UPTO_GTID now works with "native" GTID format.
Fixed galera tests to reflect this chances.
Add variable to manually update WSREP GTID seqno in cluster
Add variable to manipulate and change WSREP GTID seqno. Next command
originating from cluster and on same thread will have set seqno and
cluster should change their internal counter to it's value.
Behavior is same as using @@gtid_seq_no for non WSREP transaction.
* size represents the size of an element in the Unique class
* full_size is used when the Unique class counts the number of
duplicates stored per element. This requires additional space per Unique
element.
Currently InnoDB uses internal parser for adding foreign keys. Remove
internal parser and use data parsed by SQL parser (sql_yacc) for
adding foreign keys.
- create_table_info_t::create_foreign_keys() replacement for
dict_create_foreign_constraints_low();
- Pass constraint name via Foreign_key object.
Temporary until MDEV-20865:
- Pass alter_info as part of create_info.
revision-id: 673e253724979fd9fe43a4a22bd7e1b2c3a5269e
Author: Kristian Nielsen
Fix missing memory barrier in wait_for_commit.
The function wait_for_commit::wait_for_prior_commit() has a fast path where it
checks without locks if wakeup_subsequent_commits() has already been called.
This check was missing a memory barrier. The waitee thread does two writes to
variables `waitee' and `wakeup_error', and if the waiting thread sees the
first write it _must_ also see the second or incorrect behavior will occur.
This requires memory barriers between both the writes (release semantics) and
the reads (acquire semantics) of those two variables.
Other accesses to these variables are done under lock or where only one thread
will be accessing them, and can be done without barriers (relaxed semantics).
Count the "gap" time between table accesses and display it as
r_other_time_ms in the "table" element.
* The advantage of this approach is that it doesn't add any new
my_timer_cycles() calls.
* The disadvantage is that the definition of what is done during
"other time" is not that clear: it includes checking the WHERE
(for this table), constructing index lookup tuple (for the next table)
writing to GROUP BY temporary table (as we dont account for that time
separately [yet], etc)
mysql_insert() first opens all affected tables (which implicitly
starts a transaction in InnoDB), then stat tables.
A failure to open a stat table caused open_tables() to abort
the current stmt transaction (trans_rollback_stmt()). So, from the
server point of view the following ha_write_row()-s happened outside
of a transactions, and the server didn't bother to commit them.
The server has a mechanism to prevent a transaction being
unexpectedly committed or rolled back in the middle of a statement -
if an operation takes place _in a sub-statement_ it cannot change
the transaction state. Operations on stat tables are exactly that -
they are not allowed to change a transaction state. Put them in
a sub-statement to make sure they don't.
- Use local variables table and share to simplify code
- Use sql_command_flags to detect what kind of command was used
- Added CF_DELETES_DATA to simplify detecton of delete commands
- Removed duplicate error in create_table_from_items().
Now both offset and limit are stored and do not chenged during execution
(offset is decreased during processing in versions before 10.5).
(Big part of this changes made by Monty)
read_statistics_for_tables_if_needed
Regression after 279a907, read_statistics_for_tables_if_needed() was
called after open_normal_and_derived_tables() failure.
Fixed by moving read_statistics_for_tables() call to a branch of
get_schema_stat_record() where result of open_normal_and_derived_tables()
is checked.
Removed THD::force_read_stats, added read_statistics_for_tables() instead.
Simplified away statistics_for_command_is_needed().
Always initialize ScopedStatementReplication::saved_binlog_format,
so that GCC cannot emit a bogus warning about
ScopedStatementReplication::~ScopedStatementReplication() using the
variable.
The code was originally introduced in
commit d998da0306.
- call current_schema::mark_as_changed() directly
- call state_change::mark_as_changed() directly
- replaced SESSION_TRACKER_CHANGED with dummy tracker
- replaced Session_tracker::mark_as_changed() with
State_tracker::mark_as_changed()
- hide and devirtualize original State_tracker::mark_as_changed(),
rename it to set_changed()
- all implementations of mark_as_changed() now check is_enabled() for
consistency
- no argument casts anymore
(Backported to 10.3, addressed review input)
Sj_materialization_picker::check_qep(): fix error in cost/fanout
calculations:
- for each join prefix, add #prefix_rows / TIME_FOR_COMPARE to the cost,
like best_extension_by_limited_search does
- Remove the fanout produced by the subquery tables.
- Also take into account join condition selectivity
optimize_wo_join_buffering() (used by LooseScan and FirstMatch)
- also add #prefix_rows / TIME_FOR_COMPARE to the cost of each prefix.
- Also take into account join condition selectivity
Problem:-
When mysql executes INSERT ON DUPLICATE KEY INSERT, the storage engine checks
if the inserted row would generate a duplicate key error. If yes, it returns
the existing row to mysql, mysql updates it and sends it back to the storage
engine.When the table has more than one unique or primary key, this statement
is sensitive to the order in which the storage engines checks the keys.
Depending on this order, the storage engine may determine different rows
to mysql, and hence mysql can update different rows.The order that the
storage engine checks keys is not deterministic. For example, InnoDB checks
keys in an order that depends on the order in which indexes were added to
the table. The first added index is checked first. So if master and slave
have added indexes in different orders, then slave may go out of sync.
Solution:-
Make INSERT...ON DUPLICATE KEY UPDATE unsafe while using stmt or mixed format
When there is more then one unique key.
Although there is two exception.
1. Auto Increment key is not counted because Innodb will get gap lock for
failed Insert and concurrent insert will get a next increment value. But if
user supplies auto inc value it can be unsafe.
2. Count only unique keys for which insertion is performed.
So this patch also addresses the bug id #72921
Sources did not compile in some builds because of undeclared
ER_BLOB_KEY_WITHOUT_LENGTH. Moving the implementations of
Key_part_spec::check_key_length_for_blob() from sql_class.h to sql_class.cc
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about
deprecated `register` keyword.
Too much warnings came from Mroonga and I gave up on it.
Plugin fixed to not lock the LOCK_operations when not active.
Server fixed to lock the LOCK_plugin less - do it once per
thread and then only if a plugin was installed/uninstalled.
truncating a temporary table
TRUNCATE expects only one TABLE instance (which is used by TRUNCATE
itself) to be open. However this requirement wasn't enforced after
"MDEV-5535: Cannot reopen temporary table".
Fixed by closing unused table instances before performing TRUNCATE.
Originally introduced by e972125f1 to avoid harmless wait for
LOCK_global_system_variables in a newly created thread, which creation was
initiated by system variable update.
At the same time it opens dangerous hole, when system variable update
thread already released LOCK_global_system_variables and ack_receiver
thread haven't yet completed new THD construction. In this case THD
constructor goes completely unprotected.
Since ack_receiver.stop() waits for the thread to go down, we have to
temporarily release LOCK_global_system_variables so that it doesn't
deadlock with ack_receiver.run(). Unfortunately it breaks atomicity
of rpl_semi_sync_master_enabled updates and makes them not serialized.
LOCK_rpl_semi_sync_master_enabled was introduced to workaround the above.
TODO: move ack_receiver start/stop into repl_semisync_master
enable_master/disable_master under LOCK_binlog protection?
Part of MDEV-14984 - regression in connect performance
XID_STATE::rm_error is never used by internal 2PC, it is intended to be
used by explicit XA only.
Also removed redundant xid reset from THD::init_for_queries(). Must've
been done already either by THD::transaction constructor or by
THD::cleanup().
Part of MDEV-7974 - backport fix for mysql bug#12161 (XA and binlog)
The command SHOW INDEXES ignored setting of the system variable
use_stat_tables to the value of 'preferably' and and showed statistical
data received from the engine. Similarly queries over the table
STATISTICS from INFORMATION_SCHEMA ignored this setting. It happened
because the function fill_schema_table_by_open() did not read any data
from statistical tables.
This bug is caused by pushdown from HAVING into WHERE.
It appears because condition that is pushed wasn't fixed.
It is also discovered that condition pushdown from HAVING into
WHERE is done wrong. There is no need to build clones for some
conditions that can be pushed. They can be simply moved from HAVING
into WHERE without cloning.
build_pushable_cond_for_having_pushdown(),
remove_pushed_top_conjuncts_for_having() methods are changed.
It is found that there is no transformation made for fields of
pushed condition.
field_transformer_for_having_pushdown transformer is added.
New tests are added. Some comments are changed.
Do not register intermediate tables created by inplace ALTER TABLE in
THD::temporary_tables.
Regular ALTER TABLE doesn't create .frm for temporary and discoverable
tables anymore. For inplace ALTER TABLE moved .frm creation to
create_table_for_inplace_alter().
Removed open_in_engine argument of create_and_open_tmp_table() and
open_temporary_table(): it became unused after this patch.
Part of MDEV-17805 - Remove InnoDB cache for temporary tables.
CREATE TEMPORARY TABLE locks SE plugin 6 times. 5 of these locks are
released by the end of the statement. And only 1 acquired by
init_from_binary_frm_image() / plugin_lock() remains.
The lock removed in this patch was clearly redundant.
Part of MDEV-17805 - Remove InnoDB cache for temporary tables.
The MDEV-17262 commit 26432e49d3
was skipped. In Galera 4, the implementation would seem to require
changes to the streaming replication.
In the tests archive.rnd_pos main.profiling, disable_ps_protocol
for SHOW STATUS and SHOW PROFILE commands until MDEV-18974
has been fixed.
This patch contains a fix for the MDEV-17262/17243 issues and
new mtr test.
These issues (MDEV-17262/17243) have two reasons:
1) After an intermediate commit, a transaction loses its status
of "transaction that registered in the MySQL for 2pc coordinator"
(in the InnoDB) due to the fact that since version 10.2 the
write_row() function (which located in the ha_innodb.cc) does
not call trx_register_for_2pc(m_prebuilt->trx) during the processing
of split transactions. It is necessary to restore this call inside
the write_row() when an intermediate commit was made (for a split
transaction).
Similarly, we need to set the flag of the started transaction
(m_prebuilt->sql_stat_start) after intermediate commit.
The table->file->extra(HA_EXTRA_FAKE_START_STMT) called from the
wsrep_load_data_split() function (which located in sql_load.cc)
will also do this, but it will be too late. As a result, the call
to the wsrep_append_keys() function from the InnoDB engine may be
lost or function may be called with invalid transaction identifier.
2) If a transaction with the LOAD DATA statement is divided into
logical mini-transactions (of the 10K rows) and binlog is rotated,
then in rare cases due to the wsrep handler re-registration at the
boundary of the split, the last portion of data may be lost. Since
splitting of the LOAD DATA into mini-transactions is technical,
I believe that we should not allow these mini-transactions to fall
into separate binlogs. Therefore, it is necessary to prohibit the
rotation of binlog in the middle of processing LOAD DATA statement.
https://jira.mariadb.org/browse/MDEV-17262 and
https://jira.mariadb.org/browse/MDEV-17243
* MDEV-16509 Improve wsrep commit performance with binlog disabled
Release commit order critical section early after trx_commit_low() if
binlog is not transaction coordinator. In order to avoid two phase commit,
binlog_hton is not registered for THD during IO_CACHE population.
Implemented a test which verifies that the transactions release
commit order early.
This optimization will change behavior during recovery as the commit
is not two phase when binlog is off. Fixed and recorded wsrep-recover-v25
and wsrep-recover to match the behavior.
* MDEV-18730 Ordering for wsrep binlog group commit
Previously out of order execution was allowed for wsrep commits.
Established proper ordering by populating wait_for_commit
for every wsrep THD and making group commit leader to wait for
prior commits before proceeding to trx_group_commit_leader().
* MDEV-18730 Added a test case to verify correct commit ordering
* MDEV-16509, MDEV-18730 Review fixes
Use WSREP_EMULATE_BINLOG() macro to decide if the binlog_hton
should be registered. Whitespace/syntax fixes and cleanups.
* MDEV-16509 Require binlog for galera_var_innodb_disallow_writes test
If the commit to InnoDB is done in one phase, the native InnoDB behavior
is that the transaction is committed in memory before it is persisted to
disk. This means that the innodb_disallow_writes=ON may not prevent
transaction to become visible to other readers before commit is completely
over. On the other hand, if the commit is two phase (as it is with binlog),
the transaction will be blocked in prepare phase.
Fixed the test to use binlog, which enforces two phase commit, which
in turn makes commit to block before the changes become visible to
other connections. This guarantees that the test produces expected
result.
The patches features an optional shutdown behavior to hold on until
after all connected slaves have been sent the last binlogged event.
The connected slave is one whose START SLAVE has been acknowledged and
that was not stopped since that though it could be technically
reconnecting in background.
The solution therefore disallows killing the dump thread until is has
found EOF of the latest binlog file. It is up to the shutdown
requester (DBA) to set up a sufficiently large shutdown timeout value
for shudown to wait patiently until lagging behind slaves have been
synchronized. On the other hand if a specific slave needs exclusion
from synchronization the DBA would have to stop it manually which
would terminate its dump thread.
`mysqladmin shutdown' is extended with a `--wait_for_all_slaves' option
which translates to `SHUTDOW WAIT FOR ALL SLAVES' sql query
to enable the feature on the client side.
The patch also performs a small refactoring of the server shutdown
around close_connections() to introduce kill thread phases which
are two as of current.
Make mysqltest to use --ps-protocol more
use prepared statements for everything that server supports
with the exception of CALL (for now).
Fix discovered test failures and bugs.
tests:
* PROCESSLIST shows Execute state, not Query
* SHOW STATUS increments status variables more than in text protocol
* multi-statements should be avoided (see tests with a wrong delimiter)
* performance_schema events have different names in --ps-protocol
* --enable_prepare_warnings
mysqltest.cc:
* make sure run_query_stmt() doesn't crash if there's
no active connection (in wait_until_connected_again.inc)
* prepare all statements that server supports
protocol.h
* Protocol_discard::send_result_set_metadata() should not send
anything to the client.
sql_acl.cc:
* extract the functionality of getting the user for SHOW GRANTS
from check_show_access(), so that mysql_test_show_grants() could
generate the correct column names in the prepare step
sql_class.cc:
* result->prepare() can fail, don't ignore its return value
* use correct number of decimals for EXPLAIN columns
sql_parse.cc:
* discard profiling for SHOW PROFILE. In text protocol it's done in
prepare_schema_table(), but in --ps it is called on prepare only,
so nothing was discarding profiling during execute.
* move the permission checking code for SHOW CREATE VIEW to
mysqld_show_create_get_fields(), so that it would be called during
prepare step too.
* only set sel_result when it was created here and needs to be
destroyed in the same block. Avoid destroying lex->result.
* use the correct number of tables in check_show_access(). Saying
"as many as possible" doesn't work when first_not_own_table isn't
set yet.
sql_prepare.cc:
* use correct user name for SHOW GRANTS columns
* don't ignore verbose flag for SHOW SLAVE STATUS
* support preparing REVOKE ALL and ROLLBACK TO SAVEPOINT
* don't ignore errors from thd->prepare_explain_fields()
* use select_send result for sending ANALYZE and EXPLAIN, but don't
overwrite lex->result, because it might be needed to issue execute-time
errors (select_dumpvar - too many rows)
sql_show.cc:
* check grants for SHOW CREATE VIEW here, not in mysql_execute_command
sql_view.cc:
* use the correct function to check privileges. Old code was doing
check_access() for thd->security_ctx, which is invoker's sctx,
not definer's sctx. Hide various view related errors from the invoker.
sql_yacc.yy:
* initialize lex->select_lex for LOAD, otherwise it'll contain garbage
data that happen to fail tests with views in --ps (but not otherwise).
This solves the following issues:
* unlike lex->m_sql_cmd and lex->sql_command, thd->query_plan_flags
is not reset in Prepared_statement::execute, it survives
till the log_slow_statement(), so slow logging behaves correctly in --ps
* using thd->query_plan_flags for both slow_log_filter and
log_slow_admin_statements means the definition of "admin" statements
for the slow log is the same no matter how it is filtered out.
slave_list was used to provide data for SHOW SLAVE HOSTS and
Slaves_connected status variable.
Introduced binlog_dump_thread_count which is exposed via Slaves_connected
(replaces slave_list.records).
Store Slave_info on THD and access it by iterating server_threads
(replaces slave_list).
Added:
THD::slave_info
binlog_dump_thread_count
show_slave_hosts_callback()
Removed:
slave_list
SLAVE_LIST_CHUNK
SLAVE_ERRMSG_SIZE
slave_list_key()
slave_info_free()
init_slave_list()
end_slave_list()
all_slave_list_mutexes
init_all_slave_list_mutexes()
key_LOCK_slave_list
LOCK_slave_list
Moved:
SLAVE_INFO -> Slave_info
register_slave() -> THD::register_slave()
unregister_slave() -> THD::unregister_slave()
Also removed redundant end_slave() from close_connections(): it is called
again soon afterwards by clean_up().
Pre-requisite for clean MDEV-18450 solution.
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
This patch adds support for expiring user passwords.
The following statements are extended:
CREATE USER user@localhost PASSWORD EXPIRE [option]
ALTER USER user@localhost PASSWORD EXPIRE [option]
If no option is specified, the password is expired with immediate
effect. If option is DEFAULT, global policy applies according to
the default_password_lifetime system var (if 0, password never
expires, if N, password expires every N days). If option is NEVER,
the password never expires and if option is INTERVAL N DAY, the
password expires every N days.
The feature also supports the disconnect_on_expired_password system
var and the --connect-expired-password client option.
Closes#1166
Apparently DBUG_ASSERT() can co-exist with DBUG_OFF when
-DCMAKE_CXX_FLAGS="-DDBUG_ASSERT_AS_PRINTF".
Removed assertion as it is useless now, since the type is unsigned.
The variable controls the amount of sampling analyze table performs.
If ANALYZE table with histogram collection is too slow, one can reduce the
time taken by setting analyze_sample_percentage to a lower value of the
total number of rows.
Setting it to 0 will use a formula to compute how many rows to sample:
The number of rows collected is capped to a minimum of 50000 and
increases logarithmically with a coffecient of 4096. The coffecient is
chosen so that we expect an error of less than 3% in our estimations
according to the paper:
"Random Sampling for Histogram Construction: How much is enough?”
– Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, ACM SIGMOD, 1998.
The drawback of sampling is that avg_frequency number is computed
imprecisely and will yeild a smaller number than the real one.
Condition can be pushed from the HAVING clause into the WHERE clause
if it depends only on the fields that are used in the GROUP BY list
or depends on the fields that are equal to grouping fields.
Aggregate functions can't be pushed down.
How the pushdown is performed on the example:
SELECT t1.a,MAX(t1.b)
FROM t1
GROUP BY t1.a
HAVING (t1.a>2) AND (MAX(c)>12);
=>
SELECT t1.a,MAX(t1.b)
FROM t1
WHERE (t1.a>2)
GROUP BY t1.a
HAVING (MAX(c)>12);
The implementation scheme:
1. Extract the most restrictive condition cond from the HAVING clause of
the select that depends only on the fields that are used in the GROUP BY
list of the select (directly or indirectly through equalities)
2. Save cond as a condition that can be pushed into the WHERE clause
of the select
3. Remove cond from the HAVING clause if it is possible
The optimization is implemented in the function
st_select_lex::pushdown_from_having_into_where().
New test file having_cond_pushdown.test is created.
This task involves the implementation for the optimizer trace.
This feature produces a trace for any SELECT/UPDATE/DELETE/,
which contains information about decisions taken by the optimizer during
the optimization phase (choice of table access method, various costs,
transformations, etc). This feature would help to tell why some decisions were
taken by the optimizer and why some were rejected.
Trace is session-local, controlled by the @@optimizer_trace variable.
To enable optimizer trace we need to write:
set @@optimizer_trace variable= 'enabled=on';
To display the trace one can run:
SELECT trace FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
This task also involves:
MDEV-18489: Limit the memory used by the optimizer trace
introduces a switch optimizer_trace_max_mem_size which limits
the memory used by the optimizer trace. This was implemented by
Sergei Petrunia.
Signal handler is now respoinsible for setting abort_loop and breaking
poll() in main thread. The rest is handled by main thread itself.
Removed redundant LOCK_error_log init/destroy wrappers.
Removed redundant unireg_end(): it is trivial and it has only one caller.
Removed unused ready_to_exit from PFS.
Removed kill_in_progress: duplicates abort_loop.
Removed shutdown_in_progress: duplicates abort_loop.
Removed ready_to_exit: was used to make sure main thread waits for
cleanups, which are now done by main thread itself.
Removed SIGNALS_DONT_BREAK_READ, MAYBE_BROKEN_SYSCALL,
kill_broken_server: never defined/used.
Make clean_up() static.
modifications (insert/erase) are protected by write lock
iteration over list is protected by read lock.
This way, threads that iterate over the list (as in SHOW PROCESSLIST,
SHOW GLOBAL STATUS) do not block each other.
In contrast to thread_count, which is decremented by THD destructor,
this one was most probably intended to be decremented after all THD
destructors are done.
THD_count class was added to achieve similar effect with thread_count.
Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
Implemented and integrated THD_list as a replacement for the global
thread list. It uses own mutex instead of LOCK_thread_count for THD
list protection.
Removed unused first_global_thread() and next_global_thread().
delayed_insert_threads is now protected by LOCK_delayed_insert. Although
this patch doesn't fix very wrong synchronization of this variable.
After this patch there are only 2 legitimate uses of LOCK_thread_count
left, both in mysqld.cc: thread_count and ready_to_exit.
Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
LOG_INFO::lock was useless. It could've only protect against concurrent
iterators execution, which was already protected by LOCK_thread_count.
Use LOCK_thd_data instead of LOCK_thread_count as a protection against
THD::current_linfo reset.
Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
Part of MDEV-5336 Implement LOCK FOR BACKUP
- Changed check of Global_only_lock to also include BACKUP lock.
- We store latest MDL_BACKUP_DDL lock in thd->mdl_backup_ticket to be able
to downgrade lock during copy_data_between_tables()
Changing the way how a cursor is opened to fetch its structure only,
e.g. for a cursor FOR loop record variable.
The old methods with setting thd->lex->limit_rows_examined to an Item_uint(0)
was not reliable and could push these messages into diagnostics area:
The query examined at least 1 rows, which exceeds LIMIT ROWS EXAMINED (0)
The new method should be more reliable, as it completely prevents the call
of do_select() in JOIN::exec_inner() during the cursor structure discovery,
so the execution of the cursor SELECT query returns immediately after the
preparation step (when the result row structure becomes known),
without even entering the code that fetches the result rows.
truncating a temporary table
TRUNCATE expects only one TABLE instance (which is used by TRUNCATE
itself) to be open. However this requirement wasn't enforced after
"MDEV-5535: Cannot reopen temporary table".
Fixed by closing unused table instances before performing TRUNCATE.