MDEV-21398 Deadlock (server hang) or assertion failure in
Diagnostics_area::set_error_status upon ALTER under lock
This failure could only happen if one locked the same table
multiple times and then did an ALTER TABLE on the table.
Major change is to change all instances of
table->m_needs_reopen= true;
to
table->mark_table_for_reopen();
The main fix for the problem was to ensure that we mark all
instances of the table in the locked_table_list and when we
reopen the tables, we first close all tables before reopening
and locking them.
Other things:
- Don't call thd->locked_tables_list.reopen_tables if there
are no tables marked for reopen. (performance)
MDEV-22488 test failures: parts.partition_debug_innodb /
parts.partition_debug_myisam
The reason for the failure was a wrong printf() that accessed not existing
things on the stack.
The reason the falure was hard to find was that the partition_debug_...
tests disables core dumps, so there was no trace that the server had
crashed in the logs.
Fixed by fixing the faulty push_warning_printf() and splitting the tests
into two parts, one that test failures (with core dumps enabled) and one
part that test crash recovery.
The review and test splitting was done by Monty
- ALTER_ALGORITHM should be substituted when there is no mention of
algorithm in alter statement.
- Introduced algorithm(thd) in Alter_info. It returns the
user requested algorithm. If user doesn't specify algorithm explicitly then
it returns alter_algorithm variable.
- changed algorithm() to get_algorithm(thd) to return algorithm name for
displaying the error.
- set_requested_algorithm(algo_value) to avoid direct assignment on
requested_algorithm variable.
- Avoid direct access of requested_algorithm to encapsulate
requested_algorithm variable
This is continuation of MDEV-22153 bug when contiguity of history
partitions is broken. ha_partition::open_read_partitions() can not
handle non-contiguous list of default partitions.
Fix: when default partition is dropped convert list of partitions to
non-default.
MDEV-22088 S3 partitioning support
All ALTER PARTITION commands should now work on S3 tables except
REBUILD PARTITION
TRUNCATE PARTITION
REORGANIZE PARTITION
In addition, PARTIONED S3 TABLES can also be replicated.
This is achived by storing the partition tables .frm and .par file on S3
for partitioned shared (S3) tables.
The discovery methods are enchanced by allowing engines that supports
discovery to also support of the partitioned tables .frm and .par file
Things in more detail
- The .frm and .par files of partitioned tables are stored in S3 and kept
in sync.
- Added hton callback create_partitioning_metadata to inform handler
that metadata for a partitoned file has changed
- Added back handler::discover_check_version() to be able to check if
a table's or a part table's definition has changed.
- Added handler::check_if_updates_are_ignored(). Needed for partitioning.
- Renamed rebind() -> rebind_psi(), as it was before.
- Changed CHF_xxx hadnler flags to an enum
- Changed some checks from using table->file->ht to use
table->file->partition_ht() to get discovery to work with partitioning.
- If TABLE_SHARE::init_from_binary_frm_image() fails, ensure that we
don't leave any .frm or .par files around.
- Fixed that writefrm() doesn't leave unusable .frm files around
- Appended extension to path for writefrm() to be able to reuse to function
for creating .par files.
- Added DBUG_PUSH("") to a a few functions that caused a lot of not
critical tracing.
>= M_TOT_PARTS' FAILED.
This patch is taken from MySQL, originally written by Mattias Jonsson
Here follows the original commit message:
Problem in handle_alter_part_error(),
result in altered partition_info object was still used
if table was under LOCK TABLES.
Solution was to always close and destroy all table
and table_share instances if exclusive mdl lock was
possible.
If not succeeding in get an exlusive lock (only possible
during rollback of DDL), at least close and destroy this
table instance.
rb#7361.
Approved by Mikael and Aditya.
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
- Fixed mysql_prepare_create_table() constraint duplicate checking;
- Refactored period constraint handling in mysql_prepare_alter_table():
* No need to allocate new objects;
* Keep old constraint name but exclude it from dup checking by automatic_name;
- Some minor memory leaks fixed;
- Some conceptual TODOs.
close_all_tables_for_name() is always preceded by
wait_while_table_is_used(), which makes tdc_remove_table() redundant.
The only (now fixed) exception was close_cached_tables().
Part of MDEV-17882 - Cleanup refresh version
Assertion `old_part_id == m_last_part' failed in ha_partition::update_row or `part_id == m_last_part' in ha_partition::delete_row upon UPDATE/DELETE after dropping versioning
PRIMARY KEY change hadn't been treated as partition reorganization in case of partitioning by KEY() (without parameters).
* set `*partition_changed= true` in the described case.
* since add/drop system versioning does not affect alter_info->key_list, it required separate attention
When there are E empty partitions left, auto-create N new empty
partitions for SYSTEM_TIME partitioning rotated by INTERVAL/LIMIT and
marked by AUTO_INCREMENT keyword. Syntax change: AUTO_INCREMENT
keyword (or shorter AUTO may be used instead) after LIMIT/INTERVAL
clause.
CREATE OR REPLACE TABLE t (x INT) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 100000 AUTO_INCREMENT;
CREATE OR REPLACE TABLE t (x INT) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 WEEK AUTO_INCREMENT;
The current revision implements hard-coded values of 1 for E and N. As
well as auto-creation threshold MinInterval = 1 hour, MinLimit = 1000.
The name for newly added partition will be first chosen as "pX", where
X is partition number and "p" is hard-coded name prefix. If this name
is already occupied, the X will be incremented until the resulting
name will be free to use.
ALTER TABLE ADD PARTITION is now always fast. If there some history
partition overflow occurs manual ALTER TABLE REBUILD PARTITION is
needed.
* Explicit STARTS syntax
* SHOW CREATE
* Default STARTS rounding depending on INTERVAL type
* Warn when STARTS timestamp is later than query time
* Fix uninitialized Lex->create_last_non_select_table under
mysql_unpack_partition()
Default STARTS rounding depending on INTERVAL type
If STARTS clause is omitted, default one is assigned with value
derived from query timestamp. The rounding is done on STARTS value
depending on INTERVAL type:
SECOND: no rounding is done;
MINUTE: timestamp seconds is set to 0;
HOUR: timestamp seconds and minutes are set to 0;
DAY, WEEK, MONTH and YEAR: timestamp seconds, minutes and hours are
set to 0 (the date of rotation is kept as current date).
The code in convert_charset_partition_constant() did not
take into account that the call for item->safe_charset_converter()
can return NULL when conversion is not safe.
Note, 10.2 was not affected. The test for NULL presents in 10.2,
but it disappeared in 10.3 in a mistake. Restoring the test.
The old code to print partition values was too complicated:
- it created new Items for character set conversion purposes.
- it mixed string conversion and partition error reporting
in the same code blocks.
Simplifying the code as follows:
- Adding helper methods String::can_be_safely_convert_to() and
String::append_introducer_and_hex().
- Adding DBUG_EXECUTE_IF("generate_partition_syntax_for_frm", push_warning...)
into generate_partition_syntax_for_frm(), to test the PARTITON
clause written to FRM. Adding test partition_utf8-debug.test for this.
- Removing functions get_cs_converted_part_value_from_string() and
get_cs_converted_string_value. Changing get_partition_column_description()
to use Type_handler::partition_field_append_value() instead.
This makes SHOW CREATE TABLE and SELECT FROM I_S.PARTITIONS
use the same code path.
- Changing Type_handler::partition_field_append_value() not to
call convert_charset_partition_constant(), to avoid creating a new Item
for string conversion pursposes.
Rewritting the code to use only String methods.
- Removing error reporting code (ER_PARTITION_FUNCTION_IS_NOT_ALLOWED)
from Type_handler::partition_field_append_value().
The error is correctly detected and reported on the caller level.
So error reporting was redundant here.
Also:
- Moving methods Type_handler::partition_field_*() from sql_partition.cc
to sql_type.cc. This fixes compilation problem with -DPLUGIN_PARTITION=NO,
earlier introduced by the patch for MDEV-20831.
This clause in CREATE TABLE:
PARTITION BY LIST COLUMNS (inet6column)
(PARTITION p1 VALUES IN ('::'))
was erroneously written to frm file as:
PARTITION BY LIST COLUMNS(inet6column)
(PARTITION p1 VALUES IN (_binary 0x3A3A))
I.e. the text value '::' was converted to HEX representation
and prefixed with _binary.
A simple fix could write `_latin1 0x3A3A` instead of `_binary 0x3A3A`,
but in case of INET6 we don't need neither character set introducers,
nor HEX encoding, because text representation of INET6 values consist
of pure ASCII characters.
So this patch changes the above clause to be printed as:
PARTITION BY LIST COLUMNS(inet6column)
(PARTITION p1 VALUES IN ('::'))
Details:
The old code in check_part_field() was not friendly to pluggable data types.
Replacing this function to two new virtual methods in Type_handler:
virtual bool partition_field_check(const LEX_CSTRING &field_name,
Item *item_expr) const;
virtual bool partition_field_append_value(String *str,
Item *item_expr,
CHARSET_INFO *field_cs,
partition_value_print_mode_t mode)
const;
so data type plugins can decide whether they need to use character set
introducer and/or hex encoding when printing partition values.
InnoDB intentionally (it's a documented behavior) ignores changing of
DATA DIRECTORY and INDEX DIRECTORY for partitions. Though we should
issue warning when this happens.
Pruning fix for SYSTEM_TIME INTERVAL partitioning.
Allocating one more element in range_int_array for CURRENT partition
is required for RANGE pruning to work correctly
(get_partition_id_range_for_endpoint()).
SYSTEM_TYPE partitioning: COLUMN properties removed. Partitioning is
now pure RANGE based on UNIX_TIMESTAMP(row_end).
DECIMAL type is now allowed as RANGE partitioning, we can partition by
UNIX_TIMESTAMP() (but not for DATETIME which depends on local timezone
of course).
Fix partitioning for trx_id-versioned tables.
`partition by hash`, `range` and others now work.
`partition by system_time` is forbidden.
Currently we cannot use row_start and row_end in `partition by`, because
insertion of versioned field is done by engine's handler, as well as
row_start/row_end's value set up, which is a transaction id -- so it's
also forbidden.
The drawback is that it's now impossible to use `partition by key()`
without parameters for such tables, because it references row_start and
row_end implicitly.
* add handler::vers_can_native()
* drop Table_scope_and_contents_source_st::vers_native()
* drop partition_element::find_engine_flag as unused
* forbid versioning partitioning for trx_id as not supported
* adopt vers tests for trx_id partitioning
* forbid any row_end referencing in `partition by` clauses,
including implicit `by key()`
The problem described in the bug report happened because the code
did not test check_cols(1) after fix_fields() in a few places.
Additionally, fix_fields() could be called multiple times for SP variables,
because they are all fixed at a early stage in append_for_log().
Solution:
1. Adding a few helper methods
- fix_fields_if_needed()
- fix_fields_if_needed_for_scalar()
- fix_fields_if_needed_for_bool()
- fix_fields_if_needed_for_order_by()
and using it in many cases instead of fix_fields() where
the "fixed" status is not definitely known to be "false".
2. Adding DBUG_ASSERT(!fixed) into Item_splocal*::fix_fields()
to catch double execution.
3. Adding tests.
As a good side effect, the patch removes a lot of duplicate code (~60 lines):
if (!item->fixed &&
item->fix_fields(..) &&
item->check_cols(1))
return true;
rename to post_fix_fields_part_expr_processor()
because it's only used after fix_fields in
fix_fields_part_func() and can be used for
various post-fix_fields fixups
Don't use hidden system time in versioning,
but keep the system time logic in THD
to workaround low-res system clock and
replication not versioned to versioned.
This reverts MDEV-14788 (System versioning cannot
be based on local timestamps, as it is now).
Versioning is based on local timestamps again,
but timestamps are protected by MDEV-15923
(option to control who can set session @@timestamp).
table.cc:
virtual columns must be computed for INSERT, if they're part
of the partitioning expression.
this change broke gcol.gcol_partition_innodb.
fix CHECK TABLE for partitioned tables and vcols.
sql_partition.cc:
mark prerequisite base columns in full_part_field_set
ha_partition.cc
initialize vcol_set accordingly
As thd->alloc() and new automatically calls my_error(ER_OUTOFMEORY)
there is no reason to call mem_alloc_error()
Other things:
- Fixed bug in mysql_unpack_partition() where lex.part_info was
changed even if it would be a null pointer
ALTER TABLE ... ADD PARTITION modifies the open TABLE structure,
and sets table->need_reopen=1 to reset these modifications
in case of an error.
But under LOCK TABLES the table isn't get reopened, despite need_reopen.
Fixed by reopening need_reopen tables under LOCK TABLE.
after rebuilding under test_pseudo_invisible
If we are doing alter related to partitioning then simple alter stmt
like adding column(or any alter stmt) can't be combined with partition
alter, this will generate a syntax error.
But IF we add
SET debug_dbug="+d,test_pseudo_invisible";
or test_completely_invisible
this will add a column to table with have an already partitioning related
alter. This execution of wrong stmt will crash the server on later stages.
(like on repair partition).
So we will simply return 1 (and ER_INTERNAL_ERROR) if we any of these
debug_dbug flags turned on.
This is done to get more free flag bits for alter_info->flags
Renamed all ALTER PARTITION defines to start with ALTER_PARTITION_
Renamed ALTER_PARTITION to ALTER_PARTITION_INFO
Renamed ALTER_TABLE_REORG to ALTER_PARTITION_TABLE_REORG
Other things:
- Shifted some ALTER_xxx defines to get empty bits at end
Main reason was to make it easier to print the above structures in
a debugger. Additional benefits is that I was able to use same
defines for both structures, which simplifes some code.
Most of the code is just removing Alter_info:: and Alter_inplace_info::
from alter table flags.
Following renames was done:
HA_ALTER_FLAGS -> alter_table_operations
CHANGE_CREATE_OPTION -> ALTER_CHANGE_CREATE_OPTION
Alter_info::ADD_INDEX -> ALTER_ADD_INDEX
DROP_INDEX -> ALTER_DROP_INDEX
ADD_UNIQUE_INDEX -> ALTER_ADD_UNIQUE_INDEX
DROP_UNIQUE_INDEx -> ALTER_DROP_UNIQUE_INDEX
ADD_PK_INDEX -> ALTER_ADD_PK_INDEX
DROP_PK_INDEX -> ALTER_DROP_PK_INDEX
Alter_info:ALTER_ADD_COLUMN -> ALTER_PARSE_ADD_COLUMN
Alter_info:ALTER_DROP_COLUMN -> ALTER_PARSE_DROP_COLUMN
Alter_inplace_info::ADD_INDEX -> ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX
Alter_inplace_info::DROP_INDEX -> ALTER_DROP_NON_UNIQUE_NON_PRIM_INDEX
Other things:
- Added typedef alter_table_operatons for alter table flags
- DROP CHECK CONSTRAINT can now be done online
- Added checks for Aria tables in alter_table_online.test
- alter_table_flags now takes an ulonglong as argument.
- Don't support online operations if checksum option is used.
- sql_lex.cc doesn't add ALTER_ADD_INDEX if index is not created
PREBUILT->TABLE->N_MYSQL_HANDLES_OPENED == 1
ANALYSIS:
=========
Adding unique index to a InnoDB table which is locked as
mutliple instances may trigger an InnoDB assert.
When we add a primary key or an unique index, we need to
drop the original table and rebuild all indexes. InnoDB
expects that only the instance of the table that is being
rebuilt, is open during the process. In the current
scenario we have opened multiple instances of the table.
This triggers an assert during table rebuild.
'Locked_tables_list' encapsulates a list of all
instances of tables locked by LOCK TABLES statement.
FIX:
===
We are now temporarily closing all the instances of the
table except the one which is being altered and later
reopen them via Locked_tables_list::reopen_tables().
Because NOW() is set to system time, unless overriden.
And both should follow big manual system time changes,
while still coping with lowres system clocks.
Ignoring system time changes is both confusing and
breaks with restarts.
Lots of changes:
* calculate the current history partition in ::external_lock(),
not in ::write_row() or ::update_row()
* remove dynamically collected per-partition row_end stats
* no full table scan in open_table_from_share to calculate these
stats, no manual MDL/thr_locks in open_table_from_share
* no shared stats in TABLE_SHARE = no mutexes or condition waits when
calculating current history partition
* always compare timestamps, don't convert them to MYSQL_TIME
(avoid DST ambiguity, and it's faster too)
* correct interval handling, 1 month = 1 month, not 30 * 24 * 3600 seconds
* save/restore first partition start time, and count intervals from there
* only allow to drop first partitions if INTERVAL
* when adding new history partitions, split the data in the last history
parition, if it was overflowed
* show partition boundaries in INFORMATION_SCHEMA.PARTITIONS
partition_info had a bunch of function pointers to avoid if()'s
when invoking part_type specific functionality (like get_part_id, etc).
But check_range_constants() and check_list_constants() were still
invoked conditionally, with if()'s.
Create partition_info::check_constants function pointer, get rid
of if()'s
Also remove alloc argument of check_range_constants(), added
in 26a3ff0a22. Broken system versioning will be fixed in
following commits.
enum_mark_columns -> enum_column_usage
mark_used_columns -> column_usage
further commits will replace MARK_COLUMN_NONE with
COLUMN_READ and COLUMN_WRITE that convey the intention
without causing columns to be marked