I_S tables were materialized too late, an attempt to use table
statistics before the table was created caused a crash.
Let's move table creation up. it only needs read_set to
be calculated properly, this happens in JOIN::optimize_inner(),
after semijoin transformation.
Note that tables are not populated at that point, so most of the
statistics would make no sense anyway. But at least field sizes
will be correct. And it won't crash.
- Create_tmp_table::finalize didn't clear file after delete which
could cause a double free. This is however not a likely problem as
this code path is very unlikely to happen
- free_tmp_table() could do handler calls even if the table was never
opened. Fixed by adding a test if the table is opened.
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.
This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
orig_*, all_set bitmaps are never substituted already.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
This bug could cause a crash when executing queries that used mutually
recursive CTEs with system variable big_tables set to 1. It happened due
to several bugs in the code that handled recursive table references
referred mutually recursive CTEs. For each recursive table reference a
temporary table is created that contains all rows generated for the
corresponding recursive CTE table on the previous step of recursion.
This temporary table should be created in the same way as the temporary
table created for a regular materialized derived table using the
method select_union::create_result_table(). In this case when the
temporary table is created it uses the select_union::TMP_TABLE_PARAM
structure as the parameter for the table construction. However the
code created the temporary table using just the function create_tmp_table()
and passed pointers to certain fields of the TMP_TABLE_PARAM structure
used for accumulation of rows of the recursive CTE table as parameters
for update. This was a mistake because now different temporary tables
cannot share some TMP_TABLE_PARAM fields in a general case. Besides,
depending on how mutually recursive CTE tables were defined and which
of them were referred in the executed query the select_union object
allocated for a recursive table reference could be allocated again after
the the temporary table had been created. In this case the TMP_TABLE_PARAM
object associated with the temporary table created for the recursive
table reference contained unassigned fields needed for execution when
Aria engine is employed as the engine for temporary tables.
This patch ensures that
- select_union object is created only once for any recursive table
reference
- any temporary table created for recursive CTEs uses its own
TMP_TABLE_PARAM structure
The patch also fixes a problem caused by incomplete cleanup of join tables
associated with recursive table references.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
A heuristic in best_access_path says that if for an index
ref access involved key parts which are greater than equal to that
for range access, then range access should not be considered.
The assumption made by this heuristic does not hold when
the range optimizer opted to use the group-by min-max optimization.
So the fix here would be to not consider the heuristic if
the range optimizer picked the usage of group-by min-max
optimization.
Due to a premature cleanup of the unit that specified a recursive CTE
used in the second operand of union the server fell into an infinite
loop in the reported test case. In other cases this premature cleanup
could cause other problems.
The bug is the result of a not quite correct fix for MDEV-17024. The
unit that specifies a recursive CTE has to be cleaned only after the
cleanup of the last external reference to this CTE. It means that
cleanups of the unit triggered not by the cleanup of a external
reference to the CTE must be blocked.
Usage of local table chains in selects to get external references to
recursive CTEs was not correct either because of possible merges of
some selects.
Also fixed a minor bug in st_select_lex::set_explain_type() that caused
typing 'RECURSIVE UNION' instead of 'UNION' in EXPLAIN output for external
references to a recursive CTE.
This follows up commit
commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
Reimplement MDEV-14275 Improving memory utilization for information schema
Postpone temp table instantiation until after setup_fields().
Replace all unused (not marked in read_set) columns in an I_S table
with CHAR(0). This can drastically reduce the footprint of a MEMORY
table (a TABLE_CATALOG alone is 1538 bytes per row).
This does not change the engine. If the table was decided to be Aria
(because of, say, blobs) then after optimization it'll stay Aria
even if all blobs were removed.
Note 1: when transforming table structure, share->blob_fields is
preserved, otherwise Aria might switch from DYNAMIC to STATIC row format
and expect a special field for a deleted mark, which create_tmp_tabe
didn't provide.
Note 2: optimizer was doing handler::info() (to know the number of rows)
before the temp table is populated. That didn't make much sense. Now
it's done before the table is even instantiated. Preserve the old
behavior and report 0 rows.
This reverts e2664ee836 and a8458a2345
PARTITION clause in SELECT means query is non-versioned (see
WITH_PARTITION_STORAGE_ENGINE in vers_setup_conds()).
vers_setup_conds() expands such query to SYSTEM_TIME_ALL which is then
added to VIEW specification. When VIEW is queried both clauses
PARTITION and FOR SYSTEM_TIME ALL lead to ER_VERS_QUERY_IN_PARTITION
(same place WITH_PARTITION_STORAGE_ENGINE).
Fix removes FOR SYSTEM_TIME ALL from VIEW by accessing original
SYSTEM_TIME clause: the one specified in parser. As a side-effect
EXPLAIN SELECT displays SYSTEM_TIME specified in SELECT which is
user-friendly.
For join to work correctly versioning condition must be added to table
on_expr. Without that JOIN_CACHE gets expression (1)
trigcond(xtitle.row_end = TIMESTAMP'2038-01-19 06:14:07.999999') and
trigcond(xtitle.elementId = x.`id` and xtitle.pkey = 'title')
instead of (2)
trigcond(xtitle.elementId = x.`id` and xtitle.pkey = 'title')
for join_null_complements(). It is NULL-row of xtitle for
complementing the join and the above comparisons of course FALSE, but
trigcond (Item_func_trig_cond) makes them TRUE via its trig_var
property which is bound to some boolean properties of JOIN_TAB.
Expression (2) evaluated to TRUE because its trig_var is bound to
first_inner_tab->not_null_compl. The expression (1) does not evaluate
correctly because row_end comparison's trig_var is bound to
first_inner->found earlier. As a result JOIN_CACHE::check_match()
skipped the row for join_null_complements().
When we add versioning condition to table's on_expr the optimizer in
make_join_select() distributes conditions differently. tmp_cond
inherits on_expr value and in Good case it is full expression
xgender.elementId = x.`id` and xgender.pkey = 'gender' and
xgender.row_end = TIMESTAMP'2038-01-19 06:14:07.999999'
while in Bad case it is only
xgender.elementId = x.`id` and xgender.pkey = 'gender'.
Later in Good row_end condition is optimized out and we get one
trigcond in form of (2).
Diagnostics_area::set_error_status
Analysis: When strict mode is enabled, all warnings are converted to errors
including those which do not occur because of bad data.
Fix: Query should not be aborted when we have warning because limit to
examine rows was reached because it doesn't happen due to bad data.
So thd->abort_on_warning should be false.
The issue here was that the query was using ORDER BY LIMIT optimzation where
the access method was changed from EQ_REF access to an index scan (index that would
resolve the ORDER BY clause).
But the parameter READ_RECORD::unlock_row was not reset to rr_unlock_row, which is
used when the access method is not EQ_REF access.
Problem:
Queries like this showed performance degratation in 10.4 over 10.3:
SELECT temporal_literal FROM t1;
SELECT temporal_literal + 1 FROM t1;
SELECT COUNT(*) FROM t1 WHERE temporal_column = temporal_literal;
SELECT COUNT(*) FROM t1 WHERE temporal_column = string_literal;
Fix:
Replacing the universal member "MYSQL_TIME cached_time" in
Item_temporal_literal to data type specific containers:
- Date in Item_date_literal
- Time in Item_time_literal
- Datetime in Item_datetime_literal
This restores the performance, and make it even better in some cases.
See benchmark results in MDEV.
Also, this change makes futher separations of Date, Time, Datetime
from each other, which will make it possible not to derive them from
a too heavy (40 bytes) MYSQL_TIME, and replace them to smaller data
type specific containers.
The issue here is when records are read from the temporary file
(filesort result in this case) via a cache(rr_from_cache).
The cache is initialized with init_rr_cache.
For correlated subquery the cache allocation is happening at each execution
of the subquery but the deallocation happens only once and that was
when the query execution was done.
So generally for subqueries we do two types of cleanup
1) Full cleanup: we should free all resources of the query(like temp tables).
This is done generally when the query execution is complete or the subquery
re-execution is not needed (case with uncorrelated subquery)
2) Partial cleanup: Minor cleanup that is required if
the subquery needs recalculation. This is done for all the structures that
need to be allocated for each execution (example SORT_INFO for filesort
is allocated for each execution of the correlated subquery).
The fix here would be free the cache used by rr_from_cache in the partial
cleanup phase.
* Fix the crash: IN-to-EXISTS rewrite causes an error (and so
JOIN::optimize() fails with an error, too), don't call
update_used_tables(). Terminate the query execution instead.
* Fix the cause of the error in the IN-to-EXISTS rewrite: don't do
the rewrite if doing it will cause an error of this kind:
This version of MariaDB doesn't yet support 'SUBQUERY in ROW in left
expression of IN/ALL/ANY'
* Fix another issue exposed by this testcase:
JOIN::setup_subquery_caches() may be invoked before any select has
saved its query plan, and will crash because none of the SELECTs
has called create_explain_query_if_not_exists() to create the Explain
Data Structure for this SELECT.
TODO: When merging this to 10.2, remove the poorly-placed call to
create_explain_query_if_not_exists made by fix for M_D_E_V-16153
For the case when the SJM scan table is the first table in the join order,
then if we want to do the sorting on the SJM scan table, then we need to
make sure that we unpack the values to base table fields in two cases:
1) Reading the SJM table and writing the sort-keys inside the sort-buffer
2) Reading the sorted data from the sort file
first step in moving drop table out of the handler.
todo: other methods that don't need an open table
for now hton->drop_table is optional, for backward compatibility
reasons
- Some of the bug fixes are backports from 10.5!
- The fix in innobase/fil/fil0fil.cc is just a backport to get less
error messages in mysqld.1.err when running with valgrind.
- Renamed HAVE_valgrind_or_MSAN to HAVE_valgrind
Fixed by:
- Make all quick_* variable allocated according to real number keys instead
of MAX_KEY
- Store all the quick* items in separated allocated structure (OPT_RANGE)
- Ensure we don't access any quick* variable without first checking
opt_range_keys.is_set(). Thanks to this, we don't need any
pre-initialization of quick* variables anymore.
Some renames was done to use the new structure:
table->quick_keys -> table->opt_range_keys
table->quick_rows[X] -> table->opt_range[X].rows
table->quick_key_parts[X] -> table->opt_range[X].key_parts
table->quick_costs[X] -> table->opt_range[X].cost
table->quick_index_only_costs[X] -> table->opt_range[X].index_only_cost
table->quick_n_ranges[X] -> table->opt_range[X].ranges
table->quick_condition_rows -> table->opt_range_condition_rows
This patch should both decrease memory needed for TABLE objects
(3528 -> 984 + keyinfo) and increase performance, thanks to less
initializations per query, and more localized memory, thanks to the
opt_range structure.
- Removed not needed bzero in void TABLE::initialize_quick_structures().
- Replaced bzero with TRASH_ALLOC() to have this change verfied with
memory checkers
- Added missing table->quick_keys.is_set in table_cond_selectivity()
- select_describe() should not attempt to produce query plans
for subqueries if the query is handled by a Select Handler.
- JOIN::save_explain_data_intern should not add links to Explain_select
for children selects if:
1. The whole query is handled by the Select Handler, or
2. this select (and so its children) is handled by Derived Handler.
Starting from 10.3, the optimizer is able to detect that entire outer join
nests are constants (because of "Impossible ON") and remove them (see
mark_join_nest_as_const)
However, this was not properly accounted for in NESTED_JOIN structure
and the way check_interleaving_with_nj() uses its n_tables member to
check if the join prefix order is allowed.
(The result was that the optimizer could conclude that no join prefix is
allowed and fail an assertion)
The reson for the change was to make it easier to find true errors
when searching in trace logs.
"error:" should mainly be used when we have a real error
When a prepared statement parameter '?' is used in a CTE that is used
multiple times, the following happens:
- The CTE definition is re-parsed multiple times.
- There are multiple Item_param objects referring to the same "?" in
the original query.
- Prepared_statement::param has a pointer to the first of them, the
others are "clones".
- When prepared statement parameter gets the value, it should be passed
over to clones with param->sync_clones() call.
This call is made in insert_params(), etc. It was not made in
insert_params_with_log().
This would cause Item_param to not have any value which would confuse
the query optimizer.
Added the missing call.
In case of SELECT without tables which returns either 0 or 1 rows,
JOIN::exec_inner() did not check if the flag representing SQL_CALC_FOUND_ROWS
is set or not and send_records was direclty assigned 0. So SELECT FOUND_ROWS()
was giving 0 in the output. Now it checks if the flag is set, if it is set
send_record=1 else 0. 1 is the number of rows that could have been sent
to the client if the SELECT query had SQL_CALC_FOUND_ROWS.
It is 0 when no rows were sent because the SELECT query did not have
SQL_CALC_FOUND_ROWS.
UPDATE gets access to history records because versioning conditions
are not set for VIEW. This leads to endless loop of inserting history
records when clustered index is rebuilt and ha_rnd_next() returns
newly inserted history record.
Return back original behavior of failing on write-locked table in
historical query.
35b679b9 assumed that SELECT_LEX::lock_type influences anything, but
actually at this point table is already locked. Original bug report
was tempesta-tech/mariadb#102
A temporary table is needed for window function computation but if only a NAMED WINDOW SPEC
is used and there is no window function, then there is no need to create a temporary
table as there is no stage to compute WINDOW FUNCTION
1. Code simplification:
Item_default_value handled all these values:
a. DEFAULT(field)
b. DEFAULT
c. IGNORE
and had various conditions to distinguish (a) from (b) and from (c).
Introducing a new abstract class Item_contextually_typed_value_specification,
to handle (b) and (c), so the hierarchy now looks as follows:
Item
Item_result_field
Item_ident
Item_field
Item_default_value - DEFAULT(field)
Item_contextually_typed_value_specification
Item_default_specification - DEFAULT
Item_ignore_specification - IGNORE
2. Introducing a new virtual method is_evaluable_expression() to
determine if an Item is:
- a normal expression, so its val_xxx()/get_date() methods can be called
- or a just an expression substitute, whose value methods cannot be called.
3. Disallowing Items that are not evalualble expressions in table value
constructors.
For a unique key if all the keyparts are NOT NULL or the predicates involving
the keyparts is NULL rejecting, then we can use EQ_REF access instead of ref
access with the unique key
The reason for this is to make all temporary file names similar and
also to be able to figure out from where a #sql-xxx name orginates.
New format is for most cases:
'#sql-name-current_pid-thread_id[-increment]'
Where name is one of subselect, alter, exchange, temptable or backup
The exceptions are:
ALTER PARTITION shadow files:
'#sql-shadow-thread_id-'original_table_name'
Names used with temp pool:
'#sql-name-current_pid-pool_number'
- Print the rowid filters that are available for use with each table.
- Make print_best_access_for_table() print which filter it has picked.
- Make best_access_path() print the filter for considered ref accesses.
* rename to a generic name
* move remaning initializations from query exec to prepare time
* simplify/unify key handling in open_table_from_share and delayed
* remove dead code
* move tests where they belong
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
not 1.0. In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
This was to ensure that handler::keyread_time() doesn't give
higher cost for heap tables than for normal tables. One effect of
this is that heap and derived tables stored in heap will prefer
key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
multi_range_read_info_const(). All index cost calculation is now
done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
This was done to both simplify the code and also to be easier to handle
storage engines that are clustered on some other index than the primary
key.
As pk_is_clustering_key() and is_clustering_key now are using only
index_flags, these where removed from all storage engines.
MDEV-21606 Improve update handler (long unique keys on blobs)
MDEV-21470 MyISAM and Aria start_bulk_insert doesn't work with long unique
MDEV-21606 Bug fix for previous version of this code
MDEV-21819 2 Assertion `inited == NONE || update_handler != this'
- Move update_handler from TABLE to handler
- Move out initialization of update handler from ha_write_row() to
prepare_for_insert()
- Fixed that INSERT DELAYED works with update handler
- Give an error if using long unique with an autoincrement column
- Added handler function to check if table has long unique hash indexes
- Disable write cache in MyISAM and Aria when using update_handler as
if cache is used, the row will not be inserted until end of statement
and update_handler would not find conflicting rows.
- Removed not used handler argument from
check_duplicate_long_entries_update()
- Syntax cleanups
- Indentation fixes
- Don't use single character indentifiers for arguments
e.g.
- dont -> don't
- occurence -> occurrence
- succesfully -> successfully
- easyly -> easily
Also remove trailing space in selected files.
These changes span:
- server core
- Connect and Innobase storage engine code
- OQgraph, Sphinx and TokuDB storage engines
Related to MDEV-21769.
Create_tmp_table::add_fields(): Initialize uneven_delta= 0
to suppress the warning from GCC 9.2.1 and 10.0.1,
and consistently indent the code with spaces.
It was:
implicit conversion from 'ha_rows' (aka 'unsigned long long') to 'double'
changes value from 18446744073709551615 to 18446744073709551616
Follow what JOIN::get_examined_rows() does for similar code.
This task deals with packing the sort key inside the sort buffer, which would
lead to efficient usage of the memory allocated for the sort buffer.
The changes brought by this feature are
1) Sort buffers would have sort keys of variable length
2) The format for sort keys inside the sort buffer would look like
|<sort_length><null_byte><key_part1><null_byte><key_part2>.......|
sort_length is the extra bytes that are required to store the variable
length of a sort key.
3) When packing of sort key is done we store the ORIGINAL VALUES inside
the sort buffer and not the STRXFRM form (mem-comparable sort keys).
4) Special comparison function packed_keys_comparison() is introduced
to compare 2 sort keys.
This patch also contains contributions from Sergei Petrunia.
- Added unlikely() to optimize for not having optimizer trace enabled
- Made THD::trace_started() inline
- Added 'if (trace_enabled())' around some potentially expensive code
(not many found)
- Added ASSERT's to ensure we don't call expensive optimizer trace calls
if optimizer trace is not enabled
- Added length to Json_writer functions to speed up buffer writes
when optimizer trace is enabled.
- Changed LEX_CSTRING argument handling to not send full struct to writer
function on_add_str() functions now trusts length arguments
[Variant 2 of the fix: collect the attached conditions]
Problem:
make_join_select() has a section of code which starts with
"We plan to scan all rows. Check again if we should use an index."
the code in that section will [unnecessarily] re-run the range
optimizer using this condition:
condition_attached_to_current_table AND current_table's_ON_expr
Note that the original invocation of range optimizer in
make_join_statistics was done using the whole select's WHERE condition.
Taking the whole select's WHERE condition and using multiple-equalities
allowed the range optimizer to infer more range restrictions.
The fix:
- Do range optimization using a condition that is an AND of this table's
condition and all of the previous tables' conditions.
- Also, fix the range optimizer to prefer SEL_ARGs with type=KEY_RANGE
over SEL_ARGS with type=MAYBE_KEY, regardless of the key part.
Computing
key_and(
SEL_ARG(type=MAYBE_KEY key_part=1),
SEL_ARG(type=KEY_RANGE, key_part=2)
)
will now produce the SEL_ARG with type=KEY_RANGE.
This task deals with packing the non-sorted fields (or addon fields).
This would lead to efficient usage of the memory allocated for the sort buffer.
The changes brought by this feature are
1) Sort buffers would have records of variable length
2) Each record in the sort buffer would be stored like
<sort_key1><sort_key2>....<addon_length><null_bytes><field1><field2>....
addon_length is the extra bytes that are required to store the variable
length of addon field across different records.
3) Changes in rr_unpack_from_buffer and rr_from_tempfile to take into account
the variable length of records.
Ported WL#1509 Pack values of non-sorted fields in the sort buffer from
MySQL by Tor Didriksen
row_search_idx_cond_check with rowid_filter upon concurrent access to table
This bug has nothing to do with the concurrent access to table. Rather it
concerns queries for which the optimizer decides to employ a rowid filter
when accessing an InnoDB table by a secondary index, but later when
calling test_if_skip_sort_order() changes its mind to access the table by
the primary key.
Currently usage of rowid filters is not supported in InnoDB if the table
is accessed by the primary key. So in this case usage of a rowid filter
to access the table must be prohibited.
In this scenario:
- There is a possible range access for table T
- And there is a ref access on the same index which uses fewer key parts
- The join optimizer picks the ref access (because it is cheaper)
- make_join_select applies this heuristic to switch to range:
/* Range uses longer key; Use this instead of ref on key */
Join buffer will be used without having called
JOIN_TAB::make_scan_filter(). This means, conditions that should be
checked when reading table T will be checked after T is joined with the
contents of the join buffer, instead.
Fixed this by adding a make_scan_filter() check.
(updated patch after backport to 10.3)
(Fix testcase on Windows)
The issue here is for degenerate joins we should execute the window
function but it is not getting executed in all the cases.
To get the window function values window function needs to be executed
always. This currently does not happen in few cases
where the join would return 0 or 1 row like
1) IMPOSSIBLE WHERE
2) MIN/MAX optimization
3) EMPTY CONST TABLE
The fix is to make sure that window functions get executed
and the temporary table is setup for the execution of window functions
The query requires 2 temporary tables for execution, the window function
is always attached to the last temporary table, but in this case the
result field of the window function points to the first temporary table
rather than the last one.
Fixed this by not changing window function items with temporary table
items of the first temporary table.
Don't do skip_setup_conds() unless all errors are checked.
Fixes following errors:
ER_PERIOD_NOT_FOUND
ER_VERS_QUERY_IN_PARTITION
ER_VERS_ENGINE_UNSUPPORTED
ER_VERS_NOT_VERSIONED
Don't do skip_setup_conds() unless all errors are checked.
Fixes following errors:
ER_PERIOD_NOT_FOUND
ER_VERS_QUERY_IN_PARTITION
ER_VERS_ENGINE_UNSUPPORTED
ER_VERS_NOT_VERSIONED
in row_search_idx_cond_check
When usage of rowid filter is evaluated by the optimizer to join a table
to the current partial join employing a certain index it should be checked
that a key for at least the major component of this index can be constructed
using values from the columns of the partial join.
in row_search_idx_cond_check
For a single table query with ORDER BY and several sargable range
conditions the optimizer may choose an execution plan that employs
a rowid filter. In this case it is important to build the filter before
calling the function JOIN_TAB::sort_table() that creates sort index
for the result set, because when this is index created the filter has
to be already filled. After the sort index has been created the
filter must be deactivated. If not to do this the innodb function
row_search_idx_cond_check() is getting confused when it has to read rows
from the created sort index by using ha_rnd_pos().
The order of actions mentioned above is needed also when processing a
join query if sorting is performed for the first non constant table in
the chosen execution plan.
MDEV-18957 UPDATE with LIMIT clause is wrong for versioned partitioned tables
UPDATE, DELETE: replace linear search of current/historical records
with vers_setup_conds().
Additional DML cases in view.test
Count the "gap" time between table accesses and display it as
r_other_time_ms in the "table" element.
* The advantage of this approach is that it doesn't add any new
my_timer_cycles() calls.
* The disadvantage is that the definition of what is done during
"other time" is not that clear: it includes checking the WHERE
(for this table), constructing index lookup tuple (for the next table)
writing to GROUP BY temporary table (as we dont account for that time
separately [yet], etc)
The issue here is the wrong estimate of the cardinality of a partial join,
the cardinality is too high because the function table_cond_selectivity()
returns an absurd number 100 while selectivity cannot be greater than 1.
When accessing table t by outer reference t1.a via index we do not perform any
range analysis for t. Yet we see TABLE::quick_key_parts[key] and
TABLE->quick_rows[key] contain a non-zero value though these should have been
remained untouched and equal to 0.
Thus real cause of the problem is that TABLE::init does not clean the arrays
TABLE::quick_key_parts[] and TABLE::>quick_rows[].
It should have done it because the TABLE structure created for any
instance of a table can be reused for many queries.
A conflict between MDEV-19514 (b42294bc64)
and MDEV-20934 (d7a2401750)
was resolved. We will not invoke the function ibuf_delete_recs()
from ibuf_merge_or_delete_for_page(). Instead, we will add that
logic to the function ibuf_read_merge_pages().
In the function prev_record_reads where one finds the different row combinations for a
subset of partial join, it did not take into account the selectivity of tables
involved in the subset of partial join.
because internally setup_wild() adjusts select_lex->with_wild directly
anyway, so there is no reason to pretend that the number of '*' may be
anything else but select_lex->with_wild
And don't update select_lex->item_list, because fields can come
from anywhere and don't necessarily have to be copied into select_lex.
Now both offset and limit are stored and do not chenged during execution
(offset is decreased during processing in versions before 10.5).
(Big part of this changes made by Monty)
These two methods:
- Item_result_field::create_tmp_field_ex()
- Item_func_user_var::create_tmp_field_ex()
had duplicate code, except that they used a different type handler.
Adding a protected method Item_result_field::create_tmp_field_ex_from_handler()
with a "const Type_handler*" parameter, and reusing it from the
two mentioned methods.
1. Removed TIMESTAMP/TRANSACTION unit auto-detection in favor of default TIMESTAMP.
Reasons:
1.1. rare practical use and doubtful advantage of such auto-detection;
1.2. it conflicts with MDEV-16226 (TRX_ID-based versioned tables performance improvement).
Needless check_unit membership removed.
2. SQL: versioning type handling refactoring
Vers_type_handler hierarchy stores versioning properties of type.
virtual Type_handler::vers() accesses specialization of
Vers_type_handler for specific type.
virtual Vers_type_handler::kind() returns versioning kind
(timestamp/trx_id).
Removed Type_handler::Vers_history_point_check_unit() in favor of
Type_handler::vers().
Renames:
require_timestamp() -> require_timestamp_error()
require_trx_id() -> require_trx_id_error()
EDIT by Alexander Barkov (@abarkov):
check_sys_fields() moved to Vers_type_handler::check_sys_fields()
selectivity values fails
After having set the assertion that checks validity of selectivity values
returned by the function table_cond_selectivity() a test case from
order_by.tesst failed. The failure occurred because range optimizer could
return as an estimate of the cardinality of the ranges built for an index
a number exceeding the total number of records in the table.
The second bug is more subtle. It may happen when there are several
indexes with same prefix defined on the first joined table t accessed by
a constant ref access. In this case the range optimizer estimates the
number of accessed records of t for each usable index and these
estimates can be different. Only the first of these estimates is taken
into account when the selectivity of the ref access is calculated.
However the optimizer later can choose a different index that provides
a different estimate. The function table_condition_selectivity() could use
this estimate to discount the selectivity of the ref access. This could
lead to an selectivity value returned by this function that was greater
that 1.
best_access_path() is called from two optimization phases:
1. Plan choice phase, in choose_plan(). Here, the join prefix being
considered is in join->positions[]
2. Plan refinement stage, in fix_semijoin_strategies_for_picked_join_order
Here, the join prefix is in join->best_positions[]
It used to access join->positions[] from stage #2. This didnt cause any
valgrind or asan failures (as join->positions[] has been written-to before)
but the effect was similar to that of reading the random data:
The join prefix we've picked (in join->best_positions) could have
nothing in common with the join prefix that was last to be considered
(in join->positions).
This patch introduces the optimization that allows range optimizer to
consider index range scans that are built employing NOT NULL predicates
inferred from WHERE conditions and ON expressions.
The patch adds a new optimizer switch not_null_range_scan.
(Backported to 10.3, addressed review input)
Sj_materialization_picker::check_qep(): fix error in cost/fanout
calculations:
- for each join prefix, add #prefix_rows / TIME_FOR_COMPARE to the cost,
like best_extension_by_limited_search does
- Remove the fanout produced by the subquery tables.
- Also take into account join condition selectivity
optimize_wo_join_buffering() (used by LooseScan and FirstMatch)
- also add #prefix_rows / TIME_FOR_COMPARE to the cost of each prefix.
- Also take into account join condition selectivity
handlerton that is able to processes the whole query. For that it
traverses tables from subqueries.
Select_handler now cleans up temporary table structures on dctor call.
The MDEV-20265 commit e746f451d5
introduces DBUG_ASSERT(right_op == r_tbl) in
st_select_lex::add_cross_joined_table(), and that assertion would
fail in several tests that exercise joins. That commit was skipped
in this merge, and a separate fix of MDEV-20265 will be necessary in 10.4.
Current easy fix is not possible, because SELECT clones ha_partition
and then closes the clone which leads to unclosed transaction in
partitions we forcely prune out. We cound solve this by closing these
partitions (and release from transaction) in
change_partitions_to_open() at versioning conditions stage, but this
is problematic because table lock is acquired for each partition at
open stage and therefore must be released when we close partition
handler in change_partitions_to_open(). More details in MDEV-20376.
This should change after MDEV-20250 where mechanism of opening
partitions will be improved.
This reverts commit cdbac54df0.
For MDEV-15955, the fix in create_tmp_field_from_item() would cause a
compilation error. After a discussion with Alexander Barkov, the fix
was omitted and only the test case was kept.
In 10.3 and later, MDEV-15955 is fixed properly by overriding
create_tmp_field() in Item_func_user_var.
When discounting selectivity of ref access, don't discount the
selectivity we've already discounted for range access.
The 10.1 version of the fix. Will need to adjust condition filtering
test results in 10.4
Exclude SELECT and INSERT SELECT from vers_set_hist_part(). We cannot
likewise exclude REPLACE SELECT because it may REPLACE into itself
(and REPLACE generates history).
INSERT also does not generate history, but we have history
modification setting which might be interfered.
The bug occured when the optimizer decided to use a rowid filter built
by a range index scan to access an InnoDB table with generated clustered
index.
When a table is accessed by a secondary index Idx employing a rowid filter the
the value of pk contained in the found index tuple is checked against the
filter. A call of the handler function position is supposed to put the
pk value into the handler::ref buffer. However for generated clustered
primary keys it did not happened. The patch fixes this problem.
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about
deprecated `register` keyword.
Too much warnings came from Mroonga and I gave up on it.
and WHERE filter afterwards
This patch complements the patch fixing the bug MDEV-6892. The latter
properly handled queries that used mergeable views returning constant
columns as inner tables of outer joins and whose where clause contained
predicates referring to these columns if the predicates of happened not
to be equality predicates. Otherwise the server still could return wrong
result sets for such queries. Besides the fix for MDEV-6892 prevented
some possible conversions of outer joins to inner joins for such queries.
This patch corrected the function check_simple_equality() to handle
properly conjunctive equalities of the where clause that refer to the
constant columns of mergeable views used as inner tables of an outer join.
The patch also changed the code of Item_direct_view_ref::not_null_tables().
This change allowed to take into account predicates containing references
to constant columns of mergeable views when converting outer joins into
inner joins.
Removed Field_map, since it was used only in a single function.
Fixed is_indexed_agg_distinct(), since it relied on initialization of
Bitmap in constructor.
Handling of top level conjuncts in WHERE whose used_tables() contained
RAND_TABLE_BIT in the function make_join_select() was incorrect.
As a result if such a conjunct referred to fields non of which belonged
to the last joined table it was pushed twice. (This could be seen
for a test case from subselect.test whose output was changed after this
patch had been applied. In 10.1 when running EXPLAIN FORMAT=JSON for
the query from this test case we clearly see that one of the conjuncts
is pushed twice.) This fact by itself was not good. Besides, if such a
conjunct was pushed to a table that was the result of materialization
of a semi-join the query could return a wrong result set. In particular
we could watch it for queries with semi-join subqueries whose left parts
used stored functions without "deterministic' specifier.
This patch complements the patch that fixes bug MDEV-18479.
This patch takes care of possible overflow when calculating the
estimated number of rows in a materialized derived table / view.
or server crashes in JOIN::fix_all_splittings_in_plan after EXPLAIN
This patch resolves the problem of overflowing when performing
calculations to estimate the cost of an evaluated query execution plan.
The overflowing in a non-debug build could cause different kind of
problems uncluding crashes of the server.
This patch corrects the patch for the bug 10006. The latter incorrectly
calculates the attribute TABLE_LIST::dep_tables for inner tables
of outer joins that are to be converted into inner joins.
As a result after the patch some valid join orders were not evaluated
and the optimizer could choose an execution plan that was far from
being optimal.
The code in best_access_path function, when it does not find a key suitable for ref access
and join_cache_level is set to a value so that hash_join is possible we build a hash key.
Later in the function we compare the cost of ref access with table scan (or index scan
or quick selects). No need to do this when we have got the hash key.
The issue in this case is that we take in account the estimates from quick keys instead of rec_per_key.
The estimates for quick keys are better than rec_per_key only if we have ref(const), so we need to check
that all keyparts in the ref key are of the type ref(const).
For a table column `a`, the above expressions logically
equate to false in all cases.
With this patch the optimizer knows about this and queries
like:
SELECT * FROM t1 WHERE a!=a
no longer need to evaluate a!=a for every row.
The same applies if the expression was `a<a`, or `a>a`
An `EXPLAIN SELECT COOUNT(*) FROM t1 WHERE a<a` will show:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
Similarly `NOT (a!=a)` is always true.
EXPLAIN SELECT COUNT(*) FROM t1 WHERE not (a!=a);
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
With MAX_INDEXIES=64(default), key_map=Bitmap<64> is just a wrapper around
ulonglong and thus "trivial" (can be bzero-ed, or memcpy-ed, and stays
valid still)
With MAX_INDEXES=128, key_map = Bitmap<128> is not a "trivial" type
anymore. The implementation uses MY_BITMAP, and MY_BITMAP contains pointers
which make Bitmap invalid, when it is memcpy-ed/bzero-ed.
The problem in 10.4 is that there are many new key_map members, inside TABLE
or KEY, and those are often memcopied and bzeroed
The fix makes Bitmap "trivial", by inlining most of MY_BITMAP functionality.
pointers/heap allocations are not used anymore.
Need to call split_sum_func if an aggregate function is part of order by
or partition by clause so that we have the required fields inside the temporary
table, as all the fields inside the partition by and order by clause of the
window function needs to be there in the temp table used for window function
computation.
The issue here is that for a window function in the ORDER BY clause, we were not
creating an extra field in the temporary table for the window function
(which is contained in an expression).
So a call to split_sum_func is added to handle this case
Also we need to update all items that contain a window function
in the temp table during window function computation as filesort would need
these values to be updated to calculate the ORDER BY clause of the select.
For degenerate joins we may have JOIN::table_list as NULL, so instead
of using JOIN::top_join_tab_count use the function JOIN::exec_join_tab_cnt
to get the number of tables joined at the top level.
with GROUP BY + ORDER BY
The method JOIN::create_postjoin_aggr_table() should not call
call JOIN::add_sorting_to_table() unless the first non-constant join
table is passed as the first parameter to the method.
Currently usage of range rowid filters can be combined only with
ref access and single index range access. So if the optimizer has
chosen some other quick select method to access a joined table
then no range rowid filter can be used for this table.
If SUBS_IN_TO_EXISTS strategy has been chosen for a subquery then
additional conditions are injected into WHERE/ON/HAVING of this subquery
and it may happen that test_quick_select() invoked from
JOIN::make_range_rowid_filters() discovers impossible range. This
must be checked.
This bug is caused by pushdown from HAVING into WHERE.
It appears because condition that is pushed wasn't fixed.
It is also discovered that condition pushdown from HAVING into
WHERE is done wrong. There is no need to build clones for some
conditions that can be pushed. They can be simply moved from HAVING
into WHERE without cloning.
build_pushable_cond_for_having_pushdown(),
remove_pushed_top_conjuncts_for_having() methods are changed.
It is found that there is no transformation made for fields of
pushed condition.
field_transformer_for_having_pushdown transformer is added.
New tests are added. Some comments are changed.
The main problem was lack of proper QueryArena handling in
`period_setup_conds`. Since mysql_prepare_update/mysql_prepare_delete
are called during `PREPARE` statement, period conditions, should be
allocated on statement query arena.
Another problem is incorrect statement state handling in
period_setup_conds, which led to unexpected mysql_update termination.
* mysql_update: move period_setup_conds() to mysql_prepare_update to
store conditions in statement's mem_root
* mtr: add period suite to default list, since --ps-protocol is now
fixed
Fixes bugs:
MDEV-18853 Assertion `0' failed in Protocol::end_statement upon DELETE .. FOR PORTION via prepared statement
MDEV-18852 Server crashes in reinit_stmt_before_use upon UPDATE .. FOR PORTION via prepared statement
The MDEV-17262 commit 26432e49d3
was skipped. In Galera 4, the implementation would seem to require
changes to the streaming replication.
In the tests archive.rnd_pos main.profiling, disable_ps_protocol
for SHOW STATUS and SHOW PROFILE commands until MDEV-18974
has been fixed.
In the function make_cond_for_table_from_pred a call of ix_fields()
missed checking of the return code. As a result an extracted constant
condition could be not well formed and this caused an assertion failure.
There were two newly enabled warnings:
1. cast for a function pointers. Affected sql_analyse.h, mi_write.c
and ma_write.cc, mf_iocache-t.cc, mysqlbinlog.cc, encryption.cc, etc
2. memcpy/memset of nontrivial structures. Fixed as:
* the warning disabled for InnoDB
* TABLE, TABLE_SHARE, and TABLE_LIST got a new method reset() which
does the bzero(), which is safe for these classes, but any other
bzero() will still cause a warning
* Table_scope_and_contents_source_st uses `TABLE_LIST *` (trivial)
instead of `SQL_I_List<TABLE_LIST>` (not trivial) so it's safe to
bzero now.
* added casts in debug_sync.cc and sql_select.cc (for JOIN)
* move assignment method for MDL_request instead of memcpy()
* PARTIAL_INDEX_INTERSECT_INFO::init() instead of bzero()
* remove constructor from READ_RECORD() to make it trivial
* replace some memcpy() with c++ copy assignments
If a splittable materialized derived table / view T is used in a inner nest
of an outer join with impossible ON condition then T is marked as a
constant table. Yet the execution plan to build T is still searched for
in spite of the fact that is not needed. So it should be set.
When the chosen execution plan accesses a join table employing a range
rowid filter a quick select to scan this range has to be built. This
quick select is built by a call of SQL_SELECT::test_quick_select().
At this call the function should allow to evaluate only single index
range scans. In order to be able to do this a new parameter was added
to this function.
This patch implements engine independent unique hash index.
Usage:- Unique HASH index can be created automatically for blob/varchar/test column whose key
length > handler->max_key_length()
or it can be explicitly specified.
Automatic Creation:-
Create TABLE t1 (a blob unique);
Explicit Creation:-
Create TABLE t1 (a int , unique(a) using HASH);
Internal KEY_PART Representations:-
Long unique key_info will have 2 representations.
(lets understand this with an example create table t1(a blob, b blob , unique(a, b)); )
1. User Given Representation:- key_info->key_part array will be similar to what user has defined.
So in case of example it will have 2 key_parts (a, b)
2. Storage Engine Representation:- In this case there will be only one key_part and it will point to
HASH_FIELD. This key_part will be always after user defined key_parts.
So:- User Given Representation [a] [b] [hash_key_part]
key_info->key_part ----^
Storage Engine Representation [a] [b] [hash_key_part]
key_info->key_part ------------^
Table->s->key_info will have User Given Representation, While table->key_info will have Storage Engine
Representation.Representation can be changed into each other by calling re/setup_keyinfo_hash function.
Working:-
1. So when user specifies HASH_INDEX or key_length is > handler->max_key_length(), In mysql_prepare_create_table
One extra vfield is added (for each long unique key). And key_info->algorithm is set to HA_KEY_ALG_LONG_HASH.
2. In init_from_binary_frm_image values for hash_keypart is set (like fieldnr , field and flags)
3. In parse_vcol_defs, HASH_FIELD->vcol_info is created. Item_func_hash is used with list of Item_fields,
When Explicit length is given by user then Item_left is used to concatenate Item_field values.
4. In ha_write_row/ha_update_row check_duplicate_long_entry_key is called which will create the hash key from
table->record[0] and then call ha_index_read_map , if we found duplicated hash , we will compare the result
field by field.
* inject portion of time updates into mysql_delete main loop
* triggered case emits delete+insert, no updates
* PORTION OF `SYSTEM_TIME` is forbidden
* `DELETE HISTORY .. FOR PORTION OF ...` is forbidden as well
Optimized the code that removed multiple equalities pushed from HAVING
into WHERE. Now this removal is postponed until all multiple equalities
are eliminated in substitute_for_best_equal_field().
Condition can be pushed from the HAVING clause into the WHERE clause
if it depends only on the fields that are used in the GROUP BY list
or depends on the fields that are equal to grouping fields.
Aggregate functions can't be pushed down.
How the pushdown is performed on the example:
SELECT t1.a,MAX(t1.b)
FROM t1
GROUP BY t1.a
HAVING (t1.a>2) AND (MAX(c)>12);
=>
SELECT t1.a,MAX(t1.b)
FROM t1
WHERE (t1.a>2)
GROUP BY t1.a
HAVING (MAX(c)>12);
The implementation scheme:
1. Extract the most restrictive condition cond from the HAVING clause of
the select that depends only on the fields that are used in the GROUP BY
list of the select (directly or indirectly through equalities)
2. Save cond as a condition that can be pushed into the WHERE clause
of the select
3. Remove cond from the HAVING clause if it is possible
The optimization is implemented in the function
st_select_lex::pushdown_from_having_into_where().
New test file having_cond_pushdown.test is created.
Due to inconsistent usage of different cost models to calculate
the cost of ref accesses we have to make the calculation of the
gain promising by usage a range filter more complex.
This task involves the implementation for the optimizer trace.
This feature produces a trace for any SELECT/UPDATE/DELETE/,
which contains information about decisions taken by the optimizer during
the optimization phase (choice of table access method, various costs,
transformations, etc). This feature would help to tell why some decisions were
taken by the optimizer and why some were rejected.
Trace is session-local, controlled by the @@optimizer_trace variable.
To enable optimizer trace we need to write:
set @@optimizer_trace variable= 'enabled=on';
To display the trace one can run:
SELECT trace FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
This task also involves:
MDEV-18489: Limit the memory used by the optimizer trace
introduces a switch optimizer_trace_max_mem_size which limits
the memory used by the optimizer trace. This was implemented by
Sergei Petrunia.
MDEV-17631 select_handler for a full query pushdown
Interfaces + Proof of Concept for federatedx with test cases.
The interfaces have been developed for integration of ColumnStore engine.
ANALYZE and ANALYZE FORMAT=JSON structures are changed in the way that they
show additional information when rowid filter is used:
- r_selectivity_pct - the observed filter selectivity
- r_buffer_size - the size of the rowid filter container buffer
- r_filling_time_ms - how long it took to fill rowid filter container
New class Rowid_filter_tracker was added. This class is needed to collect data
about how rowid filter is executed.
remove TABLE_SHARE::error_table_name() and TABLE_SHARE::orig_table_name
(that was allocated in a wrong memroot in this bug).
instead, simply set TABLE_SHARE::table_name correctly.
This patch contains a full implementation of the optimization
that allows to use in-memory rowid / primary filters built for range
conditions over indexes. In many cases usage of such filters reduce
the number of disk seeks spent for fetching table rows.
In this implementation the choice of what possible filter to be applied
(if any) is made purely on cost-based considerations.
This implementation re-achitectured the partial implementation of
the feature pushed by Galina Shalygina in the commit
8d5a11122c.
Besides this patch contains a better implementation of the generic
handler function handler::multi_range_read_info_const() that
takes into account gaps between ranges when calculating the cost of
range index scans. It also contains some corrections of the
implementation of the handler function records_in_range() for MyISAM.
This patch supports the feature for InnoDB and MyISAM.
Also fixes:
MDEV-17741 Assertion `thd->Item_change_list::is_empty()' failed in mysql_parse after unsuccessful PS
The problem was introduced by:
commit f033fbd9f2
Changed the test case for MDEV-15571
It was later fixed, but in 10.3 only:
commit ce2cf855bf
MDEV-16043 Assertion thd->Item_change_list::is_empty() failed in mysql_parse
upon SELECT from a view reading from a versioned table
This patch is a backport of ce2cf855bf to 10.2