
Console View


Sergei Petrunia
Refactoring: simplify code in estimate_table_group_cardinality()

Should work exactly the same as before.

Do not search in join->join_tab array to find the table by its table_map
bit. Get it from the Item_field object and Field that refers to that table.

(This is preparation for the fix for MDEV-37732.)
Raghunandan Bhat
MDEV-36763: Assertion failed in `Type_handler_json_common::json_type_handler_from_generic`

Problem:
  Queries returning JSON from hybrid functions (`NULLIF`, `IF`, `NVL2`)
  crash. This is a regression from MDEV-36716, which added early calls
  to `Item_hybrid_func_fix_attributes()` inside
  - `Item_func_nullif::fix_length_and_dec()` and
  - `Item_func_case_abbreviation2::cache_type_info()`
  to block `ROW` arguments.

  Standard hybrid functions populate their `m_type_handler` by calling
  `aggregate_for_result()` prior to fixing attributes. However, since
  `NULLIF` and `IF` deduce their return type from a single argument,
  they optimize away the `aggregate_for_result()` step.

  This caused `Item_hybrid_func_fix_attributes()` to execute while
  `m_type_handler` was still in its uninitialized default state,
  triggering a debug assertion failure in the JSON handler.

Fix:
  Explicitly set the type handler from the source argument before
  calling `Item_hybrid_func_fix_attributes()`.
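The fix can be illustrated with a self-contained mock (hypothetical names, not the server's actual Item hierarchy): NULLIF and IF skip `aggregate_for_result()`, so the handler must be copied from the single source argument before the attribute-fixing step runs.

```cpp
#include <cassert>

// Mock of the pattern described above (hypothetical names, not the
// server's Item hierarchy).
struct Type_handler { const char *name; };
static Type_handler type_handler_json{"json"};

struct Item_hybrid_func_mock
{
  const Type_handler *m_type_handler= nullptr;  // default: uninitialized

  // Stands in for Item_hybrid_func_fix_attributes(): in the server a
  // debug assertion fires if the handler is still at its default.
  void fix_attributes() { assert(m_type_handler != nullptr); }

  // Stands in for Item_func_nullif::fix_length_and_dec(). The fix:
  // copy the handler from the single source argument first, so the
  // early fix_attributes() call no longer sees a null handler.
  void fix_length_and_dec(const Type_handler *source_arg_handler)
  {
    m_type_handler= source_arg_handler;
    fix_attributes();
  }
};
```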
Sergei Petrunia
MDEV-39368: When replay query fails, print "*** REPLAY FAILED ***" instead of EXPLAIN output.
Yuchen Pei
MDEV-39361 Assign Name resolution context in subst_vcol_if_compatible to the new vcol Item_field

Add a context field to vcol_info, and assign its value in an
Item_field constructor as well as during substitution.
Sergei Golubchik
MDEV-39404 Gtid_log_event crash

validate xid.gtrid_length and xid.bqual_length just like XID_EVENT does
Marko Mäkelä
MDEV-37949: Implement innodb_log_archive

The InnoDB write-ahead log file in the old innodb_log_archive=OFF
format is named ib_logfile0, pre-allocated to innodb_log_file_size and
written as a ring buffer. This is good for write performance and space
management, but unsuitable for arbitrary point-in-time recovery or for
facilitating efficient incremental backup.

innodb_log_archive=ON: A new format where InnoDB will create and
preallocate files ib_%016x.log, instead of writing a circular file
ib_logfile0. Each file will be pre-allocated to innodb_log_file_size.
Once a log fills up, we will create and pre-allocate another log file,
to which log records will be written.  Upon the completion of the first
log checkpoint in a recently created log file, the old log file will
be marked read-only, signaling that there will be no further writes to
that file, and that the file may safely be moved to long-term storage.

The file name includes the log sequence number (LSN) at file offset
12288 (log_t::START_OFFSET). Each checkpoint is identified by storing
a 64-bit big-endian offset into an optional sequence of FILE_MODIFY
records followed by a FILE_CHECKPOINT record, between 12288 and the
end of the file.
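As an illustration (an assumption based on the ib_%016x.log pattern above, not the server's actual helper), the file name could be derived from the start LSN like this:

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Sketch: derive an archived log file name from the LSN at file offset
// 12288 (log_t::START_OFFSET), following the ib_%016x.log pattern.
std::string archive_log_name(uint64_t start_lsn)
{
  char buf[sizeof "ib_0123456789abcdef.log"];
  std::snprintf(buf, sizeof buf, "ib_%016" PRIx64 ".log", start_lsn);
  return buf;
}
```

For example, a file starting at LSN 12288 (0x3000) would be named ib_0000000000003000.log.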

The innodb_encrypt_log format is identified by storing the encryption
information at the start of the log file. The first 64-bit value will
be 1, which is an invalid checkpoint offset. Each
innodb_log_archive=ON log must use the same encryption parameters.
Changing innodb_encrypt_log or related parameters is only possible by
setting innodb_log_archive=OFF and restarting the server, which will
permanently lose the history of the archived log.

The maximum number of log checkpoints that the innodb_log_archive=ON
file header can represent is limited to 12288/8=1536 when using
innodb_encrypt_log=OFF. If we run out of slots in a log file, each
subsequently completed checkpoint in that log file will overwrite the
last slot in the checkpoint header, until we switch to the next log.
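The slot arithmetic above can be sketched as follows (illustrative only; `checkpoint_slot` is a hypothetical helper, with only `log_t::START_OFFSET` taken from the message):

```cpp
#include <algorithm>
#include <cstddef>

// 12288 header bytes of 64-bit checkpoint offsets give 1536 slots when
// innodb_encrypt_log=OFF; later checkpoints reuse the last slot.
constexpr std::size_t START_OFFSET= 12288;        // log_t::START_OFFSET
constexpr std::size_t N_SLOTS= START_OFFSET / 8;  // 1536 slots

std::size_t checkpoint_slot(std::size_t checkpoint_no)
{
  return std::min(checkpoint_no, N_SLOTS - 1);    // clamp to last slot
}
```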

innodb_log_recovery_start: The checkpoint LSN to start recovery from.
This is useful when recovering from an archived log, for example
for restoring an incremental backup (applying InnoDB log files that were
copied since the previous restore).

innodb_log_recovery_target: The requested LSN to end recovery at.
When this is set, all persistent InnoDB tables will be read-only, and
no writes to the log are allowed. The intended purpose of this setting
is to prepare an incremental backup, as well as to allow data
retrieval as of a particular logical point of time.

Setting innodb_log_recovery_target>0 is much like setting
innodb_read_only=ON, except that the data files may be written to by
crash recovery, locking reads will conflict with any incomplete
transactions as necessary, and all transaction isolation levels will
work normally (not hard-wired to READ UNCOMMITTED).

srv_read_only_mode: When this is set (innodb_read_only=ON), also
recv_sys.rpo (innodb_log_recovery_target) will be set to the current LSN.
This ensures that it will suffice to check only one of these variables
when blocking writes to persistent tables.

The status variable innodb_lsn_archived will reflect the LSN
since when a complete InnoDB log archive is available. Its initial
value will be that of the new parameter innodb_log_archive_start.
If that variable is 0 (the default), innodb_lsn_archived will
be recovered from the available log files. If innodb_log_archive=OFF,
innodb_lsn_archived will be adjusted to the latest checkpoint every
time a log checkpoint is executed. If innodb_log_archive=ON, the value
should not change.

SET GLOBAL innodb_log_archive=!@@GLOBAL.innodb_log_archive will take
effect as soon as possible, possibly after a log checkpoint has been
completed. The log file will be renamed between ib_logfile0 and
ib_%016x.log as appropriate.

When innodb_log_archive=ON, the setting SET GLOBAL innodb_log_file_size
will affect subsequently created log files when the file that is being
currently written is running out. If we are switching log files exactly
at the same time, then a somewhat misleading error message
"innodb_log_file_size change is already in progress" will be issued.

no_checkpoint_prepare.inc: A new file, to prepare for subsequent
inclusion of no_checkpoint_end.inc. We will invoke the server to
parse the log and to determine the latest checkpoint.

All --suite=encryption tests that use innodb_encrypt_log
will be skipped for innodb_log_archive=ON, because enabling
or disabling encryption on the log is not possible without
temporarily setting innodb_log_archive=OFF and restarting
the server. The idea is to add the following arguments to an
invocation of mysql-test/mtr:

--mysqld=--loose-innodb-log-archive \
--mysqld=--loose-innodb-log-recovery-start=12288 \
--mysqld=--loose-innodb-log-file-mmap=OFF \
--skip-test=mariabackup

Alternatively, specify --mysqld=--loose-innodb-log-file-mmap=ON
to cover both code paths.

The mariabackup test suite must be skipped when using the
innodb_log_archive=ON format, because mariadb-backup will only
support the old ib_logfile0 format (innodb_log_archive=OFF).

A number of tests would fail when the parameter
innodb_log_recovery_start=12288 is present, which is forcing
recovery to start from the beginning of the history
(the database creation). The affected tests have been adjusted with
explicit --innodb-log-recovery-start=0 to override that:

(0) Some injected corruption may be "healed" by replaying the log
from the beginning. Some tests expect an empty buffer pool after
a restart, with no page I/O due to crash recovery.
(1) Any test that sets innodb_read_only=ON would fail with an error
message that the setting prevents crash recovery, unless
innodb_log_recovery_start=0.
(2) Any test that changes innodb_undo_tablespaces would fail in crash
recovery, because crash recovery assumes that the undo tablespace ID
that is available from the undo* files corresponds with the start of
the log. This is an unfortunate design bug which we cannot fix easily.

log_sys.first_lsn: The start of the current log file, to be consulted
in log_t::write_checkpoint() when renaming files.

log_sys.archived_lsn: New field: The value of innodb_lsn_archived.

log_sys.end_lsn: New field: The log_sys.get_lsn() when the latest
checkpoint was initiated. That is, the start LSN of a possibly empty
sequence of FILE_MODIFY records followed by FILE_CHECKPOINT.

log_sys.resize_target: The value of innodb_log_file_size that will be
used for creating the next archive log file once the current file (of
log_sys.file_size) fills up.

log_sys.archive: New field: The value of innodb_log_archive.

log_sys.next_checkpoint_no: Widen to uint16_t. There may be up to
12288/8=1536 checkpoints in the header.

log_sys.log: If innodb_log_archive=ON, this file handle will be kept
open also in the PMEM code path.

log_sys.resize_log: If innodb_log_archive=ON, we may have two log
files open both during normal operation and when parsing the log. This
will store the other handle (old or new file).

log_sys.resize_buf: In the memory-mapped code path, this will point
to the file resize_log when innodb_log_archive=ON.

recv_sys.log_archive: All innodb_log_archive=ON files that will be
considered in recovery.

recv_sys.was_archive: A flag indicating that a file named like an
innodb_log_archive=ON file is actually in innodb_log_archive=OFF format.

log_sys.is_pmem, log_t::is_mmap_writeable(): A new predicate.
If is_mmap_writeable(), we assert and guarantee buf_size == capacity().

log_t::archive_new_write(): Create and allocate a new log file, and
write the outstanding data to both the current and the new file, or
only to the new file, until write_checkpoint() completes the first
checkpoint in the new file.

log_t::archived_mmap_switch_prepare(): Create and memory-map a new log
file, and update file_size to resize_target. Remember the file handle
of the current log in resize_log, so that write_checkpoint() will be
able to make it read-only.

log_t::archived_mmap_switch_complete(): Switch to the buffer that was
created in archived_mmap_switch_prepare().

log_t::archive_create(bool): Create and allocate an archive log file.

log_t::write_checkpoint(): Allow an old checkpoint to be completed in
the old log file even after a new one has been created. If we are
writing the first checkpoint in a new log file, we will mark the old
log file read-only. We will also update log_sys.first_lsn unless it
was already updated in ARCHIVED_MMAP code path. In that code path,
there is the special case where log_sys.resize_buf == nullptr and
log_sys.checkpoint_buf points to log_sys.resize_log (the old log file
that is about to be made read-only). In this case, log_sys.first_lsn
will already point to the start of the current log_sys.log, even
though the switch has not been fully completed yet.
Try to preallocate the next archive file if needed, with the goal that
when we need the file it will already be ready for use.

log_t::header_rewrite(my_bool): Rewrite the log file header before or
after renaming the log file, and write a message about the change,
so that there will be a chance to recover in case the server is being
killed during this operation.  Recovery of the last ib_%016x.log
also tolerates the ib_logfile0 format.

log_t::set_archive(my_bool,THD): Implement SET GLOBAL innodb_log_archive.
An error will be returned if non-archived SET GLOBAL innodb_log_file_size
(log file resizing) is in progress. Wait for checkpoint if necessary.
The current log file will be renamed to either ib_logfile0 or
ib_%016x.log, as appropriate. In SET GLOBAL innodb_log_archive=OFF,
we trigger a write-ahead of the log if necessary, to prevent overrun.

log_t::archive_rename(): Rename an archived log to ib_logfile0 on recovery
in case there had been a crash during set_archive().

log_t::archive_set_size(): A new function, to ensure that
log_sys.resize_target is set on startup.

log_checkpoint_low(): Do not prevent a checkpoint at the start of a file.
We want the first innodb_log_archive=ON file to start with a checkpoint.

log_t::create(lsn_t): Initialize last_checkpoint_lsn. Initialize the
log header as specified by log_sys.archive (innodb_log_archive).

log_write_buf(): Add the parameter max_length, the file wrap limit.

log_write_up_to(), mtr_t::commit_log_release<bool mmap=true>():
If we are switching log files, invoke buf_flush_ahead(lsn, true)
to ensure that a log checkpoint will be completed in the new file.

mtr_t::finish_writer(): Specialize for innodb_log_archive=ON.

mtr_t::commit_file(): Ensure that log archive rotation will complete.

log_t::append_prepare<log_t::ARCHIVED_MMAP>(): Special case.

log_t::get_path(): Get the name of the current log file.

log_t::get_circular_path(size_t): Get the path name of a circular file.
Replaces get_log_file_path().

log_t::get_archive_path(lsn_t): Return a name of an archived log file.

log_t::get_next_archive_path(): Return the name of the next archived log.

log_t::append_archive_name(): Append the archive log file name
to a path string.

mtr_t::finish_writer(): Invoke log_close() only if innodb_log_archive=OFF.
In the innodb_log_archive=ON, we only force log checkpoints after creating
a new archive file, to ensure that the first checkpoint will be written
as soon as possible.

log_t::checkpoint_margin(): Replaces log_checkpoint_margin().
If a new archived log file has been created, wait for the
first checkpoint in that file.

srv_log_rebuild_if_needed(): Never rebuild if innodb_log_archive=ON.
The setting innodb_log_file_size will affect the creation of
subsequent log files. The parameter innodb_encrypt_log cannot be
changed while the log is in the innodb_log_archive=ON format.

log_t::attach(), log_mmap(): Add the parameter log_access,
to distinguish memory-mapped or read-only access.

log_t::attach(): When disabling innodb_log_file_mmap, read
checkpoint_buf from the last innodb_log_archive=ON file.

log_t::clear_mmap(): Clear the tail of the checkpoint buffer
if is_mmap_writeable().

log_t::set_recovered(): Invoke clear_mmap(), and restore the
log buffer to the correct position.

recv_sys_t::apply(): Let log_t::clear_mmap() enable log writes.

log_file_is_zero(): Check if a log file starts with NUL bytes
(is a preallocated file).

recv_sys_t::find_checkpoint(): Find and remember the checkpoint position
in the last file when innodb_log_recovery_start points to an older file.
When innodb_log_file_mmap=OFF, restore log_sys.checkpoint_buf from
the latest log file. If the last archive log file is actually
in innodb_log_archive=OFF format despite being named ib_%016x.log,
try to recover it in that format. If the circular ib_logfile0 is missing,
determine the oldest archived log file with contiguous LSN.
If innodb_log_archive=ON, refuse to start if ib_logfile0 exists.
Open non-last non-preallocated archived log file in read/write mode.

recv_sys_t::find_checkpoint_archived(): Validate each checkpoint in
the current file header, and by default aim to recover from the last
valid one. Terminate the search if the last validated checkpoint
spanned two files. If innodb_log_recovery_start has been specified,
attempt to validate it even if there is no such information stored
in the checkpoint header.

log_parse_file(): Do not invoke fil_name_process() during
recv_sys_t::find_checkpoint_archived(), when we tolerate FILE_MODIFY
records while looking for a FILE_CHECKPOINT record.

recv_scan_log(): Invoke log_t::archived_switch_recovery() upon
reaching the end of the current archived log file.

log_t::archived_switch_recovery_prepare(): Make use of
recv_sys.log_archive and open all but the last file read-only.

log_t::archived_switch_recovery(): Switch files in the pread() code path.

log_t::archived_mmap_switch_recovery_complete(): Switch files in the
memory-mapped code path.

recv_warp: A pointer wrapper for memory-mapped parsing that spans two
archive log files.

recv_sys_t::parse_mmap(): Use recv_warp for innodb_log_archive=ON.

recv_sys_t::parse(): Tweak some logic for innodb_log_archive=ON.

log_t::set_recovered_checkpoint(): Set the checkpoint on recovery.
Updates also the end_lsn.

log_t::set_recovered_lsn(): Also update flush_lock and write_lock,
to ensure that log_write_up_to() will be a no-op. Invoke
unstash_archive_file() in case there was a garbage (pre-allocated)
file at the end which was not parsed at all.

log_t::persist(): Even if the flushed_to_disk_lsn does not change,
we may want to reset the write_lsn_offset.
Sergei Petrunia
Introduce TABLE::matching_rows_in_table() and use it in DupsWeedout code
Alexey Botchkov
MDEV-15479 Empty string is erroneously allowed as a GEOMETRY column value.

Empty string disallowed.
Raghunandan Bhat
MDEV-37534: Assertion `a == &type_handler_row || a == &type_handler_null` failed on `CREATE TABLE ... ROW()`

Problem:
  The function `Type_collection_row::aggregate_for_comparison()` uses
  a debug assert to check that the types involved in a comparison are
  either `ROW` or `NULL`. When hybrid functions like `CASE-WHEN-THEN`
  and `NULLIF` try to compare a `ROW` type with other types, the
  assertion fails.

Fix:
  Convert assertions in `Type_collection_row::aggregate_for_comparison`
  into a condition and return `NULL` if an invalid type combination is
  detected.
Sergei Petrunia
MDEV-39368: Fix wrong behavior for empty Replay Context

If the previous command was EXPLAIN, it would use its context.
If the previous command was not an EXPLAIN, nothing would be printed.
Alessandro Vetere
MDEV-39325 Fix range_set find() and add_range()

Unit test range_set.

Before this change:

1. [64,281] + [281,282] produced [64,281] instead of [64,282]
2. [10,20],[30,40] would not contain(15)
3. [10,20],[30,40] would not contain(20)
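A minimal interval-set sketch (assuming closed integer ranges; not the actual range_set code) of the two behaviors the fix restores: add_range() must merge touching ranges, and contains() must find values inside any stored range.

```cpp
#include <algorithm>
#include <iterator>
#include <map>

// Minimal interval set over closed ranges [lo, hi] (a sketch only).
struct range_set
{
  std::map<unsigned long long, unsigned long long> ranges;  // lo -> hi

  // Insert [lo, hi], merging any stored range that overlaps or touches
  // it, so that [64,281] + [281,282] becomes [64,282].
  void add_range(unsigned long long lo, unsigned long long hi)
  {
    auto it= ranges.lower_bound(lo);
    // A range starting before lo may still touch it from the left.
    if (it != ranges.begin() && std::prev(it)->second + 1 >= lo)
      --it;
    while (it != ranges.end() && it->first <= hi + 1)
    {
      lo= std::min(lo, it->first);
      hi= std::max(hi, it->second);
      it= ranges.erase(it);
    }
    ranges.emplace(lo, hi);
  }

  // True if some stored range [a, b] satisfies a <= v <= b.
  bool contains(unsigned long long v) const
  {
    auto it= ranges.upper_bound(v);
    return it != ranges.begin() && std::prev(it)->second >= v;
  }
};
```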
Alessandro Vetere
MDEV-39325 Extend pessimistic operations monitor

Extend debug-only monitoring to pessimistic inserts and deletes,
adding five new counters:

- INNODB_BTR_CUR_PESSIMISTIC_INSERT_CALLS
- INNODB_BTR_CUR_PESSIMISTIC_DELETE_CALLS
- INNODB_MTR_N_INDEX_S_LOCK_CALLS
- INNODB_MTR_N_INDEX_X_LOCK_CALLS
- INNODB_MTR_N_INDEX_SX_LOCK_CALLS

Add two include files to encapsulate the monitoring operations and
reduce code duplication in index_lock_upgrade.test.

In index_lock_upgrade.test, add a second table t2 with a secondary
index on a datetime column, as well as same-size and decreasing-size
UPDATEs. Also add DELETEs in both dense and scattered patterns,
monitoring the behavior of purge as well.
Sergei Petrunia
Followup to: Let the optimizer context have DROP TABLE|VIEW IF EXISTS.

Update test results.

(squash with 77a0b7c8).
Sergei Petrunia
From Monty: optimize_semijoin_nests(), count selectivity for materialized table size
Yuchen Pei
MDEV-39361 Assign Name resolution context in subst_vcol_if_compatible to the new vcol Item_field

The pushdown from HAVING into WHERE optimization cleans up and refixes
every condition to be pushed.

The virtual column (vcol) index substitution optimization replaces
vcol expressions in GROUP BY (and WHERE and ORDER BY) with vcol
fields.

The refixing requires the correct name resolution context to find the
vcol fields.

The commit 0316c6e4f21dee02f5adfbe5c62471ee75ca20bb assigns context
from the select_lex that the GROUP BY belongs to, but that may not
work when there are derived table subqueries.

In this commit we assign the correct context by adding a context field
to vcol_info, assigning its value in an Item_field constructor as well
as during substitution, and using this context for the newly
constructed vcol Item_field.

Alternatives considered:

1. Assign the context when constructing vcol_info, in
unpack_vcol_info_from_frm. This does not work because the
current_context() in parsing is not the correct context, not to
mention that unpack_vcol_info_from_frm is not always called from a
SELECT statement.

2. Get the correct context for vcol_info after its construction and
before the substitution. A debugger watchpoint (watch -a vcol_info)
shows that there are no common functions accessing vcol_info before
the substitution.
PranavKTiwari
Assertion thd->mdl_context.is_lock_owner(MDL_key::TABLE, share->db.str, share->table_name.str, MDL_EXCLUSIVE) failed
Alessandro Vetere
MDEV-39325 Defer purge B-tree index merges

Defer B-tree index merge operations during a purge batch processing at
the end of the batch.
This should avoid repeated failed merge attempts, decreasing pessimistic
delete fallbacks.

Optimistic delete during purge is decorated with a new
BTR_PURGE_DELETE_FLAG which allows it to proceed in more cases even if
the page would become underfull.
Pages are marked for deferred processing if they need compression
after any successful delete attempt, and are removed from the deferred
processing set if a pessimistic delete was successful and the page no
longer requires compression.

A std::set is used to track the std::pair<dict_index_t*, page_id_t>
required for deferred processing.
This allows O(log(N)) deduplication and avoids the need for a hashing
function, which std::unordered_set would require.
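The tracking structure can be sketched with mocked types (dict_index_t and page_id_t are stand-ins here, not the real InnoDB definitions):

```cpp
#include <cstdint>
#include <set>
#include <utility>

struct dict_index_t {};            // stand-in for the real InnoDB type
using page_id_t= std::uint64_t;    // stand-in for the real page id

// Ordered set of (index, page) pairs: O(log N) insertion with built-in
// deduplication and no hash function needed, unlike std::unordered_set.
using deferred_set= std::set<std::pair<dict_index_t*, page_id_t>>;

// Mark a page for deferred merge processing; returns false if the page
// was already tracked (the duplicate is absorbed).
bool defer_merge(deferred_set &s, dict_index_t *index, page_id_t page)
{
  return s.insert({index, page}).second;
}
```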

Deferred processing is executed after the purge batch is handled,
before closing the node.
Each deferred page is processed in two steps:

1. The page is peeked to obtain a key that enables traversal from the root
2. If the page is valid, the index is X-locked and traversed to reach
  the leaf page, which is compressed if possible

After processing, the tracking set is cleared.
Alexey Botchkov
MDEV-38767 XML datatype to be reported as format in extended metadata in protocol.

add the respective Send_field metadata for UDTs.
Sergei Golubchik
MDEV-39413 wsrep unsafe handling of parameters

introduce a safe() wrapper for parameters that can later be
interpolated into a command line.
Use as

- var="foo$BAR"
+ var="foo$(safe BAR)"

A parameter is safe if it contains no spaces, single quotes, or backticks.
Sergei Petrunia
MDEV-37732: TPROC-H, Query 21 is much slower in 11.4 than in 10.11

The problem was in Duplicate_weedout_picker::check_qep(). It computes
updated record_count - the number of record combinations left after the
strategy removes the subquery's duplicates. The logic was incorrect.
Consider EXAMPLE-1:

  select * from t1 where t1.col1 in (select col2 from t2 where corr_cond)

  and the join order of
    t2, full scan rows=1M
    t1, ref access on t1.col1=t2.col2, rows=10

Here, it would compute updated record_count=10, based on #rows in t1's
access method. This number is much smaller than the real estimate.

Rewrote the computation logic to use two approaches:

== 1. Use Subquery Fanout ==
Compute "Subquery Fanout" - how many (duplicate) matches the subquery
will generate for a record combination of outer tables. (Like everywhere
else in the code, we assume that any value has a match). For example, for

  ... IN (SELECT o_customer FROM orders WHERE ...)

the Subquery Fanout is the average number of orders per customer.
Then, after Duplicate Elimination is applied, we will have:
  updated_record_count = record_count / subquery_fanout.

Applying this to EXAMPLE-1, one gets:
  if (n_distinct(t2.col2) is known) then
    subquery_fanout= #rows(t2) / n_distinct(t2.col2);
  else
    subquery_fanout= 1.0;
  updated_record_count= 1M * 10 / subquery_fanout;

== 2. Collect fanout of outer tables in the join prefix ==

Done as follows:
  outer_fanout=1.0;
  for each table T in join prefix {
    if (T is not from subquery) {
      // This table's fanout will not be removed
      if (access to T doesn't depend on subquery tables)
        outer_fanout *= T->records_out;
      else
        outer_fanout *= T->table->stat_records() *
                        T->table->cond_selectivity;
    }
  }
  updated_record_count= outer_fanout;

The formula "stat_records()*cond_selectivity" estimates the fanout
that table T would have if it used an "independent" access method.

When we apply this to the join order of EXAMPLE-1:
  t2, full scan rows=1M  -- subquery table, ignore
  t1, ref access on t1.col1=t2.col2, rows=10 -- outer table, but the
access depends on the subquery table, so use
  t1->stat_records() * t1->cond_selectivity.

== Putting it together ==
Both approaches can give poor estimates in different cases, so we pick
the one providing the smaller estimate.
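The combination of the two estimates can be sketched as follows (names follow the pseudocode in the message, not the server API):

```cpp
#include <algorithm>

// Combine the two estimates described above (illustrative only).
double weedout_record_count(double record_count, double subquery_fanout,
                            double outer_fanout)
{
  // Approach 1: divide out the subquery's duplicate matches.
  double by_subquery_fanout= record_count / subquery_fanout;
  // Approach 2: keep only the fanout contributed by outer tables.
  // Either can overestimate, so take the smaller of the two.
  return std::min(by_subquery_fanout, outer_fanout);
}
```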

The fix is controlled by @@new_mode='FIX_SEMIJOIN_DUPS_WEEDOUT_CHECK'
and is OFF by default in this patch.
Sergei Petrunia
MDEV-37732: TPROC-H, Query 21 is much slower in 11.4 than in 10.11

The problem was in Duplicate_weedout_picker::check_qep(). It computes
updated record_count - the number of record combinations left after the
strategy removes the subquery's duplicates. The logic was incorrect.
Consider EXAMPLE-1:

  select * from t1 where t1.col1 in (select col2 from t2 where corr_cond)

  and the join order of
    t2, full scan rows=1M
    t1, ref access on t1.col1=t2.col2, rows=10

Here, it would compute updated record_count=10, based on #rows in t1's
access method. This number is much smaller than real estimate.

Rewrote the computation logic to use two approaches:

== 1. Use Subquery Fanout ==
Compute "Subquery Fanout" - how many (duplicate) matches the subquery
will generate for a record combination of outer tables. (Like everywhere
else in the code, we assume that any value has a match). For example, for

  ... IN (SELECT o_customer FROM orders WHERE ...)

the Subquery Fanout is the average number of orders per customer.
Then, after Duplicate Elimination is applied, we will have:
  updated_record_count = record_count / subquery_fanout.

Applying this to EXAMPLE-1, one gets:
  if (n_distinct(t2.col2) is known) then
    subquery_fanout= #rows(t2) / n_distinct(t2.col2);
  else
    subquery_fanout= 1.0;
  updated_record_count= 1M * 10 / subquery_fanout;

== 2. Collect fanout of outer tables in the join prefix ==

Done as follows:
  outer_fanout=1.0;
  for each table T in join prefix {
    if (T is not from subquery) {
      // This table's fanout will not be removed
      if (access to T doesn't depend on subquery tables)
        outer_fanout *= T->records_out;
      else
        outer_fanout *= T->table->stat_records() *
                        T->table->cond_selectivity;
    }
  }
  updated_record_count= outer_fanout;

The formula "stat_records()*cond_selectivity" estimates the fanout
that table T would have if it used an "independent" access method.

When we apply this to the join order of EXAMPLE-1:
  t2, full scan rows=1M  -- subquery table, ignore
  t1, ref access on t1.col1=t2.col2, rows=10 -- outer table, but the
      access depends on the subquery table, so use
      t1->stat_records() * t1->cond_selectivity.

== Putting it together ==
Both approaches can give poor estimates in different cases, so we pick
the one providing the smaller estimate.

The fix is controlled by @@new_mode='FIX_SEMIJOIN_DUPS_WEEDOUT_CHECK'
and is OFF by default in this patch.
jmestwa-coder
Document invariant ensuring passwd stays within packet bounds

Document that the packet buffer is null-terminated by the network layer,
ensuring strend(user) remains within bounds and passwd stays within
the packet.
Sergei Golubchik
MDEV-39408 mbstream insufficient path validation

reject paths that could not have been written by mariadb-backup
Marko Mäkelä
MDEV-37949: Implement innodb_log_archive

The InnoDB write-ahead log file in the old innodb_log_archive=OFF
format is named ib_logfile0, pre-allocated to innodb_log_file_size and
written as a ring buffer. This is good for write performance and space
management, but unsuitable for arbitrary point-in-time recovery or for
facilitating efficient incremental backup.

innodb_log_archive=ON: A new format where InnoDB will create and
preallocate files ib_%016x.log, instead of writing a circular file
ib_logfile0. Each file will be pre-allocated to innodb_log_file_size.
Once a log fills up, we will create and pre-allocate another log file,
to which log records will be written.  Upon the completion of the first
log checkpoint in a recently created log file, the old log file will
be marked read-only, signaling that there will be no further writes to
that file, and that the file may safely be moved to long-term storage.

The file name includes the log sequence number (LSN) at file offset
12288 (log_t::START_OFFSET). Each checkpoint is identified by storing
a 64-bit big-endian offset into an optional sequence of FILE_MODIFY
records followed by a FILE_CHECKPOINT record, between 12288 and the
end of the file.

The innodb_encrypt_log format is identified by storing the encryption
information at the start of the log file. The first 64-bit value will
be 1, which is an invalid checkpoint offset. Each
innodb_log_archive=ON log must use the same encryption parameters.
Changing innodb_encrypt_log or related parameters is only possible by
setting innodb_log_archive=OFF and restarting the server, which will
permanently lose the history of the archived log.

The maximum number of log checkpoints that the innodb_log_archive=ON
file header can represent is limited to 12288/8=1536 when using
innodb_encrypt_log=OFF. If we run out of slots in a log file, each
subsequently completed checkpoint in that log file will overwrite the
last slot in the checkpoint header, until we switch to the next log.

innodb_log_recovery_start: The checkpoint LSN to start recovery from.
This is useful when recovering from an archived log, for example when
restoring an incremental backup (applying InnoDB log files that were
copied since the previous restore).

innodb_log_recovery_target: The requested LSN to end recovery at.
When this is set, all persistent InnoDB tables will be read-only, and
no writes to the log are allowed. The intended purpose of this setting
is to prepare an incremental backup, as well as to allow data
retrieval as of a particular logical point of time.

Setting innodb_log_recovery_target>0 is much like setting
innodb_read_only=ON, except that the data files may be written to by
crash recovery, locking reads will conflict with any incomplete
transactions as necessary, and all transaction isolation levels will
work normally (not hard-wired to READ UNCOMMITTED).

srv_read_only_mode: When this is set (innodb_read_only=ON), also
recv_sys.rpo (innodb_log_recovery_target) will be set to the current LSN.
This ensures that it will suffice to check only one of these variables
when blocking writes to persistent tables.

The status variable innodb_lsn_archived will reflect the LSN since
which a complete InnoDB log archive is available. Its initial
value will be that of the new parameter innodb_log_archive_start.
If that parameter is 0 (the default), innodb_lsn_archived will
be recovered from the available log files. If innodb_log_archive=OFF,
innodb_lsn_archived will be adjusted to the latest checkpoint every
time a log checkpoint is executed. If innodb_log_archive=ON, the value
should not change.

SET GLOBAL innodb_log_archive=!@@GLOBAL.innodb_log_archive will take
effect as soon as possible, possibly after a log checkpoint has been
completed. The log file will be renamed between ib_logfile0 and
ib_%016x.log as appropriate.

When innodb_log_archive=ON, the setting SET GLOBAL innodb_log_file_size
will affect subsequently created log files when the file that is being
currently written is running out. If we are switching log files exactly
at the same time, then a somewhat misleading error message
"innodb_log_file_size change is already in progress" will be issued.

no_checkpoint_prepare.inc: A new file, to prepare for subsequent
inclusion of no_checkpoint_end.inc. We will invoke the server to
parse the log and to determine the latest checkpoint.

All --suite=encryption tests that use innodb_encrypt_log
will be skipped for innodb_log_archive=ON, because enabling
or disabling encryption on the log is not possible without
temporarily setting innodb_log_archive=OFF and restarting
the server. The idea is to add the following arguments to an
invocation of mysql-test/mtr:

--mysqld=--loose-innodb-log-archive \
--mysqld=--loose-innodb-log-recovery-start=12288 \
--mysqld=--loose-innodb-log-file-mmap=OFF \
--skip-test=mariabackup

Alternatively, specify --mysqld=--loose-innodb-log-file-mmap=ON
to cover both code paths.

The mariabackup test suite must be skipped when using the
innodb_log_archive=ON format, because mariadb-backup will only
support the old ib_logfile0 format (innodb_log_archive=OFF).

A number of tests would fail when the parameter
innodb_log_recovery_start=12288 is present, which forces
recovery to start from the beginning of the history
(the database creation). The affected tests have been adjusted with
an explicit --innodb-log-recovery-start=0 to override that:

(0) Some injected corruption may be "healed" by replaying the log
from the beginning. Some tests expect an empty buffer pool after
a restart, with no page I/O due to crash recovery.
(1) Any test that sets innodb_read_only=ON would fail with an error
message that the setting prevents crash recovery, unless
innodb_log_recovery_start=0.
(2) Any test that changes innodb_undo_tablespaces would fail in crash
recovery, because crash recovery assumes that the undo tablespace ID
that is available from the undo* files corresponds with the start of
the log. This is an unfortunate design bug which we cannot fix easily.

log_sys.first_lsn: The start of the current log file, to be consulted
in log_t::write_checkpoint() when renaming files.

log_sys.archived_lsn: New field: The value of innodb_lsn_archived.

log_sys.end_lsn: New field: The log_sys.get_lsn() when the latest
checkpoint was initiated. That is, the start LSN of a possibly empty
sequence of FILE_MODIFY records followed by FILE_CHECKPOINT.

log_sys.resize_target: The value of innodb_log_file_size that will be
used for creating the next archive log file once the current file (of
log_sys.file_size) fills up.

log_sys.archive: New field: The value of innodb_log_archive.

log_sys.next_checkpoint_no: Widen to uint16_t. There may be up to
12288/8=1536 checkpoints in the header.

log_sys.log: If innodb_log_archive=ON, this file handle will be kept
open also in the PMEM code path.

log_sys.resize_log: If innodb_log_archive=ON, we may have two log
files open both during normal operation and when parsing the log. This
will store the other handle (old or new file).

log_sys.resize_buf: In the memory-mapped code path, this will point
to the file resize_log when innodb_log_archive=ON.

recv_sys.log_archive: All innodb_log_archive=ON files that will be
considered in recovery.

recv_sys.was_archive: A flag indicating that a file named in the
innodb_log_archive=ON format is actually in innodb_log_archive=OFF format.

log_sys.is_pmem, log_t::is_mmap_writeable(): A new predicate.
If is_mmap_writeable(), we assert and guarantee buf_size == capacity().

log_t::archive_new_write(): Create and allocate a new log file, and
write the outstanding data to both the current and the new file, or
only to the new file, until write_checkpoint() completes the first
checkpoint in the new file.

log_t::archived_mmap_switch_prepare(): Create and memory-map a new log
file, and update file_size to resize_target. Remember the file handle
of the current log in resize_log, so that write_checkpoint() will be
able to make it read-only.

log_t::archived_mmap_switch_complete(): Switch to the buffer that was
created in archived_mmap_switch_prepare().

log_t::archive_create(bool): Create and allocate an archive log file.

log_t::write_checkpoint(): Allow an old checkpoint to be completed in
the old log file even after a new one has been created. If we are
writing the first checkpoint in a new log file, we will mark the old
log file read-only. We will also update log_sys.first_lsn unless it
was already updated in ARCHIVED_MMAP code path. In that code path,
there is the special case where log_sys.resize_buf == nullptr and
log_sys.checkpoint_buf points to log_sys.resize_log (the old log file
that is about to be made read-only). In this case, log_sys.first_lsn
will already point to the start of the current log_sys.log, even
though the switch has not been fully completed yet.
Try to preallocate the next archive file if needed, with the goal that
when we need the file it will already be ready for use.

log_t::header_rewrite(my_bool): Rewrite the log file header before or
after renaming the log file, and write a message about the change,
so that there will be a chance to recover in case the server is
killed during this operation.  The recovery of the last ib_%016x.log
also tolerates the ib_logfile0 format.

log_t::set_archive(my_bool,THD): Implement SET GLOBAL innodb_log_archive.
An error will be returned if non-archived SET GLOBAL innodb_log_file_size
(log file resizing) is in progress. Wait for checkpoint if necessary.
The current log file will be renamed to either ib_logfile0 or
ib_%016x.log, as appropriate. In SET GLOBAL innodb_log_archive=OFF,
we trigger a write-ahead of the log if necessary, to prevent overrun.

log_t::archive_rename(): Rename an archived log to ib_logfile0 on recovery
in case there had been a crash during set_archive().

log_t::archive_set_size(): A new function, to ensure that
log_sys.resize_target is set on startup.

log_checkpoint_low(): Do not prevent a checkpoint at the start of a file.
We want the first innodb_log_archive=ON file to start with a checkpoint.

log_t::create(lsn_t): Initialize last_checkpoint_lsn. Initialize the
log header as specified by log_sys.archive (innodb_log_archive).

log_write_buf(): Add the parameter max_length, the file wrap limit.

log_write_up_to(), mtr_t::commit_log_release<bool mmap=true>():
If we are switching log files, invoke buf_flush_ahead(lsn, true)
to ensure that a log checkpoint will be completed in the new file.

mtr_t::finish_writer(): Specialize for innodb_log_archive=ON.

mtr_t::commit_file(): Ensure that log archive rotation will complete.

log_t::append_prepare<log_t::ARCHIVED_MMAP>(): Special case.

log_t::get_path(): Get the name of the current log file.

log_t::get_circular_path(size_t): Get the path name of a circular file.
Replaces get_log_file_path().

log_t::get_archive_path(lsn_t): Return a name of an archived log file.

log_t::get_next_archive_path(): Return the name of the next archived log.

log_t::append_archive_name(): Append the archive log file name
to a path string.

mtr_t::finish_writer(): Invoke log_close() only if innodb_log_archive=OFF.
In the innodb_log_archive=ON case, we only force log checkpoints after creating
a new archive file, to ensure that the first checkpoint will be written
as soon as possible.

log_t::checkpoint_margin(): Replaces log_checkpoint_margin().
If a new archived log file has been created, wait for the
first checkpoint in that file.

srv_log_rebuild_if_needed(): Never rebuild if innodb_log_archive=ON.
The setting innodb_log_file_size will affect the creation of
subsequent log files. The parameter innodb_encrypt_log cannot be
changed while the log is in the innodb_log_archive=ON format.

log_t::attach(), log_mmap(): Add the parameter log_access,
to distinguish memory-mapped or read-only access.

log_t::attach(): When disabling innodb_log_file_mmap, read
checkpoint_buf from the last innodb_log_archive=ON file.

log_t::clear_mmap(): Clear the tail of the checkpoint buffer
if is_mmap_writeable().

log_t::set_recovered(): Invoke clear_mmap(), and restore the
log buffer to the correct position.

recv_sys_t::apply(): Let log_t::clear_mmap() enable log writes.

log_file_is_zero(): Check if a log file starts with NUL bytes
(is a preallocated file).

recv_sys_t::find_checkpoint(): Find and remember the checkpoint position
in the last file when innodb_log_recovery_start points to an older file.
When innodb_log_file_mmap=OFF, restore log_sys.checkpoint_buf from
the latest log file. If the last archive log file is actually
in innodb_log_archive=OFF format despite being named ib_%016x.log,
try to recover it in that format. If the circular ib_logfile0 is missing,
determine the oldest archived log file with contiguous LSN.
If innodb_log_archive=ON, refuse to start if ib_logfile0 exists.
Open non-last non-preallocated archived log file in read/write mode.

recv_sys_t::find_checkpoint_archived(): Validate each checkpoint in
the current file header, and by default aim to recover from the last
valid one. Terminate the search if the last validated checkpoint
spanned two files. If innodb_log_recovery_start has been specified,
attempt to validate it even if there is no such information stored
in the checkpoint header.

log_parse_file(): Do not invoke fil_name_process() during
recv_sys_t::find_checkpoint_archived(), when we tolerate FILE_MODIFY
records while looking for a FILE_CHECKPOINT record.

recv_scan_log(): Invoke log_t::archived_switch_recovery() upon
reaching the end of the current archived log file.

log_t::archived_switch_recovery_prepare(): Make use of
recv_sys.log_archive and open all but the last file read-only.

log_t::archived_switch_recovery(): Switch files in the pread() code path.

log_t::archived_mmap_switch_recovery_complete(): Switch files in the
memory-mapped code path.

recv_warp: A pointer wrapper for memory-mapped parsing that spans two
archive log files.

recv_sys_t::parse_mmap(): Use recv_warp for innodb_log_archive=ON.

recv_sys_t::parse(): Tweak some logic for innodb_log_archive=ON.

log_t::set_recovered_checkpoint(): Set the checkpoint on recovery.
Updates also the end_lsn.

log_t::set_recovered_lsn(): Also update flush_lock and write_lock,
to ensure that log_write_up_to() will be a no-op. Invoke
unstash_archive_file() in case there was a garbage (pre-allocated)
file at the end which was not parsed at all.

log_t::persist(): Even if the flushed_to_disk_lsn does not change,
we may want to reset the write_lsn_offset.
Sergei Golubchik
cleanup: remove galera test certificates from mtr

use server certs instead. galera CA didn't have a key saved in std_data,
so they could not be regenerated when needed.
Sergei Petrunia
MDEV-39368: When replay query fails, print the original query location in test file
Sergei Petrunia
In optimize_semijoin_nests(), count selectivity in materialized table size

In optimize_semijoin_nests(), properly count table condition selectivity
when estimating the size of materialized table.

This is preparation for fix for MDEV-37732.

Author: Monty Widenius
bsrikanth-mariadb
MDEV-39414: cleanup optimizer context tests and docs

1. Included file_stat_records in the schema, and dbug_print_read_stats().
2. Removed "optimizer_trace=ON" from the tests, as it has no impact on
   them.
3. Added a test to check the presence of file_stat_records, producing
   an error if it is not present.
Sergei Petrunia
MDEV-37732: TPROC-H, Query 21 is much slower in 11.4 than in 10.11

The problem was in Duplicate_weedout_picker::check_qep(). It computes
"outer fanout" - the number of record combinations left after the strategy
removes the subquery's duplicates. The logic was incorrect. For example:

  select * from t1 where t1.col1 in (select col2 from t2 where corr_cond)

  and the join order of
    t2, full scan rows=1M
    t1, ref access on t1.col1=t2.col2, rows=10

(call this EXAMPLE-1), it would conclude that the number of record
combinations after Duplicate Weedout (call this "outer_fanout") is 10,
based on t1.access_method_rows=10.

Rewrote the computation logic to use two approaches:

1. Pre-compute subquery's output cardinality, then check how many
  duplicate rows are expected in its output. This gives "subquery
  fanout" which we can use as expected number of duplicates.
  For example, for
    ... IN (SELECT o_customer FROM orders WHERE ...)
  the subquery fanout is the average number of orders per customer.
  In EXAMPLE-1, outer_fanout= (1M * 10) / subquery_fanout,
    subquery_fanout=#rows(t2) / n_distinct(t2.col2)
  When n_distinct is not known, subquery_fanout=1.

2. Given a join prefix of inner and outer tables, compute outer tables'
  fanout by ignoring all inner tables.
  If access to an outer table depends on an inner table, use outer
  table's "found records".
  For example, for
    select * from small_table WHERE col IN (SELECT * FROM big_table)
  and a join order of
    big_table, small_table
  we will use the number of matching records in the small_table
  (small_table->stat_records() * small_table->cond_selectivity) as an
  estimate of how many matches there can be.
  In EXAMPLE-1: outer_fanout = #records(t1).

Both approaches can give poor estimates in different cases, so we pick
the one giving the smaller estimate.

The fix is controlled by @@new_mode='FIX_SEMIJOIN_DUPS_WEEDOUT_CHECK'
and is OFF by default in this patch.