Console View


Categories: connectors experimental galera main
Vladislav Vaintroub
Use the server's MariaDB::zlib target when built as a submodule

When Connector/C is built inside the MariaDB server tree, pick up the server's
zlib via the MariaDB::zlib target instead of running FIND_PACKAGE(ZLIB), so the
connector uses exactly the same zlib (bundled or system) as the server. This
also avoids depending on the FindZLIB result-variable signature. Standalone
builds are unchanged - they still fall back to FIND_PACKAGE(ZLIB).

Server: MDEV-37996
rusher
Add clang -Werror build job to CI
Georg Richter
Merge branch '3.3' into 3.4
  • cc-x-codbc-windows: 'dojob pwd if '3.4' == '3.4' ls win32/test SET TEST_DSN=master SET TEST_DRIVER=master SET TEST_PORT=3306 SET TEST_SCHEMA=odbcmaster if '3.4' == '3.4' cd win32/test if '3.4' == '3.4' ctest --output-on-failure' failed -  stdio
forkfun
MDEV-23444 ASAN dynamic-stack-buffer-overflow or Assertion `precision > 0'
failed in decimal_bin_size with div_precision_increment=0

A signed numeric value of display length 1 (WEEKDAY(), DAYOFWEEK(),
@v:=<int>) has decimal_precision() == 0. When such a value becomes a
DECIMAL - through division, AVG() or a UNION column - the result has
precision 0, and the assertion `precision > 0' in decimal_bin_size()
fails when the value is stored into a field, a GROUP BY key or a
filesort key.

Fix the functions that build these DECIMAL results to use precision 1
instead of 0: Item_func_div::result_precision,
Item_sum_avg::fix_length_and_dec_decimal and
Type_handler_newdecimal::make_table_field.
Vladislav Vaintroub
MDEV-37996 update libmariadb to use server's MariaDB::zlib

Remove hack in cmake/mariadb_connector_c.cmake that sets variables
to make connectors FIND_PACKAGE(ZLIB) use server's zlib.
The hack is no longer necessary; the same intent is now expressed more directly.
bsrikanth-mariadb
MDEV-39942: use actual_rec_per_key in loose scan sj optimization

While performing loose scan semi-join optimization in
Loose_scan_opt::check_ref_access_part1() [opt_subselect.h],
the filtered value in the EXPLAIN plan was inconsistent for an indexed
table, depending on whether the same data was inserted using
"insert into t1 values()..." or "insert into t1 select * from t2".
This was observed when rec_per_key[] was used to get the number of
records during optimization.

When actual_rec_per_key() is used to get the number of records instead,
the discrepancy does not occur.

This PR changes the rec_per_key[] usage to actual_rec_per_key() in the
method Loose_scan_opt::check_ref_access_part1() [opt_subselect.h].
forkfun
MDEV-39380 Assertion `arg2_int >= 0' failed in Item_func_additive_op::result_precision

COALESCE/IFNULL/CASE over a hex/bit literal went through string attribute
aggregation, which reset the hybrid's integer attributes to string defaults
(decimals=NOT_FIXED_DEC, unsigned_flag=false). Functions relying on the
integer nature then broke: ROUND()/TRUNCATE() tripped the
`args[0]->decimals == 0' / `unsigned_flag' asserts, and SUM() built a
DECIMAL with scale > precision.

Restore decimals=0 and unsigned_flag in
Type_handler_hex_hybrid::Item_hybrid_func_fix_attributes when the
aggregated result is still a hex hybrid.
forkfun
MDEV-39380 Assertion `arg2_int >= 0' failed in Item_func_additive_op::result_precision

COALESCE/IFNULL/CASE over a hex/bit literal went through string attribute
aggregation, which reset the hybrid's integer attributes to string defaults
(decimals=NOT_FIXED_DEC, unsigned_flag=false). Functions relying on the
integer nature then broke: ROUND()/TRUNCATE() tripped the
`args[0]->decimals == 0' / `unsigned_flag' asserts, and SUM() built a
DECIMAL with scale > precision.

Restore decimals=0 and unsigned_flag in
Type_handler_hex_hybrid::Item_hybrid_func_fix_attributes when the
aggregated result is still a hex hybrid.
Add val_int/val_real/val_decimal to Type_handler_hex_hybrid so
these functions return the same value as the bare literal in numeric context
Vladislav Vaintroub
MDEV-37996 CMake: MariaDB:: targets for bundled-or-system libraries

Stop overwriting the standard find_package() result variables (ZLIB_FOUND,
ZLIB_LIBRARIES, ZLIB_INCLUDE_DIR(S), ...), because doing so breaks vcpkg.

Provide namespaced INTERFACE targets that point at either the bundled or
the system library and carry their include directories (and, for SSL, the
compile definitions):

  MariaDB::zlib, MariaDB::OpenSSL, MariaDB::pcre2-8, MariaDB::pcre2-posix,
  MariaDB::fmt, MariaDB::readline

Link these consistently instead of the scattered ${*_LIBRARIES} and
${*_INCLUDE_DIR(S)} variables sprinkled across the tree.
Alessandro Vetere
MDEV-40218 lock_rec_unlock_unmodified<CELL>() can release a stale per-cell latch

On a secondary index, lock_rec_unlock_unmodified<CELL>() drops the cell
latch and lock_sys.latch before calling lock_sec_rec_some_has_impl(), then
re-acquires lock_sys.latch in shared mode, recomputes the cell address and
latches the newly computed cell.  A concurrent lock_sys_t::hash_table::resize()
(rec_hash grows with the buffer pool) during that window reallocates the cell
array, so the cell and its latch move.  On success the function returned true
while holding the latch of the new cell, but lock_release_on_prepare_try() then
released the latch variable it had computed from the old cell address, which
could be stale.

Fix: make the cell parameter of lock_rec_unlock_unmodified() an in/out
reference so the function reports the cell it currently holds, and have the
CELL caller release that cell's latch.  This reuses the cell address the
function already recomputed, avoiding a second rec_hash lookup.
Yuchen Pei
MDEV-40048 [to-squash] Allow trigger and LOCK TABLES to work with range interval auto partitioning

When a range interval auto partitioned table is the target of a
trigger, the triggering statement is not necessarily one that would
cause the auto-creation of new partitions, so we need to account for
that.

Also added support for LOCK TABLES ... WRITE.

Improved test coverage by adapting tests from versioning.partition.
Marko Mäkelä
MDEV-14992 BACKUP SERVER

The following SQL statements will be introduced:

BACKUP SERVER TO '/path/to/directory' [ 1 CONCURRENT ];
BACKUP SERVER WITH [ 1 CONCURRENT ] 'command';

In place of the 1, any positive number of threads may be specified.
For the first variant, '/path/to' must exist and '/path/to/directory'
must not exist; that is where the backup will be written to.

For the second variant, 'command' must be the name of a script or
command that will be executed in a child process. The standard input
of that command will be in a format that is compatible with
GNU tar --format=oldgnu (and with the BSD tar variants that are part of
Microsoft Windows and Apple macOS). The command is expected to optionally
compress and encrypt the stream and redirect it to a file on a local or
a remote server. BACKUP SERVER WITH will append an additional argument,
a positive base-ten number in ASCII, starting with 1, to identify the
current thread. In this way, each concurrent stream can write a separate
file.

The backup or the first stream will contain a file backup.cnf, which
includes parameters needed for restoring the backup. Currently,
these are innodb_log_recovery_start and innodb_log_recovery_target.
If innodb_log_recovery_target>0, InnoDB will be in read-only mode,
not allowing any writes to persistent files other than via the log
application.

To restore a streaming backup made with BACKUP SERVER WITH, an empty
directory needs to be created and all streams be extracted there using
the standard tar utility of the operating system, optionally after
undoing any encryption or compression that had been added by the
backup command. Then the backup is prepared, or the MariaDB server is
started, on the extracted directory, just as if the BACKUP SERVER TO
statement had been used.

Note: The parameter innodb_log_recovery_start in backup.cnf is
STRICTLY NECESSARY TO AVOID CORRUPTION! By default, InnoDB crash recovery
starts from the latest available log checkpoint. However, for restoring
a backup, recovery must start from the checkpoint that was the latest
when the backup was started. Starting recovery from a possible later
checkpoint will result in a corrupted database!

The following will be implemented separately:

MDEV-39061 mariadb-backup compatible wrapper script for BACKUP SERVER
MDEV-40163 Partial backup and restore
MDEV-39091 Back up ENGINE=RocksDB
MDEV-39092 Less blocking backup of ENGINE=Aria

The implementation introduces a basic driver Sql_cmd_backup,
storage engine interfaces, and basic copying of the storage engines
InnoDB, Aria, MyISAM, MERGE (MyISAM), Archive, CSV.

backup_target: A structured data type to represent a target directory.
On Microsoft Windows, we must use directory paths because there is
no variant of CopyFileEx() that would work on file handles.

backup_sink: Wraps a per-thread output stream as well as storage engine
specific context.

handlerton::backup_start(), handlerton::backup_end(): Invoked at the
start or end of a backup phase, in the thread that executes a
BACKUP SERVER statement.

handlerton::backup_step(): A backup step that can be invoked from
multiple threads concurrently, between the execution of the corresponding
handlerton::backup_start() and handlerton::backup_end() of the same
phase.

copy_entire_file(): A file copying service for POSIX systems.

copy_file(): A partial or sparse file-copying service for all systems.

backup_stream_append(): Equivalent to copy_file(), but appending to
a stream. On Linux, this uses sendfile(2), which assumes that the
source data will not be changed before the data has been consumed
from the pipe.

backup_stream_append_async(): A variant of backup_stream_append()
where the source file region is guaranteed to be immutable after the
call returns. We must not use Linux sendfile(2) for copying data files
that may be modified in place, because it could introduce a race
condition between a page write that runs concurrently with a child process
that is reading the data from the pipe.

InnoDB_backup::context: Backup context, attached to backup_sink
so that context can continue to exist between the time a
BACKUP SERVER releases all locks and another BACKUP SERVER starts
executing, with innodb_backup pointing to the new backup, while
the old backup is still being finished.

InnoDB_backup::queue: Collection of tablespace IDs and payload sizes
at the start of the backup. If any file is created or extended while
the backup is executing, we must have the corresponding write-ahead-log
entries that we are copying since the latest checkpoint that was
completed when the backup started. If any tablespaces are deleted
during the backup, we may or may not copy them, and the application
of a FILE_DELETE record will remove them. Similarly, FILE_RENAME
or FILE_CREATE records will take care of renaming or creating files
during recovery (applying the backed-up log).

fil_space_t::write_or_backup: Keep track of in-flight page writes and
pending backup operations. We must not allow them concurrently, because
that could lead to torn pages in the backup.

fil_space_t::backup_end: The first page number that is not being backed up
(by default 0, to indicate that no backup is in progress).

fil_space_t::BACKUP_BATCH_SIZE: The number of preceding pages that will be
covered by fil_space_t::backup_end. This is the unit of "page range locking"
during InnoDB backup.
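
One possible reading of the backup_end and BACKUP_BATCH_SIZE description, as a minimal sketch. The batch size value, the 32-bit page number type and the helper name page_covered_by_backup() are assumptions made for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical value; the commit message does not state the batch size.
constexpr uint32_t BACKUP_BATCH_SIZE= 64;

// Returns true if page_no lies in the range that backup_end currently
// covers, that is, the BACKUP_BATCH_SIZE pages preceding backup_end.
// backup_end == 0 means that no backup is in progress.
bool page_covered_by_backup(uint32_t backup_end, uint32_t page_no)
{
  if (backup_end == 0)
    return false;
  // Saturating subtraction: the first batch may be shorter than a full one.
  const uint32_t begin=
    backup_end > BACKUP_BATCH_SIZE ? backup_end - BACKUP_BATCH_SIZE : 0;
  return page_no >= begin && page_no < backup_end;
}
```

Under this reading, a page write would have to wait only while its page number falls inside the covered range, which is what makes the batch the unit of locking.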

log_sys.backup: Whether BACKUP SERVER is in progress. The purpose of this
is to make BACKUP SERVER prevent the concurrent execution of
SET GLOBAL innodb_log_archive=OFF or SET GLOBAL innodb_log_file_size
when innodb_log_archive=OFF.

log_sys.archived_checkpoint: Keep track of the earliest available
checkpoint, corresponding to log_sys.archived_lsn. This reflects
SET GLOBAL innodb_log_recovery_start (which is settable now), for
incremental backup.

buf_flush_list_space(): Check for concurrent backup before writing each
page. This is inefficient, but this function may be invoked from multiple
threads concurrently, and it cannot be changed easily, especially for
fil_crypt_thread().

fil_system.have_all_spaces: Whether all tablespace metadata is guaranteed
to be known. To speed up startup, InnoDB does not normally open
all tablespace files.
Sergei Petrunia
MDEV-39942: use actual_rec_per_key in loose scan sj optimization

Addition: don't cast rec_per_key value to ulong.
EITS values can be fractional as they are computed as
n_rows / n_distinct_rows.
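
A small sketch of why the cast matters. The helper names are hypothetical and only illustrate the arithmetic described above; they are not the optimizer's functions.

```cpp
#include <cassert>

// Average number of rows per distinct key value, computed as
// n_rows / n_distinct_rows, so the result can be fractional.
double avg_rows_per_key(double n_rows, double n_distinct_rows)
{
  return n_rows / n_distinct_rows;
}

// The same estimate after a cast to an unsigned integer type:
// the fractional part is discarded.
unsigned long truncated_rows_per_key(double n_rows, double n_distinct_rows)
{
  return static_cast<unsigned long>(avg_rows_per_key(n_rows, n_distinct_rows));
}
```

With 3 rows and 2 distinct values the estimate is 1.5, which the cast reduces to 1; an estimate below 1 is reduced to 0.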
Alessandro Vetere
MDEV-40210 Redundant CAS in async_flush_lsn::try_clear_if_at_most()

MDEV-39600 added try_clear_if_at_most() to clear buf_flush_async_lsn with
an atomic CAS that preserves a concurrent bump(). If the snapshot is
already 0, compare_exchange_strong(0, 0) is a no-op, so return early on a
zero snapshot and avoid the atomic read-modify-write. The page cleaner
calls this on every pass, so in the common steady state (no async flush
queued) it drops needless exclusive access to the m_lsn cache line. A zero
value is already the cleared state and a concurrent bump() is preserved
either way, so the result is identical.
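
The early return can be sketched as follows. This is a simplified stand-in, not the InnoDB code: the struct name, the return value convention and the default memory ordering are assumptions; only m_lsn, bump() and try_clear_if_at_most() are names from the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the structure named in the commit message.
struct async_flush_lsn_sketch
{
  std::atomic<uint64_t> m_lsn{0};

  // Raise the target to at least lsn; never lowers it.
  void bump(uint64_t lsn)
  {
    uint64_t cur= m_lsn.load();
    while (cur < lsn && !m_lsn.compare_exchange_weak(cur, lsn))
    {
      // cur was refreshed by the failed exchange; the loop re-checks it.
    }
  }

  // Clear the target if it is at most limit.
  // Returns true if the target was observed or set to 0.
  bool try_clear_if_at_most(uint64_t limit)
  {
    uint64_t snapshot= m_lsn.load();
    if (snapshot == 0)
      return true;   // already cleared: skip the atomic read-modify-write
    if (snapshot > limit)
      return false;  // a later target is still pending
    // Fails if a concurrent bump() changed the value, which preserves it.
    return m_lsn.compare_exchange_strong(snapshot, 0);
  }
};
```

In the steady state with nothing queued, the function performs only a load, so the cache line holding m_lsn does not need to be acquired exclusively.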
Alessandro Vetere
MDEV-40209 Escalate lock-release via a saturating stall counter

lock_release() and lock_release_on_prepare() release a committing or preparing
transaction's explicit locks under the shared lock_sys.rd_lock(), taking each
per-cell hash latch and per-table lock mutex with a trylock because trx->mutex
is held in the reverse of the normal latch order. A single failed trylock
marked the whole pass unsuccessful, and after a fixed cap of 5 such passes the
code escalated to the exclusive lock_sys.wr_lock() for the whole transaction.
Under concurrency the trylocks fail transiently, so the cap escalated
transactions that were still steadily releasing locks, not just stuck ones; the
exclusive latch then blocks every concurrent lock_sys.rd_lock() acquirer in
lock_rec_lock() and lock_table(), producing a convoy. The chance of hitting the
cap rises with both the contention level and the number of latches a
transaction must trylock per pass.

Replace the fixed cap with a saturating stall counter (LOCK_RELEASE_MAX_STALLS,
incremented on a no-progress pass, decremented on progress, floored at zero)
that escalates a genuinely stuck transaction after 5 net stalls, as the fixed
cap did, while leaving a transaction that keeps making progress to finish under
the shared latch. A hard LOCK_RELEASE_MAX_PASSES ceiling bounds the loop
independently, for the case where concurrent activity keeps adding locks (e.g.
implicit-to-explicit conversion during XA PREPARE) so that progress never
converges. The _try functions report progress through an out-parameter computed
under trx->mutex, so trx->lock.trx_locks is never read unlatched.
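
The counter logic described above could look roughly like this. The sketch only models the bookkeeping; the struct, the record_pass() helper and the value chosen for LOCK_RELEASE_MAX_PASSES are assumptions, and the stall limit of 5 is taken from the text.

```cpp
#include <cassert>

constexpr unsigned LOCK_RELEASE_MAX_STALLS= 5;   // from the commit message
constexpr unsigned LOCK_RELEASE_MAX_PASSES= 64;  // hypothetical ceiling

// Hypothetical bookkeeping for one lock-release loop.
struct release_escalation
{
  unsigned stalls= 0;  // net no-progress passes, floored at zero
  unsigned passes= 0;  // total passes, bounded independently

  // Record the outcome of one pass. Returns true when the caller should
  // stop retrying under the shared latch and escalate.
  bool record_pass(bool made_progress)
  {
    ++passes;
    if (!made_progress)
      ++stalls;
    else if (stalls > 0)
      --stalls;
    return stalls >= LOCK_RELEASE_MAX_STALLS ||
           passes >= LOCK_RELEASE_MAX_PASSES;
  }
};
```

Five consecutive stalled passes escalate just as the fixed cap did, while a transaction that alternates between stalls and progress keeps the counter near zero and is stopped only by the pass ceiling.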
Daniel Black
MDEV-40176: field_charset()->charpos(blob..) called with NULL

Problem:
  `my_charpos_mb()` is called by `Field_blob::get_key_image_itRAW`
  with start and end both being NULL. For this to happen, the
  blob_length must have been 0.

Fix:
  Rather than relying on character set functions to calculate the
  storage of nothing, bypass the calculation, because charpos() isn't
  going to return a value less than 0 (the blob_length).
Dave Gosselin
MDEV-38210: Unary negation of LONGTEXT, wrong result under GROUP BY

Unary negation of a LONGTEXT or LONGBLOB value returned the wrong
result under GROUP BY.  The length of the result was set to the
argument length plus one for the sign, but for these two types the
argument length is already the largest value the length field can
hold, so adding one wrapped it back to zero.  A zero length result
loses its value when it is stored in the temporary table that GROUP BY
builds, so the query returned an empty value instead of the expected
number.  The argument length is now limited before the sign character
is added, so it can no longer wrap to zero.
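
The wrap-around can be reproduced with plain unsigned arithmetic. The sketch assumes a 32-bit unsigned length field, and the function names are hypothetical; it is not the server's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: the length is held in a 32-bit unsigned field.
constexpr uint32_t MAX_FIELD_LENGTH= std::numeric_limits<uint32_t>::max();

// Behaviour as described for the bug: argument length plus one for the sign.
uint32_t neg_length_wrapping(uint32_t arg_length)
{
  return arg_length + 1;  // wraps to 0 when arg_length is the maximum
}

// Limit the argument length first, so that adding the sign cannot wrap.
uint32_t neg_length_limited(uint32_t arg_length)
{
  const uint32_t limited=
    arg_length < MAX_FIELD_LENGTH ? arg_length : MAX_FIELD_LENGTH - 1;
  return limited + 1;
}
```

For an argument that already has the maximum length, the first function yields 0 and the second yields the maximum length.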
Alessandro Vetere
MDEV-40129 Retry transient trylock failures in lock-release fast paths

The trylock attempts on per-cell lock_sys_t::hash_latch (try_acquire())
and on per-table dict_table_t::lock_mutex_trylock() inside
lock_release_try(), lock_release_on_prepare_try() and
lock_rec_unlock_unmodified() now use a bounded spin loop
(up to LOCK_RELEASE_TRY_SPIN_BUDGET CAS attempts, with MY_RELAX_CPU()
between them) instead of a single CAS attempt.

These paths hold trx->mutex while attempting the trylock, which is the
reverse of the standard order used by lock_rec_convert_impl_to_expl().
Blocking acquisition is therefore unsafe, hence the trylock pattern.
However, a single failed CAS marks the entire pass of lock_release_try()
as unsuccessful, and after 5 such failed passes lock_release() falls
back to exclusive lock_sys.wr_lock() for the whole transaction. That
global wr_lock then blocks every concurrent lock_sys.rd_lock() acquirer
in lock_rec_lock() and lock_table(), producing a server-wide convoy
under heavy concurrency.

The bounded spin (no syscall, no blocking) gives a transient latch
holder time to release without weakening the deadlock-avoidance
guarantee that motivated the trylock pattern. The extra trx->mutex hold
time is bounded by LOCK_RELEASE_TRY_SPIN_BUDGET times the pause cost.
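
A bounded spin around a trylock might be sketched like this. tiny_latch, relax_cpu() and the budget value are assumptions standing in for the per-cell latch, MY_RELAX_CPU() and the real constant, whose value the commit message does not give.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical budget; the actual value is not stated above.
constexpr unsigned LOCK_RELEASE_TRY_SPIN_BUDGET= 16;

// Placeholder for MY_RELAX_CPU(); real code would issue a pause instruction.
inline void relax_cpu() {}

// Minimal test-and-set latch standing in for the per-cell hash latch.
struct tiny_latch
{
  std::atomic<bool> locked{false};

  // Single CAS attempt; never blocks.
  bool try_acquire()
  {
    bool expected= false;
    return locked.compare_exchange_strong(expected, true);
  }

  void release() { locked.store(false); }
};

// Retry the trylock a bounded number of times without blocking.
bool try_acquire_bounded(tiny_latch &latch)
{
  for (unsigned i= 0; i < LOCK_RELEASE_TRY_SPIN_BUDGET; i++)
  {
    if (latch.try_acquire())
      return true;
    relax_cpu();
  }
  return false;  // the caller treats this as a failed trylock
}
```

Because the loop always terminates after the budget is spent, the caller can still fall back to its existing failure handling, so the latch-order reasoning behind the trylock is unchanged.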

This is a first, still to be fine-tuned implementation. Only the
lock_release_try() path has been positively tested; the
lock_release_on_prepare_try() path is not yet covered.
sjaakola
MDEV-38243 Write binlog row events for changes done by cascading FK operations

This commit changes the handling of cascading foreign key operations so
that the changes made by the cascading operations are written into the binlog.
When such a transaction is applied on the slave node, only the binlog events
are applied, and the actual foreign key cascade operation is not executed.
This simplifies replication applying on the slave side and makes it more
predictable in terms of potential interference with other parallel applying
happening in the node.

This feature can be turned ON or OFF with the new variable
rpl_use_binlog_events_for_fk_cascade, whose default value is OFF.

The actual implementation is largely by windsurf.

The commit also adds mtr tests for the rpl_use_binlog_events_for_fk_cascade
feature: rpl.rpl_fk_cascade_binlog_row, rpl.rpl_fk_set_null_binlog_row and
rpl.fk_cascade_binlog_row_rollback
Thirunarayanan Balathandayuthapani
MDEV-40085  TRUNCATE of temporary table with ENCRYPTED=NO crashes
under innodb_encrypt_tables=FORCE

Problem:
=========
ha_innobase::truncate() recreates the table by calling
ha_innobase::create(). create_table_info_t::check_table_options()
rejects ENCRYPTED=NO when innodb_encrypt_tables=FORCE, which is the
rule that forbids creating a new unencrypted table under FORCE.

TRUNCATE therefore failed in create(). For a temporary table,
truncate() frees m_prebuilt before calling create(). When create()
failed, m_prebuilt was left as nullptr, and a subsequent
ha_innobase::reset()/update_thd() dereferenced it.

Solution:
========
check_table_options(): skip the innodb_encrypt_tables=FORCE check on
the internal recreate path (indicated by a non-null m_trx,
which is set only during truncate()).

ha_innobase::truncate(): validate the create options before dropping
the existing table. Any genuine validation failure
(missing encryption key, unsupported option combination etc)
now returns an error with the original table and m_prebuilt
left intact, instead of dropping the table and then
failing in create()
Alessandro Vetere
MDEV-40128 Use per-cell latch in lock_move_reorganize_page()

lock_move_reorganize_page() was acquiring lock_sys.latch in exclusive
mode (via LockMutexGuard) for the entire body of phase 2 (lock chain
iteration, bitmap reset, and lock_rec_add_to_queue() calls). The
function however only touches record locks belonging to a single page,
which all live in a single lock_sys.rec_hash cell. Holding that cell
latch in exclusive mode via LockGuard is sufficient:

- The cell latch protects the cell's lock chain and the bitmaps of the
  lock_t objects in it (lock_rec_bitmap_reset and the new bit set by
  lock_rec_add_to_queue()).
- It also protects lock->type_mode, including the LOCK_WAIT bit. The
  canonical clear in lock_reset_lock_and_trx_wait() runs under the cell
  latch, and lock_grant() invokes it before taking trx->mutex, so the bit
  is cell-latch state rather than trx->mutex state. Phase 1 only clears
  the bit and leaves trx->lock.wait_lock intact; the copy in old_locks
  keeps LOCK_WAIT and phase 2 re-adds the lock with it, so the wait
  relationship (guarded by lock_sys.wait_mutex) is preserved across the
  move. Neither trx->mutex nor wait_mutex is required here.
- Each owning trx's mutex is acquired per-iteration to protect that trx's
  trx_locks list and lock_heap during lock_rec_add_to_queue().

The global exclusive latch was over-strong: it blocked every concurrent
lock_sys.rd_lock() acquirer in lock_rec_lock() and lock_table()
server-wide for the duration of the reorganize, contributing
disproportionately to the lock_sys.latch convoy under heavy concurrency.

The TMLockGuard fast-path empty check at the top of the function is
preserved; for cells with no locks the cost is still just a TSX-elided
read.
Georg Richter
Added option -DWITH_TOOLS (default is OFF)

- Build of the tools directory is now optional
- Removed installation of the data directory
- Fixed README.md
  • cc-x-codbc-windows: 'dojob pwd if '3.4' == '3.4' ls win32/test SET TEST_DSN=master SET TEST_DRIVER=master SET TEST_PORT=3306 SET TEST_SCHEMA=odbcmaster if '3.4' == '3.4' cd win32/test if '3.4' == '3.4' ctest --output-on-failure' failed -  stdio
Thirunarayanan Balathandayuthapani
MDEV-39061 mariadb-backup compatible wrapper for BACKUP SERVER

This adds a shell script that lets users keep using their existing
mariadb-backup commands while the real work is done by the new
server-side BACKUP SERVER command. The goal is "drop-in": users should
not have to change their backup scripts.

extra/mariabackup/scripts/mariadb-backup-server.sh (plain POSIX sh)
understands the usual mariadb-backup modes and translates each one.
A companion helper, extra/mariabackup/scripts/mbstream-server.sh,
lets streamed backups be unpacked by pipelines that expect the
mbstream CLI. Both are documented in
extra/mariabackup/scripts/README.md.

--backup
========
Connects with the mariadb client and runs "BACKUP SERVER TO '<dir>'".
Connection options (--user, --host, --port, --socket, --defaults-file,
ssl, ...) are passed through to the client; --parallel=N becomes the
"<N> CONCURRENT" clause.
After the backup it writes backup-prepare.cnf into the backup
directory, recording what --prepare needs later: where
mariadbd lives, the InnoDB parameters (page size, data file path,
undo tablespaces, checksum algorithm, log file size), and if
the server is encrypted then how to reload the encryption key
plugin (the file_key_management variables),
so an encrypted backup can be prepared without extra input.

--backup --stream
=================
Runs "BACKUP SERVER WITH [N CONCURRENT] '<command>'": the server feeds
each stream's tar to <command>, the wrapper collects the parts, writes
them to stdout, then appends backup-prepare.cnf as a final tar. The
per-stream tars carry no end-of-archive marker; only the trailing
backup-prepare.cnf adds the single end marker, so the whole stream
extracts with a plain "tar -x".
Two properties follow from how BACKUP SERVER streams,
both differing from mariadb-backup:
- local: the stream command runs inside the server, so the wrapper
  must share its filesystem;
- tar only: any --stream=<format> (including xbstream) yields tar.
--target-dir is optional in stream mode (scratch for the per-stream
parts; a mktemp dir is used otherwise).

mbstream-server.sh maps the mbstream CLI onto a plain "tar -x"/"tar -c", so
existing "mbstream -x"/"-c" pipelines keep working on the wrapper's
stream. mbstream-only flags (-p/--parallel, ...) are accepted and
ignored; any other unknown option is rejected.

Environment overrides (mainly for testing): MARIADB (client),
MARIADBD (the --prepare bootstrap server) and TAR (the tar
implementation, e.g. TAR=bsdtar) can each be overridden. To run the
bootstrap under rr, put it in MARIADBD and let rr's own _RR_TRACE_DIR
choose the trace location, e.g.
  _RR_TRACE_DIR=/dev/shm/rr MARIADBD='rr record mariadbd'

--prepare
=========
Starts "mariadbd --bootstrap" on the backup directory using
backup-prepare.cnf as its defaults file, replays the archived redo
log between the start and target LSN read from backup.cnf,
then builds a fresh ib_logfile0 so a normal server can start
on the directory. mariadbd is taken from the path recorded in
backup-prepare.cnf if that binary exists, otherwise from PATH.
User-supplied --defaults-file/--defaults-extra-file and encryption
options are layered onto the bootstrap.

--copy-back / --move-back
=========================
Copy or move a prepared backup into the datadir. The datadir
is created if missing, a non-empty datadir is refused unless
--force-non-empty-directories is given, and a chown
reminder is printed.

If --aria-log-dir-path is given, the Aria logs (aria_log_control,
aria_log.*) are relocated into that directory.

Packaging
=========
The wrapper is not installed by default and never replaces the
real mariadb-backup / mbstream binaries.
1. cmake -DWITH_MARIABACKUP_WRAPPER=ON (default OFF) controls it.
2. When ON, the scripts install as /usr/bin/mariadb-backup-server
and /usr/bin/mbstream-server, tagged COMPONENT Backup so they
ship in the mariadb-backup package.
3. RPM: nothing extra to do; the component handles it.
4. DEB: not wired. debian/rules uses --fail-missing and does not
enable the option, so the -server binaries are not listed.
To ship via DEB, make a paired change: add
-DWITH_MARIABACKUP_WRAPPER=ON in debian/rules and list both
usr/bin/mariadb-backup-server and
usr/bin/mbstream-server in debian/mariadb-backup.install together.
5. The real mariadb-backup/mbstream binaries and the
mariabackup symlink are left untouched; opt in via an alias or a
symlink early in PATH.

Limitations (not supported yet)
===============================
1) Incremental backup & prepare (--incremental-basedir,
  --incremental-dir, --apply-log-only)
2) --rollback-xa
3) Partial backup (--databases, --tables, --tables-file)
4) Output compression and encryption (--compress, --encrypt)
5) --export is accepted but only warns and runs a plain recovery
6) --extra-lsndir is ignored
7) Windows: POSIX sh only, not installed on Windows

Behaviour differences from native mariadb-backup
================================================
- The wrapper needs the mariadb client on PATH for
--backup, and mariadbd on PATH (or recorded in backup-prepare.cnf)
for --prepare
- BACKUP SERVER refuses an already-existing target directory
- BACKUP SERVER copies the data files as raw pages without
checksum validation, so a corrupted table is not detected
at backup time
- --prepare only works on a wrapper-made backup (it
needs backup-prepare.cnf)
- --stream is tar, not xbstream, and local-only

Tests
=====
include/have_mariabackup_wrapper.inc redirects $XTRABACKUP to
mariadb-backup-server.sh and $XBSTREAM to mbstream-server.sh, skipping
when a wrapper or the mariadb client is unavailable.
include/have_mariabackup_combination.inc runs a test under both the
[CLIENT] mariadb-backup binary and the [SERVER] wrapper.
ParadoxV5
Fix redefining `bool` when including `ma_global.h` in C++ (#311)

* Fix redefining `bool` when including `ma_global.h` in C++

If `HAVE_BOOL` is not defined (the default), the C++ check had no effect because of the `||` operator.
Dave Gosselin
MDEV-38210: Unary negation of LONGTEXT, wrong result under GROUP BY

Unary negation of a LONGTEXT or LONGBLOB value returned the wrong
result under GROUP BY.  The length of the result was set to the
argument length plus one for the sign, but for these two types the
argument length is already the largest value the length field can
hold, so adding one wrapped it back to zero.  A zero length result
loses its value when it is stored in the temporary table that GROUP BY
builds, so the query returned an empty value instead of the expected
number.  The argument length is now limited before the sign character
is added, so it can no longer wrap to zero.
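
The wrap-around can be reproduced in isolation. The sketch below assumes
that the length field is a 32-bit unsigned integer and that the LONGTEXT and
LONGBLOB length equals its maximum; the function names are hypothetical and
do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed LONGTEXT/LONGBLOB length: the largest 32-bit unsigned value.
constexpr uint32_t MAX_LEN = std::numeric_limits<uint32_t>::max();

// Behaviour described as buggy: add one for the sign with no limit.
inline uint32_t neg_length_unchecked(uint32_t arg_len)
{
  return static_cast<uint32_t>(arg_len + 1);  // wraps to 0 at MAX_LEN
}

// Behaviour described as the fix: limit the argument length first,
// then add the sign character.
inline uint32_t neg_length_limited(uint32_t arg_len)
{
  const uint32_t limited = arg_len < MAX_LEN - 1 ? arg_len : MAX_LEN - 1;
  return limited + 1;
}
```

With these definitions, neg_length_unchecked(MAX_LEN) is 0, while
neg_length_limited(MAX_LEN) stays at MAX_LEN.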
rusher
Add clang build job to CI
Vladislav Vaintroub
MDEV-37996 CMake: MariaDB:: targets for bundled-or-system libraries

Stop overwriting the standard find_package() result variables (ZLIB_FOUND,
ZLIB_LIBRARIES, ZLIB_INCLUDE_DIR(S), ...); overwriting them breaks vcpkg.

Provide namespaced INTERFACE targets that point at either the bundled or
the system library and carry their include directories (and, for SSL, the
compile definitions):

  MariaDB::zlib, MariaDB::OpenSSL, MariaDB::pcre2-8, MariaDB::pcre2-posix,
  MariaDB::fmt, MariaDB::readline

Link these consistently instead of the scattered ${*_LIBRARIES} and
${*_INCLUDE_DIR(S)} variables sprinkled across the tree.
Vladislav Vaintroub
MDEV-37996 update libmariadb to use server's MariaDB::zlib

Remove the hack in cmake/mariadb_connector_c.cmake that sets variables
to make the connector's FIND_PACKAGE(ZLIB) use the server's zlib.
It is no longer necessary; the dependency is now expressed more directly.
forkfun
MDEV-38061 Assertion `args[0]->decimal_precision() < 22' in Item_func_round::fix_arg_hex_hybrid

Item_name_const forwarded type_handler() to the wrapped value but not
decimal_precision(), so a hex literal wrapped in NAME_CONST reported the
generic precision instead of its own. Forward decimal_precision() too.
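
The forwarding problem can be illustrated with a miniature class hierarchy.
Everything in this sketch is hypothetical (the type names and both precision
values are made up); it only shows why a wrapper that does not override
decimal_precision() reports the generic value.

```cpp
#include <cassert>

// Hypothetical stand-in for a generic item.
struct ItemSketch
{
  virtual ~ItemSketch() {}
  // Generic default precision (made-up value for illustration).
  virtual unsigned decimal_precision() const { return 65; }
};

// Hypothetical hex literal with its own, smaller precision.
struct HexLiteralSketch : ItemSketch
{
  unsigned decimal_precision() const override { return 5; }
};

// Wrapper in the spirit of NAME_CONST. Without the override below it
// would inherit the generic value instead of the wrapped item's value.
struct NameConstSketch : ItemSketch
{
  const ItemSketch &value;
  explicit NameConstSketch(const ItemSketch &v) : value(v) {}
  unsigned decimal_precision() const override
  { return value.decimal_precision(); }
};
```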
Dave Gosselin
MDEV-40143: st_isvalid() does not clear NULL state between rows

The st_isvalid() function evaluates nullability on every row.  Previously,
it would preserve potentially stale null state between rows, so the first
row yielding a null result for st_isvalid() would propagate to the results
for the remaining rows.

Implementation-wise, this patch changes the Item_func_isvalid::val_int
method implementation to work like Item_func_validate::val_str in that it
assumes that the method will indicate a null result for the row unless
it returns a non-null result, at which point it clears the null_value flag.
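
The "assume NULL until proven otherwise" pattern described above can be
sketched as follows. This is not the server code: IsValidSketch and the
pointer argument are hypothetical stand-ins, and only the handling of the
null_value flag mirrors the description.

```cpp
#include <cassert>

// Hypothetical stand-in for the item; only null_value handling matters.
struct IsValidSketch
{
  bool null_value = false;

  // arg points to the validity of the row's geometry, or is nullptr
  // when the argument is NULL or cannot be evaluated.
  long long val_int(const bool *arg)
  {
    null_value = true;    // assume a NULL result for this row
    if (!arg)
      return 0;
    null_value = false;   // non-NULL result: clear the flag
    return *arg ? 1 : 0;
  }
};
```

Because the flag is set at the start of every call, a NULL result for one
row cannot leak into the rows that follow.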
Alessandro Vetere
MDEV-40129 Retry transient trylock failures in lock-release fast paths

The trylock attempts on per-cell lock_sys_t::hash_latch (try_acquire())
and on per-table dict_table_t::lock_mutex_trylock() inside
lock_release_try(), lock_release_on_prepare_try() and
lock_rec_unlock_unmodified() now use a bounded spin loop
(up to LOCK_RELEASE_TRY_SPIN_BUDGET CAS attempts, with MY_RELAX_CPU()
between them) instead of a single CAS attempt.

These paths hold trx->mutex while attempting the trylock, which is the
reverse of the standard order used by lock_rec_convert_impl_to_expl().
Blocking acquisition is therefore unsafe, hence the trylock pattern.
However, a single failed CAS marks the entire pass of lock_release_try()
as unsuccessful, and after 5 such failed passes lock_release() falls
back to exclusive lock_sys.wr_lock() for the whole transaction. That
global wr_lock then blocks every concurrent lock_sys.rd_lock() acquirer
in lock_rec_lock() and lock_table(), producing a server-wide convoy
under heavy concurrency.

The bounded spin (no syscall, no blocking) gives a transient latch
holder time to release without weakening the deadlock-avoidance
guarantee that motivated the trylock pattern. The extra trx->mutex hold
time is bounded by LOCK_RELEASE_TRY_SPIN_BUDGET times the pause cost.
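
A bounded spin around a trylock can be sketched with std::atomic as below.
This is an illustration only: the latch is reduced to a single boolean, the
budget value is an assumption, and the place where the server would call
MY_RELAX_CPU() is marked with a comment.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical budget; the real LOCK_RELEASE_TRY_SPIN_BUDGET value is
// not stated in this message.
constexpr unsigned SPIN_BUDGET = 16;

// Try to flip the latch from free (false) to held (true), retrying at
// most `budget` times. Never blocks and makes no system call.
inline bool bounded_try_acquire(std::atomic<bool> &latch,
                                unsigned budget = SPIN_BUDGET)
{
  for (unsigned i = 0; i < budget; i++)
  {
    bool expected = false;
    if (latch.compare_exchange_strong(expected, true,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed))
      return true;
    // The server would pause the CPU here (MY_RELAX_CPU()).
  }
  return false;  // caller treats this pass as unsuccessful
}
```

The caller still holds its own mutex for at most `budget` iterations, which
is the bound on the extra hold time mentioned above.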

This is a first implementation that still needs fine-tuning. Only the
lock_release_try() path has been positively tested; the
lock_release_on_prepare_try() path is not yet covered.
Vladislav Vaintroub
Use the server's MariaDB::zlib target when built as a submodule

When Connector/C is built inside the MariaDB server tree, pick up the server's
zlib via the MariaDB::zlib target instead of running FIND_PACKAGE(ZLIB), so the
connector uses exactly the same zlib (bundled or system) as the server and no
longer depends on the FindZLIB result-variable signature. Standalone builds are
unchanged - they still fall back to FIND_PACKAGE(ZLIB).

Also stop adding zlib to CMAKE_REQUIRED_LIBRARIES: ZLIB_LIBRARY may now be an
(ALIAS) target which cannot be used inside try_compile() checks, and none of
those checks need zlib.

Server: MDEV-37996