Console View

Marko Mäkelä
fixup! 6c59b142f4acb4a5a7477358a25c2a56ec915386
Daniel Black
MDEV-40176: UBSAN: runtime error: applying non-zero offset in `my_charpos_mb` (2)

my_numchars_mb and my_charpos_mb both use their cs argument,
so remove the notused attribute from them.
Raghunandan Bhat
MDEV-40176: UBSAN: runtime error: applying non-zero offset in `my_charpos_mb`

Problem:
  When `my_charpos_mb()` is called with pos = end = NULL and the string
  has fewer than `length` characters, the `end + 2 - start` return
  expression evaluates `end+2`, forming the pointer NULL+2. Offsetting
  a null pointer is undefined behavior.

Fix:
  Compute the integer difference before adding the offset. The result is
  identical but no invalid pointer is ever formed.
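
The reordering described in the fix can be illustrated with a minimal sketch. It is not the actual my_charpos_mb(): the function name charpos_sketch is hypothetical, and every character is assumed to be one byte so that no charset argument is needed.

```cpp
#include <cstddef>

// Hypothetical single-byte stand-in for my_charpos_mb().
// Returns the byte offset of character number `length`, or
// (end - start) + 2 when the string has fewer than `length` characters.
std::size_t charpos_sketch(const char *pos, const char *end,
                           std::size_t length)
{
  const char *start = pos;
  while (length && pos < end)
  {
    ++pos;      // assumption: each character occupies exactly one byte
    --length;
  }
  // The form `end + 2 - start` would first form the pointer end + 2,
  // which is NULL + 2 when end is a null pointer.
  // Subtracting the pointers first and adding the integer afterwards
  // yields the same number without forming an out-of-range pointer.
  return length ? static_cast<std::size_t>(end - start) + 2
                : static_cast<std::size_t>(pos - start);
}
```

With pos = end = nullptr and a nonzero length, the sketch returns 2 and only ever subtracts two null pointers, which C++ defines as 0.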
Marko Mäkelä
squash! 81b3ae71537ca4c67ea4d0f740778f1596fd29a8

InnoDB_backup::queue: Collection of tablespace IDs and payload sizes
at the start of the backup
Alessandro Vetere
MDEV-40129 Retry transient trylock failures in lock-release fast paths

The trylock attempts on per-cell lock_sys_t::hash_latch (try_acquire())
and on per-table dict_table_t::lock_mutex_trylock() inside
lock_release_try(), lock_release_on_prepare_try() and
lock_rec_unlock_unmodified() now use a bounded spin loop
(up to LOCK_RELEASE_TRY_SPIN_BUDGET CAS attempts, with MY_RELAX_CPU()
between them) instead of a single CAS attempt.

These paths hold trx->mutex while attempting the trylock, which is the
reverse of the standard order used by lock_rec_convert_impl_to_expl().
Blocking acquisition is therefore unsafe, hence the trylock pattern.
However, a single failed CAS marks the entire pass of lock_release_try()
as unsuccessful, and after 5 such failed passes lock_release() falls
back to exclusive lock_sys.wr_lock() for the whole transaction. That
global wr_lock then blocks every concurrent lock_sys.rd_lock() acquirer
in lock_rec_lock() and lock_table(), producing a server-wide convoy
under heavy concurrency.

The bounded spin (no syscall, no blocking) gives a transient latch
holder time to release without weakening the deadlock-avoidance
guarantee that motivated the trylock pattern. The extra trx->mutex hold
time is bounded by LOCK_RELEASE_TRY_SPIN_BUDGET times the pause cost.

This is a first, still to be fine-tuned implementation. Only the
lock_release_try() path has been positively tested; the
lock_release_on_prepare_try() path is not yet covered.
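
As an illustration of the bounded spin described above, here is a minimal sketch built on std::atomic. The names spin_latch, try_acquire_bounded and SPIN_BUDGET are hypothetical, the budget value is an assumption (the value of LOCK_RELEASE_TRY_SPIN_BUDGET is not stated here), and the CPU pause is only indicated by a comment.

```cpp
#include <atomic>

// Assumed budget for illustration only.
constexpr unsigned SPIN_BUDGET = 16;

// Hypothetical latch with a single-CAS try_acquire().
struct spin_latch
{
  std::atomic<bool> locked{false};

  bool try_acquire() noexcept
  {
    bool expected = false;
    return locked.compare_exchange_strong(expected, true,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed);
  }

  void release() noexcept
  { locked.store(false, std::memory_order_release); }
};

// Retries the CAS up to `budget` times and never blocks, so the caller
// may keep holding another mutex while it spins.
inline bool try_acquire_bounded(spin_latch &latch,
                                unsigned budget = SPIN_BUDGET) noexcept
{
  for (unsigned i = 0; i < budget; ++i)
  {
    if (latch.try_acquire())
      return true;
    // A CPU pause hint (MY_RELAX_CPU() in the server) would go here;
    // it is omitted to keep the sketch portable.
  }
  return false; // the caller treats this as a failed trylock
}
```

Because the loop is bounded and makes no system call, the worst-case extra hold time of the outer mutex is the budget multiplied by the cost of one attempt plus pause.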
Daniel Black
MDEV-39881: Mroonga set but unused variables (2)

Part 2:

In grn_table_group_multi_keys_vector_record, kp could just be removed.

In grn_ja_element_info, we only needed the size, and pos for the
non-huge case. Extract out the macros so we've got a unique macro
fetching the huge size and pos/size for the non-huge case.
Vladislav Vaintroub
MDEV-35743 Role-granted routine privileges lost after FLUSH PRIVILEGES

Roles are merged in dependency order: each role has a counter equal to
the number of roles granted to it, and it is merged only once that
counter is ticked down to zero (all roles it inherits from are done).
The counter is ticked when the walk follows an edge into the role.

When rebuilding everything from scratch (acl_load / FLUSH PRIVILEGES),
merge_role_privileges() did not follow the edges out of a role whose own
privileges did not change. For a role granted to several roles (more than
one incoming edge), its counter never reached zero, it was never merged,
and it lost the privileges it should have inherited indirectly.

The "did not change, so stop" shortcut is only valid for incremental
propagation, where the rest of the graph is already merged. During a full
rebuild we must always keep walking so every counter reaches zero.
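
The counter mechanism can be modelled with a small sketch. This is not the server's merge_role_privileges(): the Role structure, the merge_all() function and the always_walk flag are hypothetical, and "merging" is reduced to marking a role as done.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a role in the grant graph.
struct Role
{
  std::vector<std::size_t> granted_to; // roles that inherit from this role
  unsigned counter = 0;                // inherited roles not yet merged
  bool changed = false;                // did merging alter this role?
  bool merged = false;
};

// Merges roles in dependency order and returns how many were merged.
// always_walk=false models the "did not change, so stop" shortcut;
// always_walk=true models a full rebuild that follows every edge.
std::size_t merge_all(std::vector<Role> &roles, bool always_walk)
{
  for (Role &r : roles) { r.counter = 0; r.merged = false; }
  for (const Role &r : roles)
    for (std::size_t to : r.granted_to)
      ++roles[to].counter;

  std::vector<std::size_t> ready;
  for (std::size_t i = 0; i < roles.size(); ++i)
    if (roles[i].counter == 0)
      ready.push_back(i);

  std::size_t merged = 0;
  while (!ready.empty())
  {
    const std::size_t i = ready.back();
    ready.pop_back();
    roles[i].merged = true;
    ++merged;
    if (!always_walk && !roles[i].changed)
      continue; // shortcut: outgoing edges are not followed
    for (std::size_t to : roles[i].granted_to)
      if (--roles[to].counter == 0)
        ready.push_back(to);
  }
  return merged;
}
```

With two roles granted to a third and only one of them marked as changed, the shortcut leaves the third role's counter at 1 so it is never merged, whereas always walking merges all three.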
Yuchen Pei
MDEV-40103 Initialise thd->net in spider_create_sys_thd

net is used by THD::print_aborted_warning when threads are killed, so
it needs to be initialised.

This fixes an MSAN "uninitialized value" bug.
PranavKTiwari
Added a new bit map
Alexander Barkov
MDEV-39587 Package-wide TYPE for variable declarations

SET sql_mode=ORACLE;
DELIMITER $$
CREATE OR REPLACE PACKAGE pkg AS
  -- Declare a package public data type
  TYPE varchar_array IS TABLE OF VARCHAR(2000) INDEX BY INTEGER;
END;
$$
DELIMITER ;
DELIMITER $$

CREATE OR REPLACE PROCEDURE p1 AS
  v pkg.varchar_array; -- Use the package public data type
BEGIN
  v(0):='test';
  SELECT v(0);
END;
$$
DELIMITER ;

Note, the change is done only for sql_mode=ORACLE, because the TYPE
declaration is not available for the default mode.

Where package-wide types are available
--------------------------------------
- Variable list type:
    DECLARE var pkg1.type1;

- RETURN type for a package routine:
    CREATE FUNCTION .. RETURN pkg1.type1 ...

- Parameter type for a package routine:
    PROCEDURE p1(param1 pkg1.type1);

- Assoc array element type:
    TYPE assoc1_t IS TABLE OF pkg1.type1 ...

- REF CURSOR RETURN type:
    TYPE cur1_t IS REF CURSOR RETURN pkg1.type1;

Change details
--------------

- Adding a member Lex_length_and_dec_st::m_foreign_module_type.
  It's set to true when the data type was initialized from a TYPE
  in a foreign routine (e.g. in a PACKAGE spec).
  It's needed to prevent use of qualified identifiers in public contexts,
  i.e. in schema public routine parameter types and schema public function
  RETURN types.
  Adding a helper method sp_head::check_applicability() which prevents
  use of qualified types in public context.

- Adding a helper method sp_head::raise_unknown_data_type().

- Adding methods LEX::set_field_type_typedef_package_spec() for
  2-step and 3-step qualified identifiers.
  It's used in field_type_all_with_typedefs which covers cases:
  - Variable list type       : DECLARE var pkg1.type1;
  - RETURN type              : CREATE FUNCTION .. RETURN pkg1.type1 ...
  - Parameter type           : PROCEDURE p1(param1 pkg1.type1);
  - Assoc array element type : TYPE assoc1_t IS TABLE OF pkg1.type1 ...

- Adding a method LEX::declare_type_ref_cursor_return_typedef().
  It handles cases when a new TYPE REF CURSOR RETURN is declared,
  both for qualified RETURN types and non-qualified RETURN types:
  - TYPE cur0_t IS REF CURSOR RETURN rec1_t;
  - TYPE cur0_t IS REF CURSOR RETURN pkg1.rec1_t;
  - TYPE cur0_t IS REF CURSOR RETURN db1.pkg1.rec1_t;

  The code was moved from LEX::declare_type_ref_cursor() into
  LEX::declare_type_ref_cursor_return_typedef() and extended
  to cover qualified RETURN types.

- Adding a method Sql_path::find_package_spec_type().
  It iterates through all schemas specified in @@path and searches
  for the given type in the given package.

- Adding a helper method sp_pcontext::type_defs_add_ref_cursor()
  to reuse the code.

- Adding a new method sp_package::get_typedef() to search
  for TYPE definitions in PACKAGE specifications.

- Adding a new method sp_head::get_typedef_package_spec()
  to search for TYPE definitions used by a PROCEDURE or FUNCTION.

- Adding a helper method
    Sp_handler::sp_cache_routine_reentrant_suppress_errors
  Adding a method Sp_handler::find_package_spec().
Daniel Black
Merge branch '10.11' into MDEV-37187
Raghunandan Bhat
MDEV-39690: UBSAN: signed integer overflow in `my_strntoll_8bit`, `my_strntoll_mb2_or_mb4` during BLOB-to-integer conversion

Problem:
  When converting a string like '-9223372036854775808' to an integer,
  the parsed magnitude (2^63) equals `(ulonglong) LONGLONG_MIN` and is
  accepted as valid. The return expression then cast it to signed
  (LONGLONG_MIN) and negated it. Negating LONGLONG_MIN is signed integer
  overflow, i.e. undefined behaviour.

Fix:
  Handle the boundary value explicitly. When the parsed magnitude equals
  `(ulonglong) LONGLONG_MIN`, return LONGLONG_MIN directly. Any smaller
  magnitude fits in a positive longlong, so the existing cast and
  negation stay well defined.

Backport from 11.4 commit - f552febe4315875143d952e41b2b5d17ca29b39c
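
The boundary handling can be sketched in isolation. The helper name apply_sign is hypothetical, and the sketch assumes the caller has already rejected magnitudes that do not fit (above 2^63 for negative input, above LLONG_MAX for positive input).

```cpp
#include <climits>

// Hypothetical helper: turn a parsed magnitude and a sign into a
// signed 64-bit result without ever negating LLONG_MIN.
long long apply_sign(unsigned long long magnitude, bool negative)
{
  if (!negative)
    return static_cast<long long>(magnitude);
  // 2^63 has no positive counterpart in long long, so casting it and
  // then negating would overflow; return the boundary value directly.
  if (magnitude == static_cast<unsigned long long>(LLONG_MIN))
    return LLONG_MIN;
  // Any smaller magnitude fits in a positive long long, so the cast
  // and the negation are both well defined.
  return -static_cast<long long>(magnitude);
}
```

Only the single value 2^63 takes the special branch; every other accepted input follows the same cast-and-negate path as before.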
Alessandro Vetere
MDEV-40218 lock_rec_unlock_unmodified<CELL>() can release a stale per-cell latch

On a secondary index, lock_rec_unlock_unmodified<CELL>() drops the cell
latch and lock_sys.latch before calling lock_sec_rec_some_has_impl(), then
re-acquires lock_sys.latch in shared mode, recomputes the cell address and
latches the newly computed cell.  A concurrent lock_sys_t::hash_table::resize()
(rec_hash grows with the buffer pool) during that window reallocates the cell
array, so the cell and its latch move.  On success the function returned true
while holding the latch of the new cell, but lock_release_on_prepare_try() then
released the latch variable it had computed from the old cell address, which
could be stale.

Fix: make the cell parameter of lock_rec_unlock_unmodified() an in/out
reference so the function reports the cell it currently holds, and have the
CELL caller release that cell's latch.  This reuses the cell address the
function already recomputed, avoiding a second rec_hash lookup.
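
The in/out reference idea can be shown with a toy model. Everything here is hypothetical (cell, hash_table_sketch, unlock_unmodified_sketch); latches are plain flags, and a concurrent resize is simulated by a parameter instead of by a second thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cell whose "latch" is just a flag.
struct cell { bool latched = false; };

// Hypothetical hash table whose cell array may be reallocated.
struct hash_table_sketch
{
  std::vector<cell> cells;
  cell *cell_for(std::size_t fold) { return &cells[fold % cells.size()]; }
};

// `c` is an in/out reference: on return it names the cell whose latch
// is currently held, which may differ from the cell passed in.
bool unlock_unmodified_sketch(hash_table_sketch &table, std::size_t fold,
                              cell *&c, bool resize_in_window)
{
  c->latched = false;   // drop the cell latch before the slow check
  if (resize_in_window) // simulate a resize that reallocates the array
    table.cells = std::vector<cell>(table.cells.size() * 2);
  c = table.cell_for(fold); // recompute the cell address
  c->latched = true;        // latch the newly computed cell
  return true;
}
```

A caller that releases the latch through c after the call always releases the cell that is actually held, even when the array moved in between, and needs no second lookup.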
Yuchen Pei
MDEV-39558 Check resulting VECTOR length in type aggregation inference

VECTOR, as a subtype of VARCHAR, has max length 65532, i.e. maximum
dimension of 16383. BLOB/MEDIUMBLOB/LONGBLOB (and corresponding TEXT
types) each has length exceeding 65532. Therefore, when aggregating
VECTOR with one of these BLOB/TEXT types, the aggregated type has
length exceeding the max length of VECTOR, which should result in an
error.

To that end, this patch adds checks on the resulting vector
length during type aggregation inference.
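
The arithmetic behind these limits can be written down as a short sketch. The constant and function names are hypothetical, and the 4-byte element size is an assumption that is consistent with 65532 / 16383 = 4.

```cpp
// Limits quoted in the commit message.
constexpr unsigned long long VECTOR_MAX_BYTES = 65532;
// Assumption: each vector element is a 4-byte float.
constexpr unsigned long long ELEMENT_BYTES = 4;
constexpr unsigned long long VECTOR_MAX_DIMENSION =
    VECTOR_MAX_BYTES / ELEMENT_BYTES;

static_assert(VECTOR_MAX_DIMENSION == 16383,
              "65532 bytes hold 16383 four-byte elements");

// Hypothetical check: an aggregated length is acceptable for VECTOR
// only if it does not exceed the VECTOR maximum.
constexpr bool vector_length_fits(unsigned long long aggregated_length)
{
  return aggregated_length <= VECTOR_MAX_BYTES;
}
```

For example, the maximum BLOB length of 65535 bytes already exceeds 65532, so vector_length_fits(65535) is false and such an aggregation would be rejected.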
Vladislav Vaintroub
CONC-830 Add MARIADB_CLIENT_PLUGIN_EXPORT for zstd compression plugin

This allows the shared library to export its symbol, because since
MDEV-37527 all symbols are hidden by default.
Marko Mäkelä
MDEV-14992 BACKUP SERVER

The following SQL statements will be introduced:

BACKUP SERVER TO '/path/to/directory' [ 1 CONCURRENT ];
BACKUP SERVER WITH [ 1 CONCURRENT ] 'command';

In place of the 1, any positive number of threads may be specified.
For the first variant, '/path/to' must exist and '/path/to/directory'
must not exist; that is where the backup will be written to.

For the second variant, 'command' must be the name of a script or
command that will be executed in a child process. The standard input
of that command will be in a format that is compatible with
GNU tar --format=oldgnu (and also BSD tar variants that are also part of
Microsoft Windows and Apple macOS). The command is expected to optionally
compress and encrypt the stream and redirect it to a file on a local or
a remote server. BACKUP SERVER WITH will append an additional argument,
a positive base-ten number in ASCII, starting with 1, to identify the
current thread. In this way, each concurrent stream can write a separate
file.

The backup or the first stream will contain a file backup.cnf, which
includes parameters needed for restoring the backup. Currently,
these are innodb_log_recovery_start and innodb_log_recovery_target.
If innodb_log_recovery_target>0, InnoDB will be in read-only mode,
not allowing any writes to persistent files other than via the log
application.

To restore a streaming backup made with BACKUP SERVER WITH, an empty
directory needs to be created and all streams be extracted there using
the standard tar utility of the operating system, optionally after
undoing any encryption or compression that had been added by the
backup command. Then the backup is prepared, or the MariaDB server is
started on the extracted directory, just as if the BACKUP SERVER TO
statement had been used.

Note: The parameter innodb_log_recovery_start in backup.cnf is
STRICTLY NECESSARY TO AVOID CORRUPTION! By default, InnoDB crash recovery
starts from the latest available log checkpoint. However, for restoring
a backup, recovery must start from the checkpoint that was the latest
when the backup was started. Starting recovery from a possible later
checkpoint will result in a corrupted database!

The following will be implemented separately:

MDEV-39061 mariadb-backup compatible wrapper script for BACKUP SERVER
MDEV-40163 Partial backup and restore
MDEV-39091 Back up ENGINE=RocksDB
MDEV-39092 Less blocking backup of ENGINE=Aria

The implementation introduces a basic driver Sql_cmd_backup,
storage engine interfaces, and basic copying of the storage engines
InnoDB, Aria, MyISAM, MERGE (MyISAM), Archive, CSV.

backup_target: A structured data type to represent a target directory.
On Microsoft Windows, we must use directory paths because there is
no variant of CopyFileEx() that would work on file handles.

backup_sink: Wraps a per-thread output stream as well as storage engine
specific context.

handlerton::backup_start(), handlerton::backup_end(): Invoked at the
start or end of a backup phase, in the thread that executes a
BACKUP SERVER statement.

handlerton::backup_step(): A backup step that can be invoked from
multiple threads concurrently, between the execution of the corresponding
handlerton::backup_start() and handlerton::backup_end() of the same
phase.

copy_entire_file(): A file copying service for POSIX systems.

copy_file(): A partial or sparse file-copying service for all systems.

backup_stream_append(): Equivalent to copy_file(), but appending to
a stream. On Linux, this uses sendfile(2), which assumes that the
source data will not be changed before the data has been consumed
from the pipe.

backup_stream_append_async(): A variant of backup_stream_append()
where the source file region is guaranteed to be immutable after the
call returns. We must not use Linux sendfile(2) for copying data files
that may be modified in place, because it could introduce a race
condition between a page write that runs concurrently with a child process
that is reading the data from the pipe.

InnoDB_backup::context: Backup context, attached to backup_sink
so that context can continue to exist between the time a
BACKUP SERVER releases all locks and another BACKUP SERVER starts
executing, with innodb_backup pointing to the new backup, while
the old backup is still being finished.

InnoDB_backup::queue: Collection of tablespace IDs and payload sizes
at the start of the backup. If any file is created or extended while
the backup is executing, we must have the corresponding write-ahead-log
entries that we are copying since the latest checkpoint that was
completed when the backup started. If any tablespaces are deleted
during the backup, we may or may not copy them, and the application
of a FILE_DELETE record will remove them. Similarly, FILE_RENAME
or FILE_CREATE records will take care of renaming or creating files
during recovery (applying the backed-up log).

fil_space_t::write_or_backup: Keep track of in-flight page writes and
pending backup operations. We must not allow them concurrently, because
that could lead to torn pages in the backup.

fil_space_t::backup_end: The first page number that is not being backed up
(by default 0, to indicate that no backup is in progress).

fil_space_t::BACKUP_BATCH_SIZE: The number of preceding pages that will be
covered by fil_space_t::backup_end. This is the unit of "page range locking"
during InnoDB backup.
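
One possible reading of the backup_end and BACKUP_BATCH_SIZE descriptions is sketched below. The batch size value, the function name and the clamping at page 0 are assumptions made for illustration; the actual data members may be used differently.

```cpp
#include <cstdint>

// Assumed batch size; the real BACKUP_BATCH_SIZE value is not given here.
constexpr std::uint32_t BATCH_SIZE_SKETCH = 64;

// Hypothetical predicate: is page_no inside the range currently covered
// by the backup, i.e. the BATCH_SIZE_SKETCH pages preceding backup_end?
// backup_end == 0 means that no backup is in progress.
constexpr bool page_in_backup_range(std::uint32_t page_no,
                                    std::uint32_t backup_end)
{
  return backup_end != 0 &&
         page_no < backup_end &&
         page_no >= (backup_end > BATCH_SIZE_SKETCH
                         ? backup_end - BATCH_SIZE_SKETCH
                         : 0);
}
```

Under this reading, a page write would have to wait only while its page number falls inside the current batch, which is what makes the batch the unit of "page range locking".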

log_sys.backup: Whether BACKUP SERVER is in progress. The purpose of this
is to make BACKUP SERVER prevent the concurrent execution of
SET GLOBAL innodb_log_archive=OFF or SET GLOBAL innodb_log_file_size
when innodb_log_archive=OFF.

log_sys.archived_checkpoint: Keep track of the earliest available
checkpoint, corresponding to log_sys.archived_lsn. This reflects
SET GLOBAL innodb_log_recovery_start (which is settable now), for
incremental backup.

buf_flush_list_space(): Check for concurrent backup before writing each
page. This is inefficient, but this function may be invoked from multiple
threads concurrently, and it cannot be changed easily, especially for
fil_crypt_thread().

fil_system.have_all_spaces: Whether all tablespace metadata is guaranteed
to be known. To speed up startup, InnoDB does not normally open
all tablespace files.
Vladislav Vaintroub
Merge branch '10.11' into MDEV-37187
Alessandro Vetere
MDEV-40210 Redundant CAS in async_flush_lsn::try_clear_if_at_most()

MDEV-39600 added try_clear_if_at_most() to clear buf_flush_async_lsn with
an atomic CAS that preserves a concurrent bump(). If the snapshot is
already 0, compare_exchange_strong(0, 0) is a no-op, so return early on a
zero snapshot and avoid the atomic read-modify-write. The page cleaner
calls this on every pass, so in the common steady state (no async flush
queued) it drops needless exclusive access to the m_lsn cache line. A zero
value is already the cleared state and a concurrent bump() is preserved
either way, so the result is identical.
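
The early return can be sketched with std::atomic. The class below is a hypothetical stand-in, not the server's async_flush_lsn: the member functions, their return types and the memory orders are assumptions.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the async flush target LSN.
class async_flush_lsn_sketch
{
  std::atomic<std::uint64_t> m_lsn{0};
public:
  std::uint64_t get() const noexcept
  { return m_lsn.load(std::memory_order_relaxed); }

  // Raise the target to at least `lsn`.
  void bump(std::uint64_t lsn) noexcept
  {
    std::uint64_t cur = m_lsn.load(std::memory_order_relaxed);
    while (cur < lsn &&
           !m_lsn.compare_exchange_weak(cur, lsn,
                                        std::memory_order_relaxed))
    {}
  }

  // Clear the target if it is still at most `limit`. A concurrent
  // bump() changes the value, makes the CAS fail and is preserved.
  void try_clear_if_at_most(std::uint64_t limit) noexcept
  {
    std::uint64_t snapshot = m_lsn.load(std::memory_order_relaxed);
    if (snapshot == 0)
      return; // already cleared: skip the read-modify-write entirely
    if (snapshot <= limit)
      m_lsn.compare_exchange_strong(snapshot, 0,
                                    std::memory_order_relaxed);
  }
};
```

In the steady state where nothing is queued, the function performs only a plain load, so the cache line holding m_lsn does not have to be acquired exclusively.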
Alessandro Vetere
MDEV-40209 Escalate lock-release via a saturating stall counter

lock_release() and lock_release_on_prepare() release a committing or preparing
transaction's explicit locks under the shared lock_sys.rd_lock(), taking each
per-cell hash latch and per-table lock mutex with a trylock because trx->mutex
is held in the reverse of the normal latch order. A single failed trylock
marked the whole pass unsuccessful, and after a fixed cap of 5 such passes the
code escalated to the exclusive lock_sys.wr_lock() for the whole transaction.
Under concurrency the trylocks fail transiently, so the cap escalated
transactions that were still steadily releasing locks, not just stuck ones; the
exclusive latch then blocks every concurrent lock_sys.rd_lock() acquirer in
lock_rec_lock() and lock_table(), producing a convoy. The chance of hitting the
cap rises with both the contention level and the number of latches a
transaction must trylock per pass.

Replace the fixed cap with a saturating stall counter (LOCK_RELEASE_MAX_STALLS,
incremented on a no-progress pass, decremented on progress, floored at zero)
that escalates a genuinely stuck transaction after 5 net stalls, as the fixed
cap did, while leaving a transaction that keeps making progress to finish under
the shared latch. A hard LOCK_RELEASE_MAX_PASSES ceiling bounds the loop
independently, for the case where concurrent activity keeps adding locks (e.g.
implicit-to-explicit conversion during XA PREPARE) so that progress never
converges. The _try functions report progress through an out-parameter computed
under trx->mutex, so trx->lock.trx_locks is never read unlatched.
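
The escalation policy can be sketched as a small loop. The names are hypothetical, the pass is abstracted into a callback, and the value chosen for the hard ceiling is an assumption; only the threshold of 5 net stalls comes from the text above.

```cpp
#include <functional>

constexpr unsigned MAX_STALLS_SKETCH = 5;    // from the description above
constexpr unsigned MAX_PASSES_SKETCH = 1000; // assumed ceiling value

enum class pass_result { done, progress, no_progress };

// Returns true if all locks were released under the shared latch,
// false if the caller should escalate to the exclusive latch.
bool release_with_stall_counter(
    const std::function<pass_result()> &try_pass)
{
  unsigned stalls = 0;
  for (unsigned pass = 0; pass < MAX_PASSES_SKETCH; ++pass)
  {
    switch (try_pass())
    {
    case pass_result::done:
      return true;
    case pass_result::progress:
      if (stalls)
        --stalls; // saturating decrement, floored at zero
      break;
    case pass_result::no_progress:
      if (++stalls >= MAX_STALLS_SKETCH)
        return false; // genuinely stuck: escalate
      break;
    }
  }
  return false; // hard ceiling reached: escalate
}
```

A transaction that alternates between stalled and progressing passes never accumulates 5 net stalls and finishes under the shared latch, while one that never progresses escalates after 5 passes, as with the fixed cap.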
Alexander Barkov
Cherry-pick MDEV-40155 Weak REF CURSOR without RETURN is not opened using a dynamic SQL statement

This statement:
  OPEN c FOR 'dynamic sql'
was only allowed for SYS_REFCURSOR.

Additionally allow it for REF CURSOR with no RETURN clause, e.g.:
  TYPE cur0_t IS REF CURSOR; -- No RETURN clause - OK for OPEN FOR

Note, REF CURSORs with RETURN clause are still not allowed for dynamic OPEN,
as expected.
Thirunarayanan Balathandayuthapani
MDEV-40085 TRUNCATE of temporary table with ENCRYPTED=NO crashes under innodb_encrypt_tables=FORCE

Problem:
=========
ha_innobase::truncate() recreates the table by calling
ha_innobase::create(). create_table_info_t::check_table_options()
rejects ENCRYPTED=NO when innodb_encrypt_tables=FORCE, which is the
rule that forbids creating a new unencrypted table under FORCE.

TRUNCATE therefore failed in create(). For a temporary table,
truncate() frees m_prebuilt before calling create(). When create()
failed, it left m_prebuilt=nullptr, and a subsequent
ha_innobase::reset()/update_thd() dereferenced it.

Solution:
========
check_table_options(): skip the innodb_encrypt_tables=FORCE check on
the internal recreate path (indicated by a non-null m_trx,
which is set only during truncate()).

ha_innobase::truncate(): validate the create options before dropping
the existing table. Any genuine validation failure
(missing encryption key, unsupported option combination, etc.)
now returns an error with the original table and m_prebuilt
left intact, instead of dropping the table and then
failing in create().
forkfun
MDEV-23444 ASAN dynamic-stack-buffer-overflow or Assertion `precision > 0'
failed in decimal_bin_size with div_precision_increment=0

A signed length-1 numeric (e.g. WEEKDAY(), DAYOFWEEK()) had
decimal_precision()==0, since my_decimal_length_to_precision(1,0,signed)=0.
The zero propagated into DECIMALs derived from such values (division,
AVG, UNION merge) and tripped `precision > 0' when the value was turned
into a field, a GROUP BY key or a filesort key.

Floor my_decimal_length_to_precision() at 1 when length>0: a non-empty
numeric always has at least one digit.
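
The effect of the floor can be checked with a simplified model. The formula below is an assumption chosen to match the statement that (1,0,signed) yields 0: one character is subtracted for the decimal point when scale > 0 and one for the sign when the value is signed. Both function names are hypothetical.

```cpp
// Assumed model of the old conversion from display length to precision.
long length_to_precision_old(unsigned length, unsigned scale,
                             bool unsigned_flag)
{
  return static_cast<long>(length)
         - (scale > 0 ? 1 : 0)                // decimal point
         - (unsigned_flag || !length ? 0 : 1); // sign character
}

// The same computation with the floor: a non-empty numeric always has
// at least one digit, so the precision never drops below 1.
unsigned length_to_precision_floored(unsigned length, unsigned scale,
                                     bool unsigned_flag)
{
  long precision = length_to_precision_old(length, scale, unsigned_flag);
  if (length > 0 && precision < 1)
    precision = 1;
  if (precision < 0)
    precision = 0; // keep the sketch well defined for length == 0
  return static_cast<unsigned>(precision);
}
```

For a signed value of display length 1 the old model gives 0, which is the value that later tripped the `precision > 0' assertion, and the floored version gives 1.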
ParadoxV5
Fix redefining `bool` when including `ma_global.h` in C++ (#311)

* Fix redefining `bool` when including `ma_global.h` in C++

If `HAVE_BOOL` is not defined (the default), the C++ check had no effect because of the `||` operator.
Yuchen Pei
MDEV-39558 Check resulting VECTOR length in type aggregation inference

VECTOR, as a subtype of VARCHAR, has a maximum length of 65532, i.e. a
maximum dimension of 16383. BLOB/MEDIUMBLOB/LONGBLOB (and the
corresponding TEXT types) each have a maximum length exceeding 65532.
Therefore, when aggregating VECTOR with one of these BLOB/TEXT types,
the aggregated type has a length exceeding the maximum length of
VECTOR, which should result in an error.

To that end, this patch adds checks on the resulting vector
length during type aggregation inference.
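
The arithmetic behind these limits can be sketched as follows, assuming that
each vector dimension is stored as a 4-byte float. The constant and function
names are hypothetical and do not come from the patch.

```cpp
#include <cstdint>

constexpr uint64_t MAX_VECTOR_LENGTH = 65532;   // from the text above
constexpr uint64_t BYTES_PER_DIMENSION = 4;     // assumption: 32-bit floats
constexpr uint64_t MAX_VECTOR_DIMENSION =
  MAX_VECTOR_LENGTH / BYTES_PER_DIMENSION;
static_assert(MAX_VECTOR_DIMENSION == 16383, "matches the stated limit");

// Hypothetical check: the aggregated length is taken to be the larger of
// the two operand lengths, and it must not exceed the VECTOR maximum.
constexpr bool aggregated_length_fits_vector(uint64_t len_a, uint64_t len_b)
{
  return (len_a > len_b ? len_a : len_b) <= MAX_VECTOR_LENGTH;
}
```

For example, aggregating with a BLOB (maximum length 65535) gives a length
above 65532, so aggregated_length_fits_vector(65532, 65535) is false and
the aggregation would be rejected.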
Alessandro Vetere
MDEV-40128 Use per-cell latch in lock_move_reorganize_page()

lock_move_reorganize_page() was acquiring lock_sys.latch in exclusive
mode (via LockMutexGuard) for the entire body of phase 2 (lock chain
iteration, bitmap reset, and lock_rec_add_to_queue() calls). The
function however only touches record locks belonging to a single page,
which all live in a single lock_sys.rec_hash cell. Holding that cell
latch in exclusive mode via LockGuard is sufficient:

- The cell latch protects the cell's lock chain and the bitmaps of the
  lock_t objects in it (lock_rec_bitmap_reset and the new bit set by
  lock_rec_add_to_queue()).
- It also protects lock->type_mode, including the LOCK_WAIT bit. The
  canonical clear in lock_reset_lock_and_trx_wait() runs under the cell
  latch, and lock_grant() invokes it before taking trx->mutex, so the bit
  is cell-latch state rather than trx->mutex state. Phase 1 only clears
  the bit and leaves trx->lock.wait_lock intact; the copy in old_locks
  keeps LOCK_WAIT and phase 2 re-adds the lock with it, so the wait
  relationship (guarded by lock_sys.wait_mutex) is preserved across the
  move. Neither trx->mutex nor wait_mutex is required here.
- Each owning trx's mutex is acquired per-iteration to protect that trx's
  trx_locks list and lock_heap during lock_rec_add_to_queue().

The global exclusive latch was over-strong: it blocked every concurrent
lock_sys.rd_lock() acquirer in lock_rec_lock() and lock_table()
server-wide for the duration of the reorganize, contributing
disproportionately to the lock_sys.latch convoy under heavy concurrency.

The TMLockGuard fast-path empty check at the top of the function is
preserved; for cells with no locks the cost is still just a TSX-elided
read.
Raghunandan Bhat
MDEV-39690: UBSAN: signed integer overflow in `my_strntoll_8bit`, `my_strntoll_mb2_or_mb4` during BLOB-to-integer conversion

Problem:
  When converting a string like '-9223372036854775808' to an integer,
  the parsed magnitude (2^63) equals `(ulonglong) LONGLONG_MIN` and is
  accepted as valid. The return expression then cast it to signed
  (LONGLONG_MIN) and negated it. Negating LONGLONG_MIN is signed integer
  overflow, i.e. undefined behaviour.

Fix:
  Handle the boundary value explicitly. When the parsed magnitude equals
  `(ulonglong) LONGLONG_MIN`, return LONGLONG_MIN directly. Any smaller
  magnitude fits in a positive longlong, so the existing cast and
  negation stay well defined.
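
  The boundary handling can be illustrated with a minimal sketch. The helper
  apply_sign() and its overflow flag are hypothetical. This is not the code
  of `my_strntoll_8bit`, only the same idea in isolation.

```cpp
#include <climits>

// Hypothetical helper: combine a parsed unsigned magnitude with a sign.
// Sets *overflow if the value cannot be represented as long long.
inline long long apply_sign(unsigned long long magnitude, bool negative,
                            bool *overflow)
{
  // 2^63, the magnitude of LLONG_MIN, has no positive signed counterpart.
  const unsigned long long min_magnitude = (unsigned long long) LLONG_MIN;
  *overflow = false;
  if (negative)
  {
    if (magnitude > min_magnitude)
    {
      *overflow = true;
      return LLONG_MIN;
    }
    if (magnitude == min_magnitude)
      return LLONG_MIN;              // boundary value: never negate it
    return -(long long) magnitude;   // magnitude < 2^63: negation is defined
  }
  if (magnitude > (unsigned long long) LLONG_MAX)
  {
    *overflow = true;
    return LLONG_MAX;
  }
  return (long long) magnitude;
}
```

  With this shape, a parsed magnitude of 9223372036854775808 with a minus
  sign returns LLONG_MIN without ever evaluating a negation of LLONG_MIN.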
bsrikanth-mariadb
MDEV-39360: set statement optimizer_record_context for query fails

Move the initialization of the context recorder and of replay so that
it happens after run_set_statement_if_requested() is invoked in
mysql_execute_command() in sql_parse.cc
Marko Mäkelä
MDEV-39772 SET GLOBAL innodb_log_archive=OFF breaks recovery

log_t::set_recovered(): Set circular_recovery_from_sequence_bit_0
if the log was recovered in the innodb_log_archive=ON format,
so that a subsequent SET GLOBAL innodb_log_archive=OFF will
properly wait for a checkpoint before changing the format.
Daniel Black
MDEV-39513 connect table_type=INI memory leak

The MRUProfile structure, of which CurProfile=MRUProfile[0], can
contain a filename that is allocated. It is possible for CurProfile
to be null while other entries are allocated, including entries
whose MRUProfile.filename has already been allocated with malloc.

As such, the full PROFILE_ReleaseFile needs to be called on
all non-null MRUProfile entries to prevent a memory leak.

Corrects MDEV-9997
Raghunandan Bhat
MDEV-39690: UBSAN: signed integer overflow in `my_strntoll_8bit`, `my_strntoll_mb2_or_mb4` during BLOB-to-integer conversion

Problem:
  When converting a string like '-9223372036854775808' to an integer,
  the parsed magnitude (2^63) equals `(ulonglong) LONGLONG_MIN` and is
  accepted as valid. The return expression then cast it to signed
  (LONGLONG_MIN) and negated it. Negating LONGLONG_MIN is signed integer
  overflow, i.e. undefined behaviour.

Fix:
  Handle the boundary value explicitly. When the parsed magnitude equals
  `(ulonglong) LONGLONG_MIN`, return LONGLONG_MIN directly. Any smaller
  magnitude fits in a positive longlong, so the existing cast and
  negation stay well defined.

Backport from 11.4 commit - f552febe4315875143d952e41b2b5d17ca29b39c
Vladislav Vaintroub
MDEV-37187 hashicorp plugin: avoid expensive clock() in hot path

The plugin used clock() only to timestamp cache entries and measure
elapsed time against millisecond timeouts. As shown in MDEV-12345,
clock() is prohibitively expensive, and we do not need per-thread or
per-process CPU time here, only time differences.

Replace clock()/clock_t with the monotonic std::chrono::steady_clock.
Timeouts are kept as std::chrono::milliseconds