Console View


Sergei Golubchik
MDEV-32570 update tests
Kristian Nielsen
Binlog-in-engine: Disable test binlog_in_engine.rpl_gtid_index for Valgrind

The test requires large amounts of CPU and runs too slowly in Valgrind to
make sense; it can even occasionally time out on loaded machines.

Signed-off-by: Kristian Nielsen <[email protected]>
Rucha Deodhar
MDEV-22675: Assertion `offset < table->s->reclength' failed in dump_leaf_key

Analysis:
The reclength depends on the size of each field of the temporary table plus
the null_bytes. Since null_bytes is 0 (because we’re in non-strict mode and
the source table fields can never be null), there is no need to keep track
of null values. Hence the total record length is just the sum of the sizes
of both fields. The second field will always have size 0 because its value
will always be an empty string. Hence it starts exactly where the first
field ends, which is equal to reclength. Hence the assertion failure.

Fix:
If pack_length() is 0 (which means the size of the field is 0), assign the
result as an empty string directly; the assertion is not valid in that case.
Alexey Yurchenko
MDEV-38383 Fix MDEV-38073 MTR test warning

MDEV-38073 MTR test started to fail with a warning after upstream merge
from 11.4 a7528a6190807281d3224e4e67a9b76083a202a6 because THD responsible
for creating SST user became read-only when the server was started with
--transaction-read-only=TRUE.
Make sure the read-only flag on THDs created for the wsp::thd utility class is
cleared regardless of the --transaction-read-only value, as it is intended
only for client-facing THDs.
Vlad Lesin
MDEV-31956 SSD based InnoDB buffer pool extension

Some code review fixes. Needs RQG testing as some new assertions were
added and some conditions were changed.
Brandon Nesterenko
MDEV-38117: Replication stops with ERROR when Primary Key is not defined in Multi Master

When replicating tables with different structures between a master and
slave (via binlog_row_metadata=FULL, i.e. MDEV-36290), if the table
didn't have any keys, replication could break from a
  Can't find record in '<table>', Error_code: 1032;
error. This was due to the table's internal tracking of provided values
(i.e. TABLE::has_value_set) persisting across transactions. I.e., when
applying a new row event, the has_value_set bitset would start in a
state indicative of the last event. Then, if the new row event required
first finding a row, the has_value_set could represent a larger bitset
than what was actually unpacked from the master. More specifically, if
the last row event inserted something into the table, where slave-only
columns would be given default/auto-generated values, these would
update the TABLE::has_value_set. However, when the next row event would
come in, those auto-generated values from the last transaction would
not be present, but the row-search would include them to look-up the
row, and not be able to find anything.

This is fixed by resetting the table's internal state of provided
values (TABLE::has_value_set) before unpacking the row data. Doing so
revealed a bug where unpacked fields which explicitly provided NULL
values would skip indicating that an explicit value was provided
(and thereby couldn't be found by find_row()). To fix this, the
slave-side field will always call has_explicit_value() if it is
present in the packed data.
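
The stale-state problem described above can be modelled with a small sketch. This is a hypothetical toy model, not MariaDB code: the class `SlaveTable`, its methods, and the `reset_state` flag are assumptions made for illustration; only the idea of a per-table "has value" set that must be reset before unpacking comes from the commit message.

```python
# Hypothetical model for illustration only; this is not MariaDB code.
class SlaveTable:
    """Toy replica table that tracks which columns were explicitly provided."""

    def __init__(self):
        self.rows = []
        self.has_value_set = set()  # stands in for TABLE::has_value_set

    def unpack_row(self, packed, reset_state):
        if reset_state:
            # The fix: every row event starts from a clean state.
            self.has_value_set.clear()
        for column in packed:
            # A column is marked even if its value is None (explicit NULL).
            self.has_value_set.add(column)
        return dict(packed)

    def write_row(self, packed, slave_defaults, reset_state):
        record = self.unpack_row(packed, reset_state)
        for column, value in slave_defaults.items():
            record[column] = value
            # Slave-only columns receive generated values and get marked too.
            self.has_value_set.add(column)
        self.rows.append(record)

    def find_row(self, packed, reset_state):
        key = self.unpack_row(packed, reset_state)
        for row in self.rows:
            # The search compares every column currently marked as provided.
            if all(row.get(c) == key.get(c) for c in self.has_value_set):
                return row
        return None  # models the "Can't find record" failure


def replay(reset_state):
    """Insert a row, then look it up again as a later row event would."""
    table = SlaveTable()
    table.write_row({"a": 1}, {"slave_only": 42}, reset_state)
    return table.find_row({"a": 1}, reset_state)
```

Without the reset, the lookup still includes the slave-only column left over from the previous event and finds nothing; with the reset, only the columns present in the packed data are used.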

Test case based on work from: Deepthi Sreenivas <[email protected]>

Signed-off-by: Brandon Nesterenko <[email protected]>
Reviewed-by: Monty <[email protected]>
Brandon Nesterenko
MDEV-38435: Update test results
Brandon Nesterenko
MDEV-38641: Failure of Replication of System Versioning Tables

System versioned table UPDATES would fail to replicate on debug builds
with the debug assertion:

rpl_utility_server.cc:1058: bool RPL_TABLE_LIST::give_compatibility_error(rpl_group_info *, uint): Assertion `m_tabledef.master_column_name[col]' failed.

Though caught with system versioned tables, the problem is
generalizable to any transactions which have multiple Rows_log_events
that update the same table. That is, during the error reporting for
columns which were present in the Rows_log_event but not on the slave
table, there is a debug assertion that validates that the master's
table name exists.  After reporting this error, the pointer to that
master table name is nullified. This means that future Rows_log_events
would not have this table name, and the assertion would fail (release
builds would likely segfault when logging the error).

The fix for this is not to nullify the pointer after reporting the
error, so future Rows_log_events can continue using the pointer to the
master's table name.

Signed-off-by: Brandon Nesterenko <[email protected]>
Brandon Nesterenko
MDEV-32570 Prep: Refactor functions to handle >32-bit lengths

The functions to read row log events from buffers used 32-bit numeric
types to hold the length of the buffer. MDEV-32570 will add support
for row events larger than what 32 bits can represent, so this
patch changes the type of the length variable to size_t so that larger
row events can be created from raw memory.

Signed-off-by: Brandon Nesterenko <[email protected]>
Monty
MDEV-36290: Improved support of replication between tables of different structure

One can have data loss in multi-master setups when 1) both masters
update the same table, 2) ALTER TABLE is run on one master which
re-arranges the column ordering, and 3) transactions are binlogged
in ROW binlog_format.

This is because the slave assumes that all columns are in the same
order on the master and slave and that all columns on the master also
exist on the slave. This happens even if binlog_row_metadata=FULL is
used.  If this is not the case, this will lead to silent data loss.

A new option for slave_type_conversions bit field,
ERROR_IF_MISSING_FIELD, has been added, along with a new error,
ER_SLAVE_INCOMPATIBLE_TABLE_DEF. This allows the user to define if
the slave should abort replication if it is missing some field that
existed on the master. The option is off by default to keep things
compatible with earlier versions.
If a field is missing on the slave and log_warnings >= 1, a warning
will be logged to the error log.

This patch fixes this, when binlog_row_metadata=FULL is used on the
master, by mapping fields with identical names on the master and slave.
If the slave has fields that do not exist in the row event, these will
be set to their default value.

The main idea is that we added two conversion tables:
m_tabledef.master_to_slave_map[master_column_index] -> slave_column_index
and m_tabledef.master_to_slave_error[master_column_index] which contains
an error number if the master_column does not exist on the slave or
it is not possible to convert the master data to the slave column.
master_to_slave_error[#] contains 0 if the column exists and is compatible.
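
As a rough sketch of the two conversion tables, the following builds a name-based mapping and applies a row through it. The function names, the `convertible` callback, and the numeric error codes are placeholders chosen for illustration; the real error numbers and the server's data structures are not given in this message.

```python
# Placeholder error codes; the real server error numbers are not stated here.
ERR_NONE = 0
ERR_MISSING_ON_SLAVE = 1
ERR_NOT_CONVERTIBLE = 2


def build_column_mapping(master_columns, slave_columns, convertible=None):
    """Map master column indexes to slave column indexes by column name."""
    convertible = convertible or (lambda name: True)
    slave_index = {name: i for i, name in enumerate(slave_columns)}
    master_to_slave_map = []
    master_to_slave_error = []
    for name in master_columns:
        index = slave_index.get(name)
        master_to_slave_map.append(index)
        if index is None:
            master_to_slave_error.append(ERR_MISSING_ON_SLAVE)
        elif not convertible(name):
            master_to_slave_error.append(ERR_NOT_CONVERTIBLE)
        else:
            master_to_slave_error.append(ERR_NONE)
    return master_to_slave_map, master_to_slave_error


def apply_row(master_values, master_to_slave_map, slave_defaults):
    """Loop over fields in binlog order; unmapped slave fields keep defaults."""
    row = list(slave_defaults)
    for master_idx, value in enumerate(master_values):
        slave_idx = master_to_slave_map[master_idx]
        if slave_idx is not None:
            row[slave_idx] = value
    return row


mapping, errors = build_column_mapping(["id", "name", "age"],
                                       ["age", "id", "note"])
slave_row = apply_row([7, "x", 30], mapping, [None, None, "n/a"])
```

Here the master column `name` has no counterpart on the slave, so it gets a non-zero error entry, while the slave-only column `note` keeps its default value.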

General code changes:
- Instead of looping over row fields in the order of slave table
  we are instead looping over fields in the order of the binary log.
- We are using table->write_set to know which fields should be updated
  on the slave. This is reflected in unpack_row
- We are calling TABLE::mark_columns_per_binlog_row_image() to ensure
  that rpl_write_set is properly set. This is needed if the slave also
  is doing binary logging.
- Previously, replication aborted if the master and slave tables were too
  different.  Now replication is only aborted if the row actually uses
  columns that do not exist on the slave (and ERROR_IF_MISSING_FIELD
  is used) or uses columns that cannot be converted.
  - Instead of giving errors in compatible_with(), used when the table is
    first accessed by a row event, we are instead giving errors
    when we examine a row event and notice that it is accessing
    a nonexistent or incompatible field.

Other code changes:
- Removed conv_table argument from compatible_with() and store it
  directly in RPL_TABLE_LIST->m_conv_table
- table_def::compatible_with() now returns 1 on error (not 0).
- Remove m_width and skip arguments from prepare_record() as we are
  now using table->write_set() to check which elements need a default
  value.
- Moved DBUG_ENTER() to its proper place (after variable
  declarations) in a few functions.
- Some changes in unpack_row():
  - Replaced null_mask and null_ptr with an indexed bit check for
    simplicity.
  - Removed check of rgi == null and table_found which never worked.
  - Updated comments to reflect current code.
  - Indentation changes as the code now uses 'continue' instead of
    'if-else' in the main loop.
  - The code to throw away 'extra master fields' is not needed as we
    are now looping over fields in binary log, not over fields in
    slave table.
- Simplified get_table_data(TABLE *table_arg) by returning found
  table_list.
- Errors for row events are now initialized in compatible_with(),
  checked in check_wrong_column_usage() and reported in
  give_compatibility_error().

Co-authored-by: Brandon Nesterenko <[email protected]>
Brandon Nesterenko
MDEV-32570 (test): Add tests

This commit adds the following MTR tests for MDEV-32570:
* rpl_fragment_row_event_main.test
* rpl_fragment_row_event_mysqlbinlog.test
* rpl_fragment_row_event_span_relay_logs.test
* rpl_fragment_row_event_err.test

And also fixes existing tests appropriately.

Signed-off-by: Brandon Nesterenko <[email protected]>
Michael Widenius
MDEV-19683 Add support for Oracle TO_DATE()

Syntax:
TO_DATE(string_expression [DEFAULT string_expression ON CONVERSION ERROR],
        format_string [,NLS_FORMAT_STRING])
The format_string has the same format elements as TO_CHAR(), except a
few elements that are not supported/usable for TO_DATE().
TO_DATE() returns a datetime or date value, depending on if the format
element FF is used.

Allowed separators, same as TO_CHAR():
space, tab and any of !#%'()*+,-./:;<=>

'&' can also be used if the next character is not a letter (a-z or A-Z).
"text" indicates a text string that is verbatim in the format. One cannot
use " as a separator.

Format elements supported by TO_DATE():
AD          Anno Domini ("in the year of the Lord")
AD_DOT      Anno Domini ("in the year of the Lord")
AM          Meridian indicator (Before midday)
AM_DOT      Meridian indicator (Before midday)
DAY         Name of day
DD          Day (1-31)
DDD         Day of year (1-366)
DY          Abbreviated name of day
FF[1-6]     Fractional seconds
HH          Hour (1-12)
HH12        Hour (1-12)
HH24        Hour (0-23)
MI          Minutes (0-59)
MM          Month (1-12)
MON         Abbreviated name of month
MONTH       Name of Month
PM          Meridian indicator (After midday)
PM_DOT      Meridian indicator (After midday)
RR          20th century dates in the 21st century. 2 digits
            50-99 is assumed from 1900, 0-49 is assumed from 2000.
RRRR        20th century dates in the 21st century. 4 digits
SS          Seconds
SYYYY       Signed 4 digit year; MariaDB only supports positive years
Y           1 digit year
YY          2 digits year
YYY         3 digits year
YYYY        4 digits year
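
The RR element can be illustrated with a minimal sketch. It assumes the Oracle-style rule for a current year in the first half of a century, where two-digit years 50-99 map to 19xx and 00-49 map to 20xx; the function name `rr_to_year` is hypothetical and this is not the server's implementation.

```python
def rr_to_year(two_digit_year: int) -> int:
    """Expand a 2-digit RR year, assuming the current year is 2000-2049."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("RR expects a value between 0 and 99")
    if two_digit_year >= 50:
        return 1900 + two_digit_year  # 50-99: previous century
    return 2000 + two_digit_year      # 00-49: current century
```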

Note that if a part of the date is missing, the current date is used!
For example, with the format 'MM-DD HH-MI-SS' the current year will be used.
(Oracle behaviour)

Not supported options:
- BC, D, DL, DS, E, EE, FM, FX, RM, SSSSS, TS, TZD, TZH, TZR, X, SY
  BC is not supported by MariaDB datetime.
- Most of the others are exotic formats that do not make sense in MariaDB, as
  we return datetime or datetime with fractions, not a string.
- D (day-of-week) is not supported as it is not clear exactly how it would
  map to MariaDB. This element depends on the NLS territory of the session.
- RR only works with 2 digit years (In Oracle RR can also work with 4
  digit years in some context but the rules are not clear).

Extensions / differences compared to Oracle:
- MariaDB supports FF (fractional seconds).  If FF[#] is used,
  then TO_DATE will return a datetime with # of subseconds.
  If FF is not used a datetime will be returned.
  There is a warning (no error) if the string contains more digits than what
  is specified with FF[#].
- Names can be shortened to their unique prefix. For example, both January
  and Ja work fine.
- No error if the date string is shorter than format_string and the next
  unused character is not a number. This is useful to get a date
  from a mixed set of strings in date or datetime format.
  Oracle gives an error if the date string is too short.
- MariaDB supports short locales as language names
- NLS_DATE_FORMAT can use both " and ' for quoting.
- NLS_DATE_FORMAT must be a constant string.
  - This is to ensure that the server knows which locale to use
    when executing the function.

New formats handled by TO_CHAR():
FF[1-6]    Fractional seconds
DDD        Daynumber 1-366
IW          Week 1-53 according to ISO 8601
I          1 digit year according to ISO 8601
IY          2 digit year according to ISO 8601
IYY        3 digit year according to ISO 8601
IYYY        4 digit year according to ISO 8601
SYYY        4 digit year according to ISO 8601 (Oracle can do signed)

Supported NLS_FORMAT_STRING options are:
NLS_CALENDAR=GREGORIAN
NLS_DATE_LANGUAGE=language

Supported languages are:
- All MariaDB short locales, like en_AU.
- The following Oracle language names:
ALBANIAN, AMERICAN, ARABIC, BASQUE, BELARUSIAN, BRAZILIAN PORTUGUESE,
BULGARIAN, CANADIAN FRENCH, CATALAN, CROATIAN, CYRILLIC SERBIAN, CZECH,
DANISH, DUTCH, ENGLISH, ESTONIAN, FINNISH, FRENCH, GERMAN,
GREEK, HEBREW, HINDI, HUNGARIAN, ICELANDIC, INDONESIAN, ITALIAN,
JAPANESE, KANNADA, KOREAN, LATIN AMERICAN SPANISH, LATVIAN,
LITHUANIAN, MACEDONIAN, MALAY, MEXICAN SPANISH, NORWEGIAN, POLISH,
PORTUGUESE, ROMANIAN, RUSSIAN, SIMPLIFIED CHINESE, SLOVAK, SLOVENIAN,
SPANISH, SWAHILI, SWEDISH, TAMIL, THAI, TRADITIONAL CHINESE, TURKISH,
UKRAINIAN, VIETNAMESE

Development bugs fixed:
MDEV-38403 Server crashes in Item_func_to_date::fix_length_and_dec upon
          using an invalid argument
MDEV-38400 compat/oracle.func_to_date fails with PS protocol and cursor
          protocol (Fixed by Serg)
MDEV-38404 TO_DATE: MTR coverage omissions, round 1
MDEV-38509 TO_DATE: AD_DOT does not appear to be supported
MDEV-38513 TO_DATE: NULL value for format string causes assertion failure
MDEV-38521 TO_DATE: Date strings with non-ASCII symbols cause warnings
          and wrong results
MDEV-38578 TO_DATE: Possibly unexpected results upon wrong input
MDEV-38582 TO_DATE: NLS_DATE_LANGUAGE=JAPANESE does not parse values
          which work in Oracle
MDEV-38584 TO_DATE: NLS_DATE_LANGUAGE=VIETNAMESE does not parse values
          which work in Oracle
MDEV-38703 TO_DATE: Quotation for multi-word NLS_DATE_LANGUAGE leads
          to syntax error in view definition
MDEV-38675 TO_DATE: MSAN/Valgrind/UBSAN errors in
          extract_oracle_date_time

Known issues:
- Format string character matches inside quotes are done
  one-letter-to-one-letter, like in LIKE predicate. That means things
  like expansions and contractions do not work.
  For example 'ss' does not match 'ß' in collations which treat them
  as equal for the comparison operator.
  Match is done taking into account case and accent sensitivity
  of the subject argument collation, so for example this now works:
  MariaDB [test]> SELECT TO_DATE('1920á12','YYYY"a"MM') AS c;
  +---------------------+
  | c                  |
  +---------------------+
  | 1920-12-17 00:00:00 |
  +---------------------+

Co-author and reviewer: Alexander Barkov <[email protected]>
Sergei Golubchik
Merge branch '10.6' into 10.11
Alexander Barkov
MDEV-38698 mysql_upgrade does not fix charset and collation for mysql.user

When the view mysql.user was created (e.g. in 10.6) with an unexpected
character_set_client and collation_connection, e.g. utf8mb3 and utf8mb3_general_ci,
mysql_upgrade did not fix it to the expected latin1 and latin1_swedish_ci.

Fixing mariadb_system_tables.sql to do "CREATE OR REPLACE VIEW user" instead of
"CREATE VIEW IF NOT EXISTS user". This fixes the problem.
Monty
MDEV-38246 aria_read index failed on encrypted database during backup

The backup of encrypted Aria tables was not supported.
Added support for this. One complication is that the page checksum is
for the unencrypted page. To be able to verify the checksum I have to
temporarily decrypt the page.
In the backup we store the encrypted pages.

Other things:
- Fixed some (not critical) memory leaks in mariabackup
Brandon Nesterenko
MDEV-32570 Prep: Split Rows_log_event::write_data_body()

To prepare for MDEV-32570, the Rows_log_event::write_data_body() is
split into two functions:
1. write_data_body_metadata(), which writes the context of the rows
    data (i.e. width, cols, and cols_ai), which will only be written
    for the first event fragment.
2. write_data_body_rows(), which allows the writing of the rows data
    to be fragmented by parameterizing the writing of the rows data to
    start at a given offset and only write a certain length. This lets
    each row fragment (for MDEV-32570) contain a chunk of the rows
    data.
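
The split can be pictured with a short sketch: metadata is emitted once, in the first fragment, and the rows data is written in slices defined by an offset and a length. The function names and the byte layout below are assumptions for illustration and do not reflect the actual binlog event format.

```python
def fragment_rows(metadata: bytes, rows: bytes, max_len: int):
    """Split rows data into chunks; only the first fragment carries metadata."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    fragments = []
    # max(..., 1) guarantees one fragment even when there is no rows data.
    for offset in range(0, max(len(rows), 1), max_len):
        body = rows[offset:offset + max_len]   # write from offset, max_len bytes
        prefix = metadata if offset == 0 else b""
        fragments.append(prefix + body)
    return fragments


def reassemble(fragments, metadata_len: int):
    """Recover (metadata, rows) from the ordered fragments."""
    first = fragments[0]
    rows = first[metadata_len:] + b"".join(fragments[1:])
    return first[:metadata_len], rows


rows_data = bytes(range(10))
parts = fragment_rows(b"HDR", rows_data, 4)
```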

Signed-off-by: Brandon Nesterenko <[email protected]>
KhaledR57
MDEV-37072: Implement IS JSON predicate

Add support for the SQL standard IS JSON predicate with the syntax:
expr IS [ NOT ] JSON [ { VALUE | ARRAY | OBJECT | SCALAR } ]
[ { WITH | WITHOUT } UNIQUE [ KEYS ] ]

The predicate allows checking if an expression is valid JSON
and optionally constrains the JSON type (VALUE, ARRAY, OBJECT,
SCALAR) and whether object keys are unique.

The implementation includes:
- Basic IS JSON validation
- Support for NOT operator
- Type constraints (VALUE, ARRAY, OBJECT, SCALAR)
- Unique keys constraint (WITH/WITHOUT UNIQUE KEYS)
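
To make the semantics of the predicate concrete, here is a rough approximation outside of SQL, using Python's standard json module. The function `is_json` and its parameters are hypothetical, SQL NULL handling is not modelled, and edge cases may differ from the server's behaviour.

```python
import json


def is_json(text, kind="VALUE", unique_keys=False):
    """Approximate `text IS JSON [kind] [WITH UNIQUE KEYS]` for a string."""
    if kind not in ("VALUE", "ARRAY", "OBJECT", "SCALAR"):
        raise ValueError("unknown JSON type constraint: " + kind)

    def check_pairs(pairs):
        # Called for every JSON object with its (key, value) pairs in order.
        keys = [key for key, _ in pairs]
        if unique_keys and len(keys) != len(set(keys)):
            raise ValueError("duplicate object key")
        return dict(pairs)

    def reject_constant(name):
        # Python's parser accepts NaN/Infinity, which are not valid JSON.
        raise ValueError("invalid JSON constant: " + name)

    try:
        value = json.loads(text, object_pairs_hook=check_pairs,
                           parse_constant=reject_constant)
    except ValueError:
        return False

    if kind == "ARRAY":
        return isinstance(value, list)
    if kind == "OBJECT":
        return isinstance(value, dict)
    if kind == "SCALAR":
        return not isinstance(value, (list, dict))
    return True  # VALUE: any valid JSON text
```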
Yuchen Pei
MDEV-36230 Fix SERVER port field bound check

The Port field in the system table mysql.servers has type INT,
which translates to Field_long.

During parsing it is parsed as ulong_num, and in this patch we add
bound checks there.
Michael Widenius
MDEV-37674: Replace std::string with LEX_CSTRING in Optional_metadata_fields

- Less memory allocations (less fragmentation), less memory usage,
  faster performance, less code (both source and executed).
  - Added option used by RPL_TABLE_LIST::create_column_mapping() to
    only parse and allocate column names.
  - Avoid duplicate memory allocations of column names.
- Some protection for out-of-memory (when allocating field names).
  - More work is needed to remove all allocation problems in
    Optional_metadata_field().
- All allocated strings are ending with \0, which makes code safer.
Oleg Smirnov
MDEV-38129 Match probability
Marko Mäkelä
MDEV-38589: SELECT unnecessarily waits for log write

The design of "binlog group commit" involves carrying some state across
transaction boundaries. This includes trx_t::commit_lsn, which keeps track
of how much write-ahead log needs to be written. Unfortunately, this
field was not reset in a commit where a log write was elided. That would
cause an unnecessary wait in a subsequent read-only transaction that
happened to reuse the same transaction object.

trx_deregister_from_2pc(): Reset trx->commit_lsn so that
an earlier write that was executed in the same client connection
will not result in an unnecessary wait during a subsequent read
operation.

trx_commit_complete_for_mysql(): Unless we are inside a binlog
group commit, reset trx->commit_lsn.

unlock_and_close_files(): Reset trx->commit_lsn after durably
writing the log, and remove a redundant log write call from some
callers.

trx_t::rollback_finish(): Clear commit_lsn, because a rolled-back
transaction will not need to be durably written.

trx_t::clear_and_free(): Wrapper function to suppress a debug check
in trx_t::free().

Also, remove some redundant ut_ad(!trx->will_lock) that will be checked
in trx_t::free().

Reviewed by: Vladislav Vaintroub
Thirunarayanan Balathandayuthapani
MDEV-32067 InnoDB linear read ahead had better be logical

- Ported the MDEV-32067 branch into main. Added a few interfaces
for the multi-range read limit, and InnoDB is now also aware of the number
of pages to be read. Yet to be integrated with buf_read_ahead_pages().
Brandon Nesterenko
MDEV-32570 (client): Fragment ROW replication events larger than slave_max_allowed_packet

This patch extends mysqlbinlog with logic to support the output and
replay of the new Partial_rows_log_events added in the previous
commit. Generally speaking, as the assembly and execution of the
Rows_log_event happens in Partial_rows_log_event::do_apply_event(),
there isn’t much logic required other than outputting
Partial_rows_log_event in base64, with two exceptions.

In the original mysqlbinlog code, all row events fit within a single
BINLOG base64 statement; such that the Table_map_log_event sets up
the tables to use, the Row Events open the tables, and then after
the BINLOG statement is run, the tables are closed and the rgi is
destroyed. No matter how many Row Events within a transaction there
are, they are all put into the same BINLOG base64 statement.
However, for the new Partial_rows_log_event, each fragment is split
into its own BINLOG base64 statement (to respect the server’s
configured max_packet_size). The existing logic would close the
tables and destroy the replay context after each BINLOG statement
(i.e. each fragment). This means that 1) Partial_rows_log_events
would be unable to assemble Rows_log_events because the rgi is
destroyed between events, and 2) multiple re-assembled
Rows_log_events could not be executed because the context set up by
the Table_map_log_event is cleared after the first Rows_log_event
executes.

To fix the first problem, where we couldn’t re-assemble
Rows_log_events because the rgi would disappear between
Partial_rows_log_events, the server will not destroy the rgi when
ingesting BINLOG statements containing Partial_rows_log_events that
have not yet assembled their Rows_log_event.

To fix the second problem, where the context set-up by the
Table_map_log_event is cleared after the first assembled
Rows_log_event executes, mysqlbinlog caches the Table_map_log_event
to re-write for each fragmented Rows_log_event at the start of the
last fragment’s BINLOG statement. In effect, this will re-execute
the Table_map_log_event for each assembled Rows_log_event.

Reviewed-by: Hemant Dangi <[email protected]>
Acked-by: Kristian Nielsen <[email protected]>
Signed-off-by: Brandon Nesterenko <[email protected]>
Sergei Petrunia
MDEV-36055 Allow left join reordering

(Based on a patch by Yuchen Pei)
When computing table dependencies in simplify_joins(), do not make a
LEFT JOIN-ed table (or join nest) dependent on all preceding tables.

For queries in form

  t1
  LEFT JOIN t2 ON t2.col=t1.col
  LEFT JOIN t3 ON t3.col=t1.col

this allows the optimizer to construct join order of t1-t3-t2. Before
this patch, t1-t2-t3 was the only possible order.
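
The effect of relaxing the dependencies can be shown with a small enumeration. This is only a sketch of the idea, with hypothetical names, and says nothing about how the optimizer actually searches join orders: an order is considered valid when every table appears after the tables it depends on.

```python
from itertools import permutations


def valid_join_orders(dependencies):
    """List join orders in which each table follows all of its dependencies."""
    orders = []
    for order in permutations(sorted(dependencies)):
        position = {table: i for i, table in enumerate(order)}
        if all(position[dep] < position[table]
               for table, deps in dependencies.items()
               for dep in deps):
            orders.append("-".join(order))
    return orders


# Before: a LEFT JOIN-ed table depends on all preceding tables.
before = {"t1": set(), "t2": {"t1"}, "t3": {"t1", "t2"}}
# After: each table depends only on the tables its ON expression refers to.
after = {"t1": set(), "t2": {"t1"}, "t3": {"t1"}}
```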

Note that queries that use Oracle's Outer Join syntax (col1(+)=col2) are
converted into chained independent LEFT JOINs like the above.

The optimization is controlled by optimizer_switch='reorder_outer_joins=ON'

It is NOT enabled by default as it is known to expose problems with
join pruning. It is advised to set optimizer_prune_level=0 when setting
reorder_outer_joins=on.

Co-authored-by: Yuchen Pei <[email protected]>
Monty
MDEV-38435 Add Gtid_binlog_pos to SHOW MASTER STATUS

Other things:
- Extended mysqltest to write GTID's for master and slave if
  sync_slave_with_master fails.

Reviewer: Brandon Nesterenko <[email protected]>
Sergei Golubchik
Merge branch '10.11' into 11.4
Brandon Nesterenko
MDEV-32570 Prep: Split read_log_event into non-checksum version

Preparation for MDEV-32570. When fragmenting a large row event into
multiple smaller fragment events, each fragment event will have its own
checksum attached, thereby negating the need to also store the checksum
of the overall large event.

The existing code assumes all events will always have checksums, but
this won't be true for the rows events that are re-assembled on the
replicas. This patch prepares for this by splitting the logic which
reads in and creates Log_event objects into two pieces, one which
handles the checksum validation; and the other which reads the raw
event data (without the checksum) and creates the object.

All existing code that uses the checksum-assuming version of the event
reader is unchanged. MDEV-32570 will be the only case which will bypass
the checksum logic, and will directly create its rows log events from
memory without validating checksums (as the checksums will have already
been validated by each individual fragment event).
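
The reasoning can be sketched as follows: if every fragment is verified against its own checksum, the reassembled payload needs no checksum of its own. The framing below (a 4-byte little-endian CRC32 trailer per fragment, and the function names) is an assumption for illustration, not the actual event layout.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Append a CRC32 of the payload as a 4-byte trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def open_fragment(fragment: bytes) -> bytes:
    """Validate the trailer and return the raw payload without it."""
    payload, stored = fragment[:-4], int.from_bytes(fragment[-4:], "little")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch in fragment")
    return payload


def reassemble(fragments) -> bytes:
    """Build the large event from fragments that were each validated."""
    return b"".join(open_fragment(f) for f in fragments)


event = b"row-data" * 5
fragments = [seal(event[i:i + 8]) for i in range(0, len(event), 8)]
```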

Signed-off-by: Brandon Nesterenko <[email protected]>
Otto Kekäläinen
Promote getting GitHub stars in server log and client prompt

Ask users to give MariaDB a star by having an extra line in the MariaDB
client prompt:

    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 33
    Server version: 12.2.0-MariaDB-1:12.2.0 mariadb.org binary distribution

    Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

    Help others discover MariaDB. Star it on GitHub: https://github.com/MariaDB/server
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    MariaDB [(none)]>

Additionally, have this extra line in server logs:

    [Note] Help others discover MariaDB. Star it on GitHub: https://github.com/MariaDB/server

This change is done in a way that it is easy to cherry-pick to older
releases, and the text can later be changed to promote something else.

Test file updated with:
nano --noconvert --nonewlines mysql-test/main/mysql-interactive.result
Brandon Nesterenko
MDEV-36290: Fix optional_metadata_len type mismatch

Parameter optional_metadata_len to Optional_metadata_fields should
have been a size_t rather than a uint.

Signed-off-by: Brandon Nesterenko <[email protected]>
Kristian Nielsen
Binlog-in-engine: Fix uninitialized function parameters

Don't pass uninitialized values in function call. MSAN complains about this
(even when the called function never accesses the uninitialized values, and
even when the function is constexpr .oO).

Signed-off-by: Kristian Nielsen <[email protected]>
Brandon Nesterenko
MDEV-32570: Update perf_schema PFS_MAX_STAGE_CLASS

The new stages added by MDEV-32570, stage_buffer_partial_rows and
stage_constructing_rows_ev, pushed the number of instrumented stages
beyond PFS_MAX_STAGE_CLASS.

This increases PFS_MAX_STAGE_CLASS by 10 to 170.

Acked-by: Sergei Golubchik <[email protected]>
Signed-off-by: Brandon Nesterenko <[email protected]>
Thirunarayanan Balathandayuthapani
MDEV-32067 InnoDB linear read ahead had better be logical

- Ported MDEV-32067 branch into main. Added a few interfaces
for the multi-range read limit, and InnoDB is now also aware of the
number of pages to be read. Yet to integrate with buf_read_ahead_pages().
Brandon Nesterenko
MDEV-32570 (server): Fragment ROW replication events larger than slave_max_allowed_packet

This patch solves two problems:
  1. Rows log events cannot be transmitted to the slave if their
    size exceeds slave_max_allowed_packet (max 1GB at the time of
    writing this patch, i.e. MariaDB 12.3)
  2. Rows log events cannot be binlogged if they are larger than
    4GB because the common binlog event header field event_len is
    32 bits.

This patch adds support for fragmenting large Rows_log_events
through a new event type, Partial_rows_log_event. When any given
instantiation of a Rows_log_event (e.g. Write_rows_log_event, etc)
is too large to be sent to a replica (i.e. larger than the value
slave_max_allowed_packet, as configured on a replica), then the rows
event must be fragmented into sub-events (i.e.
Partial_rows_log_events), so the event can be transmitted to the
replica. The replica will then take the content of each of these
Partial_rows_log_events, and join them together into a large
Rows_log_event to be executed as normal.  Partial_rows_log_events
are written to the binary log sequentially, and the replica assembles
the events in the order they are binlogged.

To control the size of each Partial_rows_log_event, a new system
variable is added: binlog_row_event_fragment_threshold. All
Partial_rows_log_events within the same group will have this size,
except for the last, which will take up the remaining length of the
underlying Rows_log_event.
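
The sizing rule can be illustrated with a small calculation. This is a
sketch only: the function name is hypothetical, and it assumes the
threshold applies to the fragment payload in bytes, which the commit
message does not spell out.

```python
from typing import List


def fragment_sizes(event_len: int, threshold: int) -> List[int]:
    """Payload sizes of the fragments that cover event_len bytes."""
    if event_len <= 0 or threshold <= 0:
        raise ValueError("event_len and threshold must be positive")
    full, remainder = divmod(event_len, threshold)
    sizes = [threshold] * full       # every fragment but the last is full-size
    if remainder:
        sizes.append(remainder)      # the last fragment takes what is left
    return sizes
```

For example, fragment_sizes(2500, 1024) gives [1024, 1024, 452].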

Each Partial_rows_log_event stores its sequence number (seq_no) in the
overall series of fragments, the total number of fragments needed to
re-assemble the Rows_log_event (total_fragments), a uchar for flags for
embedding extra data, and any additional data as specified by the
flags. Currently, only the first event in a grouping will have
additional data: it will set the first bit in the flags
(FL_ORIG_EVENT_SIZE) to indicate it will be storing the total size of
the underlying Rows_log_event.
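
A rough sketch of such a fragment header is shown below. The field widths
(4-byte seq_no and total_fragments, 8-byte original size), the
little-endian byte order, and the function names are assumptions for
illustration only; just the one-byte flags field and the use of the first
flag bit for FL_ORIG_EVENT_SIZE come from the description above.

```python
import struct

FL_ORIG_EVENT_SIZE = 0x01  # first flag bit: the original event size follows

_FIXED = "<IIB"  # assumed layout: seq_no, total_fragments, flags (uchar)


def pack_fragment_header(seq_no, total_fragments, orig_event_size=None):
    """Serialize the header; only pass orig_event_size for the first fragment."""
    flags = FL_ORIG_EVENT_SIZE if orig_event_size is not None else 0
    header = struct.pack(_FIXED, seq_no, total_fragments, flags)
    if flags & FL_ORIG_EVENT_SIZE:
        header += struct.pack("<Q", orig_event_size)
    return header


def unpack_fragment_header(buf):
    """Return (seq_no, total_fragments, flags, orig_event_size, header_len)."""
    seq_no, total_fragments, flags = struct.unpack_from(_FIXED, buf, 0)
    offset = struct.calcsize(_FIXED)
    orig_event_size = None
    if flags & FL_ORIG_EVENT_SIZE:
        (orig_event_size,) = struct.unpack_from("<Q", buf, offset)
        offset += struct.calcsize("<Q")
    return seq_no, total_fragments, flags, orig_event_size, offset
```

Keeping the optional data behind a flag bit means later fragments pay no
space cost for it, and further flag bits remain free for future use.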

The cached Rows_log_event data is fragmented into
Partial_rows_log_events as follows. The primary will still generate a
Rows_log_event to write to the binlog; however, during the actual
writing process, the raw data of the rows event is split into
fragments, each covering some contiguous section of the rows data. A
Partial_rows_log_event is created for each contiguous section, and the
Partial_rows_log_events are written sequentially in place of the
too-large Rows_log_event. The original data to be fragmented will
include a header and data header, but will not include a checksum,
as each Partial_rows_log_event will have a checksum for validation, as
well as a sequence number and total number of fragments to ensure all
fragments are present.
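
The write-side splitting might look roughly like this. It is a simplified
sketch under stated assumptions: the Fragment type and function name are
hypothetical, sequence numbers are assumed to start at 1, and the CRC32 is
computed over the payload alone, whereas a real event checksum would cover
the whole fragment event.

```python
import zlib
from typing import List, NamedTuple


class Fragment(NamedTuple):
    seq_no: int            # position in the series (assumed to start at 1)
    total_fragments: int   # how many fragments make up the original event
    payload: bytes         # one contiguous section of the rows event data
    checksum: int          # per-fragment CRC32 (over the payload only here)


def split_rows_event(raw: bytes, threshold: int) -> List[Fragment]:
    """Split checksum-less rows event bytes into ordered fragments."""
    if not raw or threshold <= 0:
        raise ValueError("need non-empty data and a positive threshold")
    chunks = [raw[i:i + threshold] for i in range(0, len(raw), threshold)]
    total = len(chunks)
    return [
        Fragment(seq_no, total, chunk, zlib.crc32(chunk))
        for seq_no, chunk in enumerate(chunks, start=1)
    ]
```

Concatenating the payloads in seq_no order yields the original bytes again.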

The re-assembly and execution of the original Rows_log_event on the
replica happens in Partial_rows_log_event::do_apply_event(). The rgi
is extended with a memory buffer that holds all data for the
original Rows_log_event. As each Partial_rows_log_event is
ingested, its Rows_log_event content is appended to this memory
buffer. Once the last fragment has added its content, a new
Rows_log_event is created using that buffer, and executed.

A new error message is added to indicate that the slave has received an
invalid stream of Partial_rows_log_events:
ER_PARTIAL_ROWS_LOG_EVENT_BAD_STREAM.
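
The replica-side buffering and the bad-stream check can be sketched as
below. This is an illustration, not the actual do_apply_event() logic: the
class names are hypothetical, the exception merely stands in for
ER_PARTIAL_ROWS_LOG_EVENT_BAD_STREAM, and the exact conditions the server
treats as an invalid stream are assumed.

```python
class BadFragmentStream(Exception):
    """Stand-in for ER_PARTIAL_ROWS_LOG_EVENT_BAD_STREAM in this sketch."""


class RowsEventAssembler:
    """Collects fragment payloads in order and returns the rebuilt event."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self._buffer = bytearray()
        self._expected_seq = 1   # assumption: sequence numbers start at 1
        self._total = None

    def _fail(self, message):
        self._reset()
        raise BadFragmentStream(message)

    def add_fragment(self, seq_no, total_fragments, payload):
        """Append one fragment; return the event bytes after the last one."""
        if self._total is None:
            self._total = total_fragments
        if total_fragments != self._total:
            self._fail("total_fragments changed within a group")
        if seq_no != self._expected_seq or seq_no > self._total:
            self._fail("got fragment %d, expected %d"
                       % (seq_no, self._expected_seq))
        self._buffer += payload
        if seq_no < self._total:
            self._expected_seq += 1
            return None          # more fragments still to come
        event = bytes(self._buffer)
        self._reset()
        return event             # caller would now build and apply the event
```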

Note this commit only adds the server logic for fragmenting and
assembling events; the client logic (mysqlbinlog) is in the next
commit.

Alternative designs considered were:
  1. Alternative 1: Change the master-slave communication protocol
    such that the master would send events in chunks of size
    slave_max_allowed_packet. Though this is still a valid idea,
    and would solve the first problem described in this commit
    message, this would still leave the limitation that
    Rows_log_events could not exceed 4GB. Eventually, this change
    should still be addressed (e.g. in MDEV-37853), and for users
    of binlog-in-engine (MDEV-34705) which supports out-of-band
    binlogging of events, this MDEV-32570 work will be superseded.
  2. Alternative 2: Create a generic “Container_log_event” with the
    intention to embed various other types of event data for
    various purposes, with flags that describe the purpose of a
    given container. This seemed overboard, as there is already a
    generic Log_event framework that provides the necessary
    abstractions to fragment/reassemble events without adding in
    extra abstractions.
  3. Alternative 3: Add a flag to Rows_log_event with semantics to
    overwrite/correct the event_len field of the common event
    header to use a 64-bit field stored in the data_header of the
    Rows_log_event; and also do alternative 1, so the master would
    send the large (> 4GB) rows event in chunks. This approach
    would add too much complexity (changing both the binlogging
    and transport layer); as well as introduce inconsistency to
    the event definition (event_len and next_event_position would
    no longer have consistent meanings).

Reviewed-by: Hemant Dangi <[email protected]>
Acked-by: Kristian Nielsen <[email protected]>
Signed-off-by: Brandon Nesterenko <[email protected]>
Marko Mäkelä
First steps towards multi-file parsing

log/log0recv.cc:3578: ut_ad(l + log_sys.is_encrypted() * 8 + 5 == el)
Kristian Nielsen
Binlog-in-engine: Fix sporadic test failure of binlog_in_engine.purge_locking

The test was for some reason incorrectly running SHOW BINLOG EVENTS even
though the binlog file in which the prior event appears is deliberately
non-deterministic, causing the test to depend on timing.

Signed-off-by: Kristian Nielsen <[email protected]>