
Console View


Jan Lindström
MDEV-39488 : Skip Galera test requiring perfschema if -DPLUGIN_PERFSCHEMA=NO

Test change only, this test does not need perfschema.
Monty
fixup! 13e1fdfb6d3deb492f81e07caacadc2c4fa75dfb

Fixed duplicate key error when converting HEAP table to Aria
Arcadiy Ivanov
Clarify comments for HEAP blob continuation and tmp table overflow

- `item_sum.cc`: fix misleading "HEAP table full" comment — the error
  type is unknown at this point; `create_internal_tmp_table_from_heap()`
  determines whether it is a convertible HEAP overflow or a fatal error
- `sql_select.cc`: document why `choose_engine()` re-checks key limits
  after picking a disk engine for non-key-limit reasons
- `heapdef.h`: add descriptive comments to `HP_CONT_MIN_RUN_BYTES`,
  `HP_CONT_RUN_FRACTION_NUM`, `HP_CONT_RUN_FRACTION_DEN`
Arcadiy Ivanov
Skip unchanged blobs in `heap_update()`

Per-column blob change detection in `heap_update()`: compare each
blob column's old and new values before rewriting continuation chains.
Detection order (cheapest first): length comparison (O(1)), data
pointer comparison (O(1)), `memcmp` fallback (O(n) with early exit).

Unchanged blobs keep their existing chains with no allocation, copy,
or free. Only changed blobs get new chains written (write-before-free
for crash safety) and old chains freed. This avoids unnecessary chain
churn for common patterns like `UPDATE t SET non_blob_col = x`,
`INSERT ... ON DUPLICATE KEY UPDATE` with unchanged blob values, and
`REPLACE` with identical blob data.

`hp_blob_length()` and `hp_write_one_blob()` made non-static for use
by `heap_update()` directly (bypassing `hp_write_blobs()` which always
rewrites all columns).
Arcadiy Ivanov
Fix MSAN crash in `hp_rec_hashnr` for geometry/blob DISTINCT keys

`heap_prepare_hp_create_info()` mishandled blob key segments whose
field type is `Field_blob` or `Field_geom` (as opposed to
`Field_blob_key`).

`Field_blob::key_type()` returns `HA_KEYTYPE_VARBINARY2` (geometry)
or `HA_KEYTYPE_VARTEXT2` (text blobs).  Commit `30415846402`
("Introduce Field_blob_key") added logic that stripped `HA_BLOB_PART`
from these segments, assuming they use "VARCHAR packing."  This is
wrong: DISTINCT/UNION key fields are `Field_blob` (not
`Field_blob_key`), and their record format is a blob descriptor
(`packlength` bytes of length + 8-byte data pointer), not a varchar
(2-byte length prefix + inline data).

After `HA_BLOB_PART` was stripped, `hp_create.c` normalized
`VARBINARY2` → `VARTEXT1`.  The hash function `hp_rec_hashnr()` then
entered the `VARTEXT1` branch, which reads the first 2 bytes as a
varchar length and hashes that many bytes starting at offset 2.  For a
geometry blob descriptor, the first 2 bytes are the low bytes of the
WKB data length (e.g. ~100 for a simple polygon), so the hash read
~100 bytes starting inside the 12-byte descriptor — overshooting into
adjacent fields or uninitialized record buffer memory.

This caused MSAN "use-of-uninitialized-value" crashes in
`innodb_gis.1`, `innodb_gis.point_basic`, `main.gis`, and
`innodb_gis.gis` on the `amd64-msan-clang-20` CI builder.  Beyond the
MSAN crash, it was also a functional bug: hashing the raw pointer
bytes meant two rows with identical geometry data but different memory
addresses would hash differently, breaking UNION DISTINCT
deduplication.

**Fix**: when `HA_BLOB_PART` is set and the key type is
`VARBINARY2`/`VARTEXT2`, promote to `VARBINARY4`/`VARTEXT4` instead
of stripping the flag.  Set `bit_start` to the actual `packlength`
and `length` to `4 + sizeof(pointer)`.  `hp_create.c` then normalizes
to `VARTEXT4` and the hash/compare functions use the blob path:
dereference the pointer and operate on the actual data.
Arcadiy Ivanov
Add stress test for HEAP blob insert/delete/update cycles

3-phase `heap/blob_stress` MTR test exercising free-list
fragmentation, continuation chain reuse, and data integrity under
sustained mixed DML:

- Phase 1: 200-cycle stored procedure — 5 inserts, 2 deletes,
  3 updates per cycle with blob sizes cycling through Case A/B/C;
  shadow table row count verification with `SIGNAL` on mismatch
- Phase 2: near-capacity (2 MB) fill/fragment/refill with free-list
  scavenging, then full-delete reinsert
- Phase 3: 20 grow/shrink `UPDATE` cycles (even rows 10-18 KB,
  odd rows 5-20 bytes)

All phases verify blob content integrity via single-character
`REPEAT` pattern check. Addresses Monty's F14 feedback.
Arcadiy Ivanov
Disable `ps_protocol` in `blob_big` tests

`SHOW STATUS LIKE 'Created_tmp%'` counts include the extra temp table
created by prepared statement re-execution under `--ps-protocol`. The
test already disabled `cursor_protocol` and `ps2_protocol` but missed
`ps_protocol`, causing `blob_big1`/`blob_big2`/`blob_big3` to fail on
CI builders that run with `--ps-protocol`.
Arcadiy Ivanov
Overflow-to-Aria on `ha_update_tmp_row()` for GROUP BY temp tables

When `ha_update_tmp_row()` fails with `HA_ERR_RECORD_FILE_FULL` on a
HEAP temp table (e.g., `MAX(TEXT)` aggregate growing the blob during
GROUP BY accumulation), convert the table to Aria and retry the update
— matching the existing INSERT overflow handling.

**Mechanism**: `create_internal_tmp_table_from_heap()` copies all rows;
`record[0]` write is rejected as duplicate (same GROUP BY key).
`get_dup_key()` populates `dup_ref`, then `ha_rnd_pos()` locates the
old row in Aria for the update.

**Two call sites fixed**:
- `end_update()`: switches to `end_unique_update` after conversion
- `end_unique_update()`: restores INDEX mode via `rnd_inited` flag

`GROUP_CONCAT` does NOT trigger this path — its `update_field()` is
`DBUG_ASSERT(0)` (it accumulates internally). `MAX(TEXT)` / `MIN(TEXT)`
are the aggregates that write growing blobs via `result_field->store()`.
Oleksandr Byelkin
Merge branch '11.8' into 12.3
Monty
MDEV-39095 MariaDB server syntax checker

Adds option --check-syntax to mariadbd server.

This allows one to check if mariadbd supports some particular syntax.

Example usage:
cat file-with-sql-syntax | mariadbd [--no-defaults] --check-syntax
Arcadiy Ivanov
Skip run header for single-record blob continuation runs

When a blob fits entirely within a single continuation record
(`data_len <= visible`), skip the 10-byte run header (`next_cont`
pointer + `run_rec_count`) and store data starting at offset 0.
This reclaims 10 bytes of payload per small blob, which matters
for tables with small `recbuffer` (e.g. 16 bytes: payload increases
from 5 to 15 bytes, avoiding a second record for blobs up to 15 bytes).

**`HP_ROW_SINGLE_REC` flag** (bit 4 in the flags byte) signals that
the continuation record has no run header. The reader gets `visible`
bytes of contiguous data starting at the chain pointer (zero-copy).

**`enum hp_blob_format`** replaces ad-hoc boolean/flag checks with
a single vocabulary for blob storage format detection:
- `HP_BLOB_CASE_A_SINGLE_REC`: no header, data at offset 0 (new)
- `HP_BLOB_CASE_B_ZEROCOPY`: header in rec 0, data in rec 1..N-1
- `HP_BLOB_CASE_C_MULTI_RUN`: header + data in each run, linked

**`hp_blob_run_format()`** is the single decoder used by all paths:
write (`hp_write_run_data`), read (`hp_materialize_blobs`,
`hp_materialize_one_blob`), free (`hp_free_run_chain`), scan
(`heap_scan`), and integrity check (`heap_check_heap`).

Files changed:
- `storage/heap/heapdef.h`: flag, enum, decoder function
- `storage/heap/hp_blob.c`: write/read/free paths
- `storage/heap/hp_scan.c`: scan skip logic
- `storage/heap/_check.c`: integrity check
Arcadiy Ivanov
Replace `hp_blob_run_format()` enum with direct bit testing

Remove `enum hp_blob_format` and `hp_blob_run_format()` indirection.
Add `HP_ROW_MULTIPLE_REC` (bit 5) so all three blob storage formats
have a dedicated flag bit. Add named inline predicates
`hp_is_single_rec()`, `hp_is_zerocopy()`, `hp_is_multi_run()` matching
the existing `hp_is_active()`/`hp_has_cont()`/`hp_is_cont()` pattern.

Change `hp_write_run_data()` format parameter from enum to `uchar`
receiving bit constants directly; simplify flags byte assignment from
ternary to bitwise OR.

Addresses review feedback F127-F128, F130, F132-F134.
Monty
Fixed that Field_blob_compressed can be used in internal temporary tables

group_concat needs special code in store() for storing the result
in table->blob_storage. This was implemented in Field_blob::store()
but not in Field_blob_compressed::store(). This was temporarily solved
by not using Field_blob_compressed when
table->s->is_optimizer_tmp_table() would be set. This however disabled
Field_blob_compressed for temporary tables that did not need a key for
the blob field.

Fixed by ensuring that Field_blob::store() and
Field_blob_compressed::store() handle group_concat identically. As this
handling is only done for internal temporary tables, we store the data
uncompressed for faster usage by group_concat().
Arcadiy Ivanov
Avoid double blob materialization in `find_unique_row()`

For blob tables, `find_unique_row()` previously materialized blobs
twice: once during `hp_rec_key_cmp()` (per-segment via
`hp_materialize_one_blob()`) and again via `hp_read_blobs()` after
the match was found.

Reorder the blob path to materialize-then-compare: save the input
record, copy the stored candidate, call `hp_read_blobs()` once to
materialize all blobs, then compare via `hp_rec_key_cmp()` with
`info=NULL` since both records now have direct data pointers.

Non-blob tables keep the original fast path unchanged.
Arcadiy Ivanov
MDEV-38975: HEAP engine BLOB/TEXT/JSON/GEOMETRY column support

Allow BLOB/TEXT/JSON/GEOMETRY columns in MEMORY (HEAP) engine tables
by storing blob data in variable-length continuation record chains
within the existing `HP_BLOCK` structure.

**Continuation runs**: blob data is split across contiguous sequences
of `recbuffer`-sized records. Each run stores a 10-byte header
(`next_cont` pointer + `run_rec_count`) in the first record; inner
records (rec 1..N-1) have no flags byte — full `recbuffer` payload.
Runs are linked via `next_cont` pointers. Individual runs are capped
at 65,535 records (`uint16` format limit); larger blobs are
automatically split into multiple runs.

**Zero-copy reads**: single-run blobs return pointers directly into
`HP_BLOCK` records, avoiding `blob_buff` reassembly entirely:
- Case A (`run_rec_count == 1`): return `chain + HP_CONT_HEADER_SIZE`
- Case B (`HP_ROW_CONT_ZEROCOPY` flag): return `chain + recbuffer`
- Case C (multi-run): walk chain, reassemble into `blob_buff`

`HP_INFO::has_zerocopy_blobs` tracks zero-copy state; used by
`heap_update()` to refresh the caller's record buffer after freeing
old chains, preventing dangling pointers.

**Free list scavenging**: on insert, the free list is walked read-only
(peek) tracking contiguous groups in descending address order (LIFO).
Qualifying groups (>= `min_run_records`) are unlinked and used. The
first non-qualifying group terminates the scan — remaining data is
allocated from the block tail. The free list is never disturbed when
no qualifying group is found.

**Record counting**: new `HP_SHARE::total_records` tracks all physical
records (primary + continuation). `HP_SHARE::records` remains logical
(primary-only) to preserve linear hash bucket mapping correctness.

**Scan/check batch-skip**: `heap_scan()` and `heap_check_heap()` read
`run_rec_count` from rec 0 and skip entire continuation runs at once.

**Hash functions**: `hp_rec_hashnr()`, `hp_rec_key_cmp()`, `hp_key_cmp()`,
`hp_make_key()` updated to handle `HA_BLOB_PART` key segments — reading
actual blob data via pointer dereference or chain materialization.

**SQL layer**: `choose_engine()` no longer rejects HEAP for blob tables
(replaced `blob_fields` check with `reclength > HA_MAX_REC_LENGTH`).
`remove_duplicates()` routes HEAP+blob to `remove_dup_with_compare()`.
`ha_heap::remember_rnd_pos()` / `restart_rnd_next()` implemented for
DISTINCT deduplication support. Fixed undefined behavior in
`test_if_cheaper_ordering()` where `select_limit/fanout` could overflow
to infinity — capped at `HA_POS_ERROR`.

https://jira.mariadb.org/browse/MDEV-38975
Arcadiy Ivanov
Batch tail allocation for blob continuation chains

`hp_alloc_from_tail()` now takes `uint *blocks` (in/out) and allocates
a contiguous batch of records from the current leaf block in one call,
replacing the per-record inner loop in `hp_write_one_blob()` Step 2.

The caller pre-computes the record count needed for the chosen storage
format (Case B for `is_only_run`, Case C otherwise), and the function
returns however many are available up to the request. The flat
if/else-if/else then selects Case A, B, or C based on the actual count.

This eliminates the record-by-record extension loop, both contiguity
guards with `abort()`, and the Case B extra-record allocation logic,
reducing Step 2 from ~170 lines to ~60.
Arcadiy Ivanov
Reclaim tail records on failed blob allocation

When `hp_write_one_blob()` fails partway through tail allocation
(e.g. `HA_ERR_RECORD_FILE_FULL`), `hp_free_run_chain()` puts the
partial chain onto the delete list. These records were just
tail-allocated but once on the delete list they could only be reused
via free-list scavenging, not tail allocation, so `last_allocated`
only grew forward.

Add `hp_shrink_tail()` which pops tail-positioned records from the
delete list head and decrements `last_allocated`. Crosses block
boundaries by locating the previous leaf block via `hp_find_block()`
and updating `last_blocks`. Empty blocks stay allocated in the tree
(freed at table drop).

Add `high_water_allocated` to `HP_BLOCK` to track the peak
`last_allocated` before shrinking. In `hp_alloc_from_tail()`, when
`block_pos == 0` and `last_allocated < high_water_allocated`, the
next leaf block already exists in the tree: reuse it via
`hp_find_block()` instead of calling `hp_get_new_block()`, avoiding
memory waste from duplicate block allocations.

Unit tests (41 new assertions in `hp_test_freelist-t.c`):
- Single-block tail reclaim after failed blob insert
- Cross-block reclaim (2 blocks, 1 boundary crossing)
- 3-block reclaim (2 boundary crossings, `last_blocks` restoration)
- Orphaned block reuse: non-blob and blob inserts fill reclaimed
  blocks without growing `data_length`
Raghunandan Bhat
MDEV-39095: Fixes to MariaDB Syntax Checker

- Skip initializing stopwords for full-text indices.
- Skip changing data directory when checking syntax.
- Help message when syntax checker is used in interactive mode.
- Initialize THD with a dummy database name to prevent `No database
  selected` error.
- Recognize executable comments of the form /*! ... */ and /*!M ... */
  and report errors in them.
- More tests.
Jan Lindström
MDEV-39561 : Galera test failure on mysql-wsrep-features#8

Test case changes only. Fix wait_conditions.
Daniel Black
MDEV-28374 UBSAN signed integer overflow PROCEDURE ANALYSE

PROCEDURE ANALYSE returns a Std (standard deviation) value, which
involves a sum of squares that can exceed the range of the longlong
datatype.

Change the sum_sqr and sum values to double to account for the fact
that they are only used in a double context and exact integer
precision isn't required.

Reviewed by: Alexander Barkov
Arcadiy Ivanov
Fix PAD SPACE blob comparison and add blob key tests

**Bug fix** (`padspace_early_exit`): `hp_rec_key_cmp()` and `hp_key_cmp()`
in `hp_hash.c` had early-exit checks `if (len1 != len2) return 1` for
blob key segments, which broke PAD SPACE collations (the default). With
PAD SPACE, `'abc'` (len=3) and `'abc  '` (len=6) must compare equal
because trailing spaces are insignificant, but the length check rejected
them before reaching `strnncollsp()`.

Fix: only short-circuit on length mismatch for NO PAD collations
(`MY_CS_NOPAD`). This bug was discovered during the VARCHAR-to-BLOB
promotion work (Phase 1) and affects any HEAP table with blob key
segments, manifesting in `COUNT(DISTINCT)` on TEXT columns returning
inflated counts.

**Test coverage** transferred from Phase 1:

- `heap.heap_blob_ops` MTR test: exercises HEAP internal temp tables
  with explicit TEXT columns (GROUP BY, DISTINCT, IN-subquery, CTEs,
  window functions, ROLLUP). Includes a targeted PAD SPACE scenario
  that catches `padspace_early_exit`.

- `hp_test_hash-t` unit test (43 tests): validates blob hash/compare
  functions including PAD SPACE collation, NULL/empty blobs,
  multi-segment keys, key format round-trips.

- `hp_test_key_setup-t` unit test (9 tests): validates
  `heap_prepare_hp_create_info()` handling of blob key segments
  (`distinct_key_truncation`) and garbage `key_part_flag`
  (`garbage_key_part_flag`). Four Phase 1-specific assertions are
  deferred via `#if 0` (these bugs are compensated by `hp_create.c`
  runtime normalization in MDEV-38975 proper but will be needed
  when Phase 1 removes that safety net).
Arcadiy Ivanov
Set `key_part_flag` from field type in GROUP BY key setup

Rebuild HEAP index key from `record[0]` when the index has blob key
segments, because `Field_blob::new_key_field()` returns `Field_varstring`
(2B length + inline data) while HEAP's `hp_hashnr`/`hp_key_cmp` expect
`hp_make_key` format (4B length + data pointer).

Precompute `HP_KEYDEF::has_blob_seg` flag during table creation to avoid
per-call loop through key segments.
Monty
Remove pack_length_no_ptr()

Replaced pack_length_no_ptr() with existing length_size() that does the
same thing.
Monty
squash! 4888cdb69fd115986625da2a44b78a6a3898983c

Fixed that heap_prepare_hp_create_info() honors tmp_memory_table_size
Fixed memory overrun error in hp_test_hash-t
Added DBUG_ASSERT to ensure that we do not use converted heap
keys with index_read()
Monty
Removed duplicate versions of Field::row_pack_length()
Monty
Introduce Field_blob_key for handling keys on blobs for temporary tables

Field_blob_key is a new blob variant stored as [4-byte
length][data pointer] that can be used as a sort/distinct key in
optimizer temporary tables. Previously, plain Field_blob was used in
GROUP_CONCAT and UNION DISTINCT contexts, which could not be
properly compared as a key.

Field_blob_key allows removing the blob key re-packing in HEAP that
was introduced by the HEAP GROUP BY / DISTINCT on TEXT/BLOB columns
work.

Implementation:

Pass a new Tmp_field_param argument through all make_new_field() and
create_tmp_field() overrides so that field creation can know whether
the resulting field will be part of a unique/distinct key. This
partly replaces the earlier overloading of TABLE::group_concat.

Other things:
- Fix Field_blob_compressed::make_new_field() to correctly handle two
  distinct cases:
  - When part of a unique/distinct key: substitute with Field_blob_key so
  key comparisons work correctly.
  - When placed in any optimizer tmp table (e.g. GROUP_CONCAT with ORDER
  BY): substitute with a plain uncompressed Field_blob, fixing wrong
  results caused by the compressed field's internal value buffer being
  overwritten across rows.
- Fix UNIQUE_KEY_FLAG in the client protocol so it is also set for
  columns that are part of a UNION DISTINCT, not only for columns from
  unique indexes.
- Mark internal temporary tables created by create_tmp_table /
  Create_tmp_table with type RESULT_TMP_TABLE instead of
  INTERNAL_TMP_TABLE. This makes it easier to differentiate between
  temporary tables created as placeholders for normal tables, as in
  CREATE .. SELECT, ALTER TABLE, and derived tables.
Oleksandr Byelkin
MDEV-39389 Memory leaks in _db_set_init

Handle forgotten case of resetting parameters
(free command_line before assigning a new one).
Arcadiy Ivanov
Code review feedback: `hp_update.c` cleanup, test renames, style fixes

Apply Monty's review feedback across HEAP blob implementation:

**`hp_update.c`:**
- Hoist `HP_BLOB_DESC *desc` to block scope, use `desc++` in all three
  blob loops instead of re-indexing `&share->blob_descs[i]` each iteration
- Move `new_len` declaration to block scope, remove inner `{ }` wrapper
  block, dedent the write-new-chains loop body by one level
- Replace `if (blob_changed[i]) any_changed= TRUE` with branchless
  `any_changed|= blob_changed[i]`
- Add braces to rollback `for (j= 0; j < i; j++)` loop body
- Update chain pointer restoration comment to explain `pos` vs `old`
  pointer semantics for segmented blobs
- Rename inner loop `new_len` to `cur_len` to avoid shadowing block-scope
  `new_len`

**`read_lowendian()` move (F33/F63):**
- Move `read_lowendian()` from `sql/field.h` to `include/my_base.h` so
  pure-C storage engines can use it
- Convert `hp_blob_length()` from standalone function in `hp_blob.c` to
  `static inline` wrapper in `heapdef.h` calling `read_lowendian()`
- Convert `hp_blob_key_length()` in `hp_hash.c` to `static inline`
  wrapper calling `read_lowendian()`

**Test renames** (Monty's naming convention — drop `heap_` prefix):
- `heap_blob.test` → `blob.test`
- `heap_blob_big{1,2,3}.test` → `blob_big{1,2,3}.test`
- `heap_blob_big.inc` → `blob_big.inc`
- `heap_blob_groupby.test` → `blob_group_by.test`
- `heap_blob_ops.test` → `blob_ops.test`

**Other files:** Style fixes from earlier feedback items applied across
`hp_blob.c`, `hp_write.c`, `hp_hash.c`, `hp_scan.c`, `hp_delete.c`,
`ha_heap.cc`, `heapdef.h`, `_check.c`, `heap.h`, `field.h`,
`sql_select.cc`, and test files.
forkfun
MDEV-24557 mariadb-dump: translate MySQL 8.x user/grant syntax for MariaDB import

  `mariadb-dump --system=users` can capture a MySQL 8.0+ source server
  and produce a dump that replays cleanly on MariaDB. In the dump file
  such statements appear twice, wrapped in `/*!80001 ... */` (MySQL-only)
  and `/*M!100005 ... */` (MariaDB-only).

  Tests:

  - main.mariadb-dump-mysql8-import
  - main.mariadb-dump-roles-regression
Arcadiy Ivanov
Early FULLTEXT detection for derived table engine choice

When a derived table is used in a query with FULLTEXT functions,
detect this in `mysql_derived_prepare()` and force a disk-based
tmp engine (`TMP_TABLE_FORCE_MYISAM`) before the result table is
created. This avoids creating a HEAP handler and then swapping it
for Aria/MyISAM later in `Item_func_match::fix_fields()`.

The check uses `derived->select_lex->ftfunc_list->elements` to
detect FULLTEXT in the outer query, following the same approach as
`st_select_lex_unit::prepare()` in `sql_union.cc`.

The handler swap block in `Item_func_match::fix_fields()` is
replaced with a simple `ER_TABLE_CANT_HANDLE_FT` error, which now
serves only as a safety net for engines that genuinely lack
FULLTEXT support.
Jan Lindström
MDEV-39488 : Skip Galera test requiring perfschema if -DPLUGIN_PERFSCHEMA=NO
Arcadiy Ivanov
Consistent `hp_rec_key_cmp()` argument order in `heap_update()`

Swap rec1/rec2 arguments to match the API convention: rec1 = input
record (direct data pointers), rec2 = potentially stored record
(chain pointers when info != NULL). Both calls pass info=NULL so the
swap is a no-op for behavior, but makes the argument order consistent
with all other call sites (`hp_write.c`, `hp_delete.c`, `ha_heap.cc`).
Arcadiy Ivanov
Add `DBUG_ASSERT` guards for MSAN regression fixes

`Field_geom::store()`: assert `blob_storage` is not set, catching
any future removal of the MDEV-16699 `group_concat` downgrade in
`Field_blob::make_new_field()`.

`heap_prepare_hp_create_info()`: after `HA_BLOB_PART` promotion,
assert key type is `VARTEXT4`/`VARBINARY4` and `bit_start` is 1-4,
catching blob key segments that were not promoted from `VARTEXT2`.
Arcadiy Ivanov
Free-list scavenge fallback + contiguity fix for blob allocation

Two fixes in `hp_write_one_blob()`:

**Bug fix**: Step 1 free-list contiguity detection failed to update
`prev_pos` inside the contiguity branch, so the check
`pos == prev_pos - recbuffer` could only detect 2-record groups.
The third record was always compared against the original `prev_pos`
(2 recbuffers away), causing a false discontinuity. Fix: add
`prev_pos = pos` after `run_start = pos`.

**Deficiency #2**: When Step 2 (tail allocation) fails with
`HA_ERR_RECORD_FILE_FULL` and there are still deleted records on the
free list, a new Step 3 walks the entire free list accepting any
contiguous group (even single slots). Each group is written as a
Case C run via `hp_unlink_and_write_run()`. This produces maximally
fragmented chains, which are slower to read but correct. Failing with
table-full when free slots exist is worse than a fragmented chain.

Tests:
- `hp_test_freelist-t.c`: 38 unit tests covering contiguity detection
  (prev_pos bug guard), repeated delete-reinsert cycles, Step 3
  scavenge fallback, and true capacity exhaustion
- `heap/blob_fallback.test`: MTR test exercising the fallback at SQL
  level with fragmented free list
- Extracted shared `hp_test_helpers.h` from duplicate code in
  `hp_test_scan-t.c` and `hp_test_freelist-t.c`
Arcadiy Ivanov
MDEV-38975: Add hash pre-check to skip expensive blob materialization in hash chain traversal

`hp_search()`, `hp_search_next()`, `hp_delete_key()`, and
`find_unique_row()` walk hash chains calling `hp_key_cmp()` or
`hp_rec_key_cmp()` for every entry. For blob key segments, each
comparison triggers `hp_materialize_one_blob()` which reassembles
blob data from continuation chain records.

Since each `HASH_INFO` already stores `hash_of_key`, compare it
against the search key's hash before the full key comparison. When
hashes differ the keys are guaranteed different, skipping the
expensive materialization. This pattern already existed in
`hp_write_key()` for duplicate detection but was missing from the
four read/delete paths.

`HP_INFO::last_hash_of_key` is added so `hp_search_next()` can
reuse the hash computed by `hp_search()` without recomputing it.
Monty
Fixed duplicate key error when converting HEAP table to Aria

The problem was that create_internal_tmp_table() tries to create a
normal key for the blob, which does not work.
The fix is to force a unique key if BLOB keys were used for the
original HEAP table.
Arcadiy Ivanov
Fix stale comments, test bugs, and expand test coverage

**Comment fixes** (source):
- `hp_create.c`: fix "VARTEXT2" → "VARTEXT4", "Paclength" typo, replace
  stale 8-line `bit_start` derivation comment with accurate 3-line version
- `sql_select.cc`: fix `make_sort_key()` → `make_sort_key_part()` in
  `remove_duplicates()` comment; clarify HEAP packed-format comment
- `field.h`: fix "inc record" → "in a record" typo

**Comment fixes** (tests):
- `hp_test_key_setup-t.cc`: replace stale Phase 1 file header and function
  comment describing `key_part->length` widening with accurate blob segment
  normalization description
- `hp_test_hash-t.c`: fix swapped field names in mixed-key record layout
  comment; remove stale "hp_hashnr is static" comment; add missing
  `bit_length=2` for VARTEXT1 segment in `setup_mixed_keydef`
- `blob_big3.test`: fix "without" → "with", "dicrectly" → "directly"
- `blob_big.inc`: fix "in both runs" → "when HEAP is used"

**Test bug fixes**:
- `blob_sj_test`: change `semijoin=off` to `semijoin=on,firstmatch=off,
  loosescan=off` so the test actually exercises DuplicateWeedout SJ strategy;
  add optimizer_switch save/restore
- `blob_fallback`: replace MD5-based integrity checks with direct
  `b = repeat(...)` comparisons
- `blob_stress`: replace 4 useless `check table` (HEAP returns "not
  supported") with echo comments
- `blob_big.inc`: add save/restore for `max_sort_length` and
  `sort_buffer_size`

**Test coverage expansion** (unit tests — `hp_test_hash-t.c`):
- Add packlength 1, 3, 4 hash/comparison tests (12 assertions)
- Add blob+blob multi-segment key test (6 assertions)
- Plan: 49 → 67

**Test coverage expansion** (MTR — `blob.test`):
- INSERT ON DUPLICATE KEY UPDATE with blobs (conflict, no-conflict,
  NULL transitions)
- JSON column CRUD on MEMORY table
- LONGBLOB at uint16 `run_rec_count` split boundary (1,048,549 / 1,048,550
  / 2MB)
- Case A/B/C exact boundary blob sizes (5B, 6B, 10KB, 50KB) with
  cross-case UPDATE
- BTREE+blob rejection (both BTREE and HASH explicit blob keys rejected;
  BTREE on non-blob column with blob data works)
- Table-full error: verify no partial rows from failed inserts (row count,
  corruption check, scan count)

**Test coverage expansion** (MTR — `blob_stress.test`):
- NULL→non-NULL and non-NULL→NULL blob UPDATE operations in the 200-cycle
  stored procedure

**Build**:
- Wire `hp_test_key_setup-t.cc` into CMake build; remove Phase 1-only
  tests (`test_rebuild_key_from_group_buff_mixed`,
  `test_varchar_promoted_to_blob`); update assertions for current blob
  segment normalization; plan: 47 → 34
Arcadiy Ivanov
Fix MSVC `C4267` warnings: `size_t` to narrower type conversions

- `Field_blob::get_key_image_itRAW` and `Field_blob::key_cmp`: cast
  `local_char_length` to `uint32` in `set_if_smaller()` — safe because
  `charpos()` is bounded by `blob_length` which is already `uint32`.
- `hp_test_hash-t.c`: widen `blob_len` parameter from `uint16` to
  `size_t` in `build_record()` and `build_mixed_record()` to match
  `LEX_CUSTRING::length` type.
Arcadiy Ivanov
HEAP GROUP BY / DISTINCT on TEXT/BLOB columns

Enable GROUP BY and DISTINCT operations on TEXT/BLOB columns to use
HEAP temp tables instead of falling back to Aria.

**SQL layer** (`sql_select.cc`, `create_tmp_table.h`, `field.h`):
- Extract `pick_engine()` from `choose_engine()` for early HEAP detection
- `m_heap_expected` flag gates blob-aware paths in GROUP BY key setup
- Fix `calc_group_buffer()` blob subtype bug (TINY/MEDIUM/LONG_BLOB)
- `is_any_blob_field_type()` helper (includes GEOMETRY)
- GROUP BY key setup: `store_length` init, `key_field_length` cap,
  blob `store_length` override, `key_part_flag` deferred assignment
- HEAP-specific: `end_update()` group key restoration after `copy_funcs()`
- HEAP-specific: skip null-bits helper key part for DISTINCT
- `empty_clex_str` for implicit key part field name (prevents SIGSEGV)

**HEAP engine** (`ha_heap.cc`, `ha_heap.h`, `hp_hash.c`, `heap.h`):
- `rebuild_key_from_group_buff()`: parses SQL-layer GROUP BY key buffer
  into `record[0]`, then rebuilds via `hp_make_key()`
- `materialize_heap_key_if_needed()`: dispatches between group-buff
  rebuild and direct `hp_make_key(record[0])` for blob indexes
- `needs_key_rebuild_from_group_buff` flag on `HP_KEYDEF`
- `hp_keydef_has_blob_seg()` inline helper
- `hp_make_key()`: normalize VARCHAR to 2-byte length prefix with
  zero-padding for sanitizer cleanliness
- `hp_vartext_key_pack_size()` helper for key advancement
- Endian-safe blob length write via `store_lowendian()`
- Varchar bounds clamp in `rebuild_key_from_group_buff()`
- Fix geometry GROUP BY key widening: skip widening when
  `key_part->length <= pack_length_no_ptr()` to prevent `store_length`
  overflow with `Field_geom::key_length()` = 4 (MSAN fix)
- Pre-compute `has_blob_seg` in `heap_prepare_hp_create_info()` so
  callers can use it before `heap_create()` runs (MSAN fix)

**Tests**:
- `heap.heap_blob_ops`: COUNT(DISTINCT), IN-subquery, GROUP BY ROLLUP,
  window functions, CTE materialization, PAD SPACE scenarios
- `hp_test_hash-t.c`: 43->56 TAP tests (hash consistency, mixed keys)
- `hp_test_key_setup-t.cc`: 9->63 TAP tests with `Fake_thd_guard` RAII,
  geometry GROUP BY no-widening test
- Result updates: `count_distinct`, `status`, `tmp_table_error`
Arcadiy Ivanov
Cap `min_run_records` for small blob free-list reuse

The free-list allocator's minimum contiguous run threshold
(`min_run_records`) could exceed the total records a small blob
actually needs, making free-list reuse impossible on narrow tables.

For example, with `recbuffer=16` the 128-byte floor produced
`min_run_records=8`, but a 32-byte blob only needs 3 records.
Any contiguous free-list group of 3 would be rejected, forcing
unnecessary tail allocation.

Cap both `min_run_bytes` at `data_len` and `min_run_records` at
`total_records_needed` so small blobs can reuse free-list slots
when a sufficient contiguous group exists.