Biscuit Index Extension - Changelog

Version 2.2.3

Structural Changes

Monolith split into modules. The single biscuit.c file has been decomposed into focused translation units, each with its own header:

| Module | Responsibility |
|---|---|
| biscuit.c | AM handler, SQL-callable functions, _PG_init |
| biscuit_bitmap.{c,h} | Roaring bitmap abstraction + fallback bitset |
| biscuit_cache.{c,h} | Session-scoped index cache |
| biscuit_index.{c,h} | Index build, load, disk I/O, CRUD helpers |
| biscuit_pattern.{c,h} | LIKE/ILIKE pattern parsing and bitmap matching |
| biscuit_preload.{c,h} | Background preload worker and skeleton loader |
| biscuit_scan.{c,h} | Scan lifecycle (beginscan/rescan/gettuple/getbitmap/endscan) |
| biscuit_tid.{c,h} | TID sorting (radix + qsort) and parallel collection |
| biscuit_utf8.{c,h} | UTF-8 character utilities and Datum→text helpers |

All shared types, constants, and macros have been consolidated into biscuit_common.h. No SQL-level API changes.
Version bumped to 2.2.3 (BISCUIT_LIBRARY_VERSION).

New Features

PostgreSQL 19 Beta 1 support. PG_MODULE_MAGIC_EXT (introduced in PostgreSQL 18) is now used when available, with a fallback to PG_MODULE_MAGIC for older versions. The extension can now be built and loaded against PG 19 development builds without modification.

Improvements

Memory context correctness. The session cache (biscuit_cache.c) now explicitly switches to CacheMemoryContext before allocating cache list nodes, ensuring index structures survive transaction boundaries without relying on caller context. The biscuit_cleanup_index stub correctly avoids double-freeing memory owned by the context.
biscuit_complete_preload_local() added as a fast in-process upgrade path: rebuilds bitmaps from the already-resident string cache without reopening the relation or re-scanning the heap. Used by beginscan when it detects the worker has finished between queries.
TID collection refactored into biscuit_tid.c. The unified entry point biscuit_collect_tids_optimized() selects parallel vs. single-threaded collection automatically and supports an optional limit_hint to avoid collecting more TIDs than the executor needs.
Fallback scan in biscuit_preload.c supports NOT LIKE and NOT ILIKE during warm-up via a hash-map TID→record-index lookup, maintaining correct inversion semantics without bitmaps.
UTF-8 helpers isolated in biscuit_utf8.{c,h}, removing scattered inline character-length and lowercase conversion code from the pattern and index modules.
biscuit_columnindex_memory_usage() now validates max_length >= 0 before iterating length bitmap arrays and emits a WARNING on corrupt state rather than reading out-of-bounds.
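
The effect of the optional limit_hint on TID collection can be pictured with a small sketch. The function name collect_set_bits, the use of a single 64-bit word in place of a Roaring bitmap, and the convention that a hint of 0 means "no limit" are all assumptions made for illustration; none of them is taken from biscuit_tid.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the positions of the set bits in `bitmap` into `out`, lowest first.
 * Collection stops as soon as `limit_hint` positions have been written.
 * Assumption of this sketch: limit_hint == 0 means "no limit".
 * Returns the number of positions written, never more than `out_cap`. */
size_t collect_set_bits(uint64_t bitmap, uint32_t *out,
                        size_t out_cap, size_t limit_hint)
{
    size_t max = out_cap;
    if (limit_hint != 0 && limit_hint < max)
        max = limit_hint;

    size_t n = 0;
    for (uint32_t bit = 0; bit < 64 && n < max; bit++) {
        if (bitmap & (UINT64_C(1) << bit))
            out[n++] = bit;
    }
    return n;
}
```

With a hint of 2, a bitmap holding four matches yields only the first two positions, so later sorting and heap access touch less data.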

Bug Fixes

biscuit_cache_remove() no longer calls pfree on list nodes; they are owned by CacheMemoryContext and must not be freed manually.

Version 2.2.2

Performance Improvements

Refined TID sorting implementation

Replaced the previous hybrid dense/sparse block radix sorter with a uniform 4-pass radix sort covering the full 32-bit BlockNumber.

Sorting is now performed using four 8-bit passes, eliminating assumptions about block number density or range.
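
A four-pass, 8-bit radix sort over 32-bit keys can be sketched as follows. This is a generic least-significant-digit radix sort written for illustration, not the code in biscuit_tid.c; the function name radix_sort_u32 is hypothetical, and the sketch sorts bare block numbers, whereas a real TID also carries an offset within the block.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sort 32-bit block numbers in ascending order with an LSD radix sort:
 * four stable counting-sort passes, one per byte, least significant first.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int radix_sort_u32(uint32_t *keys, size_t n)
{
    if (n < 2)
        return 0;

    uint32_t *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;

    uint32_t *src = keys;
    uint32_t *dst = scratch;

    for (int pass = 0; pass < 4; pass++) {
        int shift = pass * 8;
        size_t count[256] = {0};

        /* Histogram of the byte examined in this pass. */
        for (size_t i = 0; i < n; i++)
            count[(src[i] >> shift) & 0xFF]++;

        /* Exclusive prefix sum: count[b] becomes the first slot for byte b. */
        size_t offset = 0;
        for (int b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = offset;
            offset += c;
        }

        /* Stable scatter into the destination buffer. */
        for (size_t i = 0; i < n; i++)
            dst[count[(src[i] >> shift) & 0xFF]++] = src[i];

        uint32_t *tmp = src;
        src = dst;
        dst = tmp;
    }

    /* After an even number of passes the sorted data is back in `keys`. */
    free(scratch);
    return 0;
}
```

Because every pass inspects a full byte regardless of the values present, the cost is the same whether block numbers are dense, sparse, or spread across the whole 32-bit range.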

Correctness & Stability

Aligned TID comparison with PostgreSQL core

Replaced custom TID comparison logic with PostgreSQL’s native comparison routine to ensure consistent ordering behavior.

Version 2.2.1

Bug Fixes

Fixed recursive pattern matching

Resolved incorrect behavior when evaluating nested or repeated wildcard patterns during recursive matching.

Corrected underscore (_) handling in single-column indexing

_ now correctly operates on character-based offsets (not byte offsets), in accordance with SQL LIKE / ILIKE semantics, eliminating false matches in multi-byte UTF-8 text.
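
The difference between character offsets and byte offsets can be shown with a short sketch. The helper names below (utf8_seq_len, utf8_step, utf8_byte_offset, utf8_char_count) are hypothetical and are not claimed to match the functions in Biscuit; the sketch also assumes well-formed UTF-8 apart from guarding against a truncated final sequence.

```c
#include <stddef.h>

/* Byte length of a UTF-8 sequence, judged from its lead byte.
 * Invalid lead bytes are treated as single-byte characters. */
static size_t utf8_seq_len(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;   /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 1;
}

/* Bytes to advance from p to the next character, never stepping
 * past the terminating NUL of a truncated sequence. */
static size_t utf8_step(const char *p)
{
    size_t len = utf8_seq_len((unsigned char) p[0]);
    for (size_t i = 1; i < len; i++) {
        if (p[i] == '\0')
            return i;
    }
    return len;
}

/* Byte offset at which character number `pos` (0-based) starts,
 * or -1 if the string has `pos` or fewer characters. */
long utf8_byte_offset(const char *s, size_t pos)
{
    size_t off = 0;
    for (size_t ch = 0; s[off] != '\0'; ch++) {
        if (ch == pos)
            return (long) off;
        off += utf8_step(s + off);
    }
    return -1;
}

/* Number of characters (not bytes) in a NUL-terminated UTF-8 string. */
size_t utf8_char_count(const char *s)
{
    size_t off = 0, count = 0;
    while (s[off] != '\0') {
        off += utf8_step(s + off);
        count++;
    }
    return count;
}
```

For the three-character, five-byte string "été", the pattern _t_ must test the second character, which starts at byte offset 2. A byte-based implementation would look at byte 1, the continuation byte of the first "é", and could produce a false match or a miss.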

Correctness & Stability

Improved internal consistency between single-column and multi-column pattern evaluation paths.
Resolved observed edge cases that could lead to incorrect matches under complex wildcard patterns.

Version 2.2.0

Major Changes

Switched from byte-based to character-based indexing

Biscuit now indexes Unicode characters instead of raw UTF-8 bytes.
Eliminates incorrect behavior caused by multi-byte UTF-8 sequences being treated as independent index entries.
Index structure now aligns with PostgreSQL’s character semantics rather than byte-level representation.

UTF-8 & Internationalization Improvements

Enhanced UTF-8 compatibility

Improved handling of multi-byte UTF-8 characters (e.g., accented Latin characters, non-Latin scripts).
Index lookups, comparisons, and filtering now operate on logical characters rather than byte fragments.

Correct UTF-8 support for ILIKE

ILIKE now works reliably with UTF-8 text, including case-insensitive matching on multi-byte characters.
Fixes previously incorrect matches and missed results in non-ASCII datasets.

CRUD Correctness Fixes

Resolved multiple CRUD-related bugs

Fixed inconsistencies during INSERT, UPDATE, and DELETE operations that could leave the index in an incorrect state.
Ensured index entries are properly added, updated, and removed in sync with heap tuples.
Improved stability under mixed read/write workloads.

Correctness & Planner Consistency

Improved alignment between Biscuit’s index behavior and PostgreSQL’s text semantics.
Reduced false positives during pattern matching and eliminated character-splitting artifacts.
More predictable planner behavior due to improved index consistency.

Internal Refactoring

Refactored index layout and lookup logic to support character-aware traversal.
Hardened UTF-8 decoding paths and edge-case handling.
Simplified internal invariants for better maintainability and debugging.

Version 2.1.5

Improvements

Removed arbitrary limits on multi-column indexes

Biscuit no longer enforces hard-coded limits when creating indexes over multiple columns, allowing more flexible index definitions.

Safety & Correctness

Restricted indexing to text-based datatypes

Support for non-text datatypes has been removed. Biscuit now explicitly enforces text-only columns to ensure correct operator semantics, planner behavior, and index consistency.

Explicit error for expression indexing

Biscuit now raises a clear error when users attempt to create an index on an expression (e.g., lower(col)), which is not currently supported. This prevents silent misconfiguration and enforces Biscuit’s column-based indexing semantics.

Note: Biscuit currently indexes base columns only. This may be revisited in future versions.

Version 2.1.4

Build & Packaging

Improved Makefile detection logic for CRoaring bitmap support by checking multiple common installation paths, increasing portability across systems and build environments.

New Features

Build and configuration introspection

Added SQL functions to inspect Biscuit build-time configuration, useful for debugging, reproducibility, and deployment verification.

biscuit_version() → text

Returns the Biscuit extension version string.

biscuit_build_info() → table

Returns detailed build-time configuration information.

biscuit_build_info_json() → text

Returns build configuration as a JSON string for automation and scripting.

Roaring Bitmap support introspection

Added built-in SQL functions to inspect CRoaring bitmap support in Biscuit.

biscuit_has_roaring() → boolean

Checks whether the extension was compiled with CRoaring bitmap support.

biscuit_roaring_version() → text

Returns the CRoaring library version if available.

Diagnostic views

Added a built-in diagnostic view for quick inspection of Biscuit status and configuration.

biscuit_status
A single-row view providing an overview of:
- extension version
- CRoaring enablement
- bitmap backend in use
- total number of Biscuit indexes
- combined on-disk index size

Version 2.1.3

New Features

Added Index Memory Introspection Utilities

Added built-in SQL functions and a view to inspect Biscuit index in-memory footprint.

biscuit_index_memory_size(index_oid oid) → bigint

Low-level C-backed function returning the exact memory usage (in bytes) of a Biscuit index currently resident in memory.

biscuit_index_memory_size(index_name text) → bigint

Convenience SQL wrapper accepting an index name instead of an OID.

biscuit_size_pretty(index_name text) → text

Human-readable formatter that reports Biscuit index memory usage in bytes, KB, MB, or GB while preserving the exact byte count.

biscuit_memory_usage view

A consolidated view exposing:
- schema name
- table name
- index name
- Biscuit in-memory size
- human-readable memory size
- on-disk index size (via pg_relation_size)

This allows direct comparison between in-memory Biscuit structures and their persistent disk representation.

```sql
SELECT * FROM biscuit_memory_usage;
```
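
The idea of a human-readable size that still preserves the exact byte count can be sketched in C. The function name format_size_pretty, the binary units (1 KB = 1024 bytes), and the output layout are assumptions for illustration only; this changelog does not specify the exact text that biscuit_size_pretty returns.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a human-readable size into buf, followed by the exact byte count.
 * Assumes binary units (1 KB = 1024 bytes). Values below 1 KB are printed
 * as plain bytes. Returns whatever snprintf returns. */
int format_size_pretty(uint64_t bytes, char *buf, size_t buflen)
{
    static const char *units[] = { "KB", "MB", "GB" };

    if (bytes < 1024)
        return snprintf(buf, buflen, "%llu bytes",
                        (unsigned long long) bytes);

    double value = (double) bytes / 1024.0;
    int u = 0;
    /* Move up one unit at a time, stopping at GB. */
    while (value >= 1024.0 && u < 2) {
        value /= 1024.0;
        u++;
    }
    return snprintf(buf, buflen, "%.2f %s (%llu bytes)",
                    value, units[u], (unsigned long long) bytes);
}
```

Keeping the raw byte count next to the rounded figure lets the output be compared directly with biscuit_index_memory_size(...) and pg_relation_size(...).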

Notes

Memory accounting reflects Biscuit’s deliberate cache persistence design, intended to optimize repeated pattern-matching workloads.
Functions are marked VOLATILE to ensure accurate reporting of live memory state.
pg_size_pretty(pg_relation_size(...)) reports only the on-disk footprint of the Biscuit index. Because Biscuit keeps its primary structures in memory (cache buffers / AM cache), the reported disk size may significantly understate the index’s effective total footprint during execution. Use biscuit_size_pretty(...) to view the in-memory size of the index.

Performance improvements

Removed redundant bitmaps

The separate length-based filtering bitmaps used for case-insensitive search were removed. Case-insensitive searches now use the same length-based filtering bitmaps as case-sensitive ones.

Version 2.1.2 (2025-12-11)

New Features

ILIKE Operator Support (Case-Insensitive Matching)

Biscuit now provides full support for the ILIKE operator, enabling efficient case-insensitive wildcard searches directly through the index.

Capabilities:

Optimized execution path for ILIKE and NOT ILIKE
Works seamlessly in mixed predicate chains alongside LIKE / NOT LIKE
Fully compatible with multi-column Biscuit indexes

Examples:

```sql
-- Case-insensitive suffix search
SELECT * FROM users WHERE name ILIKE '%son';

-- Combination queries
SELECT * FROM users
WHERE name ILIKE 'a%' AND email NOT ILIKE '%test%';
```

Removed Length Constraint for Indexing

The previous hardcoded 256-character indexing limit has been removed. Biscuit now indexes values of any length, including very long strings.

Impact:

All text values—short or arbitrarily long—are now included in bitmap generation
More consistent query coverage for fields like descriptions, logs, and message bodies

Version 2.1.0 - 2.1.1

These versions contain build issues, which were fixed in version 2.1.2.

Version 2.0.1 (2024-12-06)

Bug Fixes

Fixed Incorrect Results with Multiple Filter Predicates

Issue: Queries with multiple LIKE or NOT LIKE predicates on the same column could return incorrect results.

Root Cause: When executing queries with multiple filter predicates (e.g., name LIKE '%a%' AND name NOT LIKE '%3%'), the bitmap inversion logic for NOT LIKE was being applied globally instead of per-predicate, causing the wrong result set to be returned.

Example of Affected Query:

```sql
-- Query with multiple filters
SELECT COUNT(*) FROM users WHERE name LIKE '%a%' AND name NOT LIKE '%3%';

-- v2.0.0: Returned incorrect count (e.g., 252,167)
-- v2.0.1: Returns correct count (e.g., 251,482) ✅
-- Verified against sequential scan
```

Fix: Implemented per-predicate bitmap inversion logic that correctly handles each filter independently before combining results.
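
The difference between per-predicate and global inversion can be illustrated with a toy bitmap, where bit i stands for row i. Everything in this sketch is hypothetical: the Predicate struct, both function names, and in particular combine_global_inversion, which is only one plausible shape of a "global" inversion and is not claimed to reproduce the v2.0.0 code.

```c
#include <stdint.h>

typedef struct {
    uint64_t matches;   /* bit i set: row i matches the pattern itself */
    int      negated;   /* nonzero when the predicate is NOT LIKE */
} Predicate;

/* Per-predicate inversion: each NOT LIKE bitmap is complemented on its
 * own (within the set of live rows) before the results are ANDed. */
uint64_t combine_per_predicate(const Predicate *preds, int n,
                               uint64_t all_rows)
{
    uint64_t result = all_rows;
    for (int i = 0; i < n; i++) {
        uint64_t bm = preds[i].matches & all_rows;
        if (preds[i].negated)
            bm = all_rows & ~bm;
        result &= bm;
    }
    return result;
}

/* Illustrative faulty variant: AND the raw pattern bitmaps together and
 * invert the combined result once if any predicate was negated. */
uint64_t combine_global_inversion(const Predicate *preds, int n,
                                  uint64_t all_rows)
{
    uint64_t result = all_rows;
    int any_negated = 0;
    for (int i = 0; i < n; i++) {
        result &= preds[i].matches;
        any_negated |= preds[i].negated;
    }
    return any_negated ? (all_rows & ~result) : result;
}
```

Take four rows with names "a1", "a3", "b3", "b1". The pattern '%a%' matches rows 0 and 1 (bitmap 0x3) and '%3%' matches rows 1 and 2 (bitmap 0x6). For LIKE '%a%' AND NOT LIKE '%3%', per-predicate inversion leaves only row 0, while the global variant returns rows 0, 2, and 3.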

Impact:
- Affected Queries: Any query with 2+ predicates using LIKE and/or NOT LIKE on indexed columns
- Severity: HIGH - Results were incorrect but deterministic
- Data Safety: No data corruption - index structure unchanged

Verification:

```sql
-- All these patterns now return correct results:

-- Pattern 1: LIKE + NOT LIKE
WHERE name LIKE '%abc%' AND name NOT LIKE '%xyz%'

-- Pattern 2: Multiple NOT LIKE
WHERE name NOT LIKE '%a%' AND name NOT LIKE '%b%'

-- Pattern 3: Complex combinations
WHERE col1 LIKE 'A%' AND col2 NOT LIKE '%test%' AND col1 LIKE '%end'
```

NOT LIKE Operator Support

Full support for NOT LIKE pattern matching (Strategy #2)
Efficient bitmap negation for exclusion queries
Example: WHERE name NOT LIKE '%test%'

Upgrade Notes

Compatibility:
- Fully backward compatible with v2.0.0

Recommended Actions:
1. Update extension: ALTER EXTENSION biscuit UPDATE TO '2.0.1';
2. Re-run any critical queries that used multiple predicates to verify corrected results

Version 2.0.0 (2024-11-05)

Major Features

Multi-Column Index Support

Create Biscuit indices on multiple columns simultaneously
Per-column bitmap optimization for efficient filtering
Example: CREATE INDEX idx ON table USING biscuit(name, email, description);

Query Optimization Engine

Intelligent predicate reordering based on selectivity analysis
Executes most selective filters first to minimize candidate set
Supports exact, prefix, suffix, and substring pattern detection
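
Selectivity-based reordering can be sketched as a sort over estimated match fractions. The ScanPredicate struct, its est_selectivity field, and the function names are hypothetical; how Biscuit actually estimates selectivity is not described in this changelog.

```c
#include <stdlib.h>

typedef struct {
    const char *pattern;
    double      est_selectivity;  /* estimated fraction of rows matching, 0.0 to 1.0 */
} ScanPredicate;

/* qsort comparator: smaller estimated fraction (more selective) first. */
static int cmp_selectivity(const void *a, const void *b)
{
    double x = ((const ScanPredicate *) a)->est_selectivity;
    double y = ((const ScanPredicate *) b)->est_selectivity;
    return (x > y) - (x < y);
}

/* Reorder predicates so the most selective one is evaluated first. */
void order_by_selectivity(ScanPredicate *preds, size_t n)
{
    qsort(preds, n, sizeof preds[0], cmp_selectivity);
}
```

Evaluating the most selective predicate first shrinks the candidate set early, so each later bitmap intersection works on fewer rows.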

Performance Enhancements

TID sorting for sequential heap access (5000+ results)
Parallel bitmap collection for large result sets (10K+ matches)
Direct Roaring bitmap iteration without intermediate arrays
Skip sorting for bitmap scans (COUNT/aggregate queries)
LIMIT-aware early termination

Memory Management Improvements

Persistent caching in CacheMemoryContext
Automatic cache invalidation on index drop/ALTER
Batch cleanup with configurable threshold (1000 tombstones)

🔧 Technical Improvements

Pattern Matching:
- Fast-path optimizations for pure wildcard patterns (%, _)
- Exact length matching for underscore-only patterns
- Optimized single-part and two-part pattern execution
- Recursive windowed matching for complex multi-part patterns

Type Support:
- Text, VARCHAR, CHAR (native)
- Integer types (INT2, INT4, INT8) with sortable encoding
- Float types (FLOAT4, FLOAT8) with scientific notation
- Date/Timestamp types with microsecond precision
- Boolean type

Index Statistics:
- biscuit_index_stats(index_oid) function for diagnostics
- CRUD operation tracking (inserts, updates, deletes)
- Tombstone and free slot monitoring

Full Documentation: See README.md or visit ReadTheDocs for complete usage guide and examples.
