Biscuit Index Extension - Changelog
Version 2.2.3
Structural Changes
Monolith split into modules. The single biscuit.c file has been decomposed into focused translation units, each with its own header:

| Module | Responsibility |
|---|---|
| biscuit.c | AM handler, SQL-callable functions, _PG_init |
| biscuit_bitmap.{c,h} | Roaring bitmap abstraction + fallback bitset |
| biscuit_cache.{c,h} | Session-scoped index cache |
| biscuit_index.{c,h} | Index build, load, disk I/O, CRUD helpers |
| biscuit_pattern.{c,h} | LIKE/ILIKE pattern parsing and bitmap matching |
| biscuit_preload.{c,h} | Background preload worker and skeleton loader |
| biscuit_scan.{c,h} | Scan lifecycle (beginscan/rescan/gettuple/getbitmap/endscan) |
| biscuit_tid.{c,h} | TID sorting (radix + qsort) and parallel collection |
| biscuit_utf8.{c,h} | UTF-8 character utilities and Datum→text helpers |

All shared types, constants, and macros have been consolidated into biscuit_common.h. No SQL-level API changes.
Version bumped to 2.2.3 (BISCUIT_LIBRARY_VERSION).
New Features
- PostgreSQL 19 Beta 1 support. PG_MODULE_MAGIC_EXT (introduced in PG 18) is now used when available, with a fallback to PG_MODULE_MAGIC for older versions. The extension can now be built and loaded against PG 19 development builds without modification.
Improvements
- Memory context correctness. The session cache (biscuit_cache.c) now explicitly switches to CacheMemoryContext before allocating cache list nodes, ensuring index structures survive transaction boundaries without relying on caller context. The biscuit_cleanup_index stub correctly avoids double-freeing memory owned by the context.
- biscuit_complete_preload_local() added as a fast in-process upgrade path: rebuilds bitmaps from the already-resident string cache without reopening the relation or re-scanning the heap. Used by beginscan when it detects the worker has finished between queries.
- TID collection refactored into biscuit_tid.c. The unified entry point biscuit_collect_tids_optimized() selects parallel vs. single-threaded collection automatically and supports an optional limit_hint to avoid collecting more TIDs than the executor needs.
- Fallback scan in biscuit_preload.c supports NOT LIKE and NOT ILIKE during warm-up via a hash-map TID→record-index lookup, maintaining correct inversion semantics without bitmaps.
- UTF-8 helpers isolated in biscuit_utf8.{c,h}, removing scattered inline character-length and lowercase conversion code from the pattern and index modules.
- biscuit_columnindex_memory_usage() now validates max_length >= 0 before iterating length bitmap arrays and emits a WARNING on corrupt state rather than reading out-of-bounds.
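The limit_hint behavior mentioned for biscuit_collect_tids_optimized() can be pictured with a small standalone sketch. This is not the extension's code: the function name demo_collect_set_bits, the plain 64-bit-word bitset, and the convention that a hint of 0 means "no limit" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Biscuit): copy the positions of set bits
 * from a bitset made of 64-bit words into `out`, stopping early once
 * `limit_hint` entries have been collected. A limit_hint of 0 is treated as
 * "no limit" in this sketch. Returns the number of entries written.
 */
static size_t demo_collect_set_bits(const uint64_t *words, size_t nwords,
                                    uint32_t *out, size_t out_cap,
                                    size_t limit_hint)
{
    assert(words != NULL && out != NULL);

    /* Never write more than the output buffer can hold. */
    size_t cap = out_cap;
    if (limit_hint > 0 && limit_hint < cap)
        cap = limit_hint;

    size_t n = 0;
    for (size_t w = 0; w < nwords && n < cap; w++) {
        uint64_t bits = words[w];
        while (bits != 0 && n < cap) {
            /* Locate the lowest set bit with a portable linear scan. */
            unsigned bit = 0;
            while (((bits >> bit) & 1u) == 0)
                bit++;
            out[n++] = (uint32_t)(w * 64 + bit);
            bits &= bits - 1; /* clear the lowest set bit */
        }
    }
    return n;
}
```

With a hint of 2, the loop stops after two positions even if the bitset holds thousands of matches, which is the saving a LIMIT query would benefit from.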
Bug Fixes
- biscuit_cache_remove() no longer calls pfree on list nodes; they are owned by CacheMemoryContext and must not be freed manually.
Version 2.2.2
Performance Improvements
Refined TID sorting implementation
Replaced the previous hybrid dense/sparse block radix sorter with a uniform 4-pass radix sort covering the full 32-bit BlockNumber.
Sorting is now performed using four 8-bit passes, eliminating assumptions about block number density or range.
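A four-pass, 8-bits-per-pass radix sort over a 32-bit block number might look roughly like the sketch below. It is a generic least-significant-digit radix sort written for illustration, not the code in biscuit_tid.c; the DemoTid struct and the function name are hypothetical stand-ins, and offsets are left in arrival order within each block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap TID: 32-bit block number, 16-bit offset. */
typedef struct {
    uint32_t block;
    uint16_t offset;
} DemoTid;

/*
 * LSD radix sort on the block number using four 8-bit passes.
 * Each pass is a stable counting sort, so entries with equal block numbers
 * keep their relative input order.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated.
 */
static int demo_radix_sort_tids(DemoTid *tids, size_t n)
{
    assert(tids != NULL || n == 0);
    if (n < 2)
        return 0;

    DemoTid *tmp = malloc(n * sizeof(DemoTid));
    if (tmp == NULL)
        return -1;

    DemoTid *src = tids;
    DemoTid *dst = tmp;
    for (int pass = 0; pass < 4; pass++) {
        size_t count[256] = {0};
        int shift = pass * 8;

        /* Histogram of the current byte. */
        for (size_t i = 0; i < n; i++)
            count[(src[i].block >> shift) & 0xFF]++;

        /* Exclusive prefix sum turns counts into start positions. */
        size_t pos = 0;
        for (int b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }

        /* Stable scatter into the destination buffer. */
        for (size_t i = 0; i < n; i++)
            dst[count[(src[i].block >> shift) & 0xFF]++] = src[i];

        DemoTid *swap = src;
        src = dst;
        dst = swap;
    }

    /* Four passes is an even number, so the sorted data ends up in `tids`. */
    free(tmp);
    return 0;
}
```

Because every pass covers a full byte of the block number, the routine makes no assumption about how dense or how large the block numbers are.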
Correctness & Stability
Aligned TID comparison with PostgreSQL core
Replaced custom TID comparison logic with PostgreSQL’s native comparison routine to ensure consistent ordering behavior.
Version 2.2.1
Bug Fixes
Fixed recursive pattern matching
Resolved incorrect behavior when evaluating nested or repeated wildcard patterns during recursive matching.
Corrected underscore (_) handling in single-column indexing
_ now correctly operates on character-based offsets (not byte offsets), in accordance with SQL LIKE/ILIKE semantics, eliminating false matches in multi-byte UTF-8 text.
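The difference between byte offsets and character offsets shows up as soon as _ meets a multi-byte character. The sketch below is a deliberately small recursive LIKE matcher, not Biscuit's bitmap-based implementation; the names demo_like, demo_utf8_next, and demo_utf8_char_len are hypothetical, and escape sequences are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Byte length of a UTF-8 character, judged from its lead byte. */
static size_t demo_utf8_char_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1; /* stray continuation or invalid byte: step one byte */
}

/* Advance past one whole character without running past the terminator. */
static const char *demo_utf8_next(const char *p)
{
    size_t len = demo_utf8_char_len((unsigned char)*p);
    while (len-- > 0 && *p != '\0')
        p++;
    return p;
}

/*
 * Minimal LIKE matcher: '%' matches any run of characters, '_' matches
 * exactly one character (which may span several bytes).
 */
static bool demo_like(const char *s, const char *pat)
{
    assert(s != NULL && pat != NULL);

    while (*pat != '\0') {
        if (*pat == '%') {
            pat++;
            /* Try every character boundary as the point to resume. */
            for (;;) {
                if (demo_like(s, pat))
                    return true;
                if (*s == '\0')
                    return false;
                s = demo_utf8_next(s);
            }
        }
        if (*s == '\0')
            return false;
        if (*pat == '_') {
            s = demo_utf8_next(s); /* consume one character, not one byte */
            pat++;
        } else {
            if (*s != *pat)
                return false;
            s++;
            pat++;
        }
    }
    return *s == '\0';
}
```

For the four-character string "café" (five bytes in UTF-8), the pattern caf_ matches and caf__ does not. A matcher that stepped one byte per underscore would get both of those wrong.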
Correctness & Stability
- Improved internal consistency between single-column and multi-column pattern evaluation paths.
- Resolved observed edge cases that could lead to incorrect matches under complex wildcard patterns.
Version 2.2.0
Major Changes
Switched from byte-based to character-based indexing
- Biscuit now indexes Unicode characters instead of raw UTF-8 bytes.
- Eliminates incorrect behavior caused by multi-byte UTF-8 sequences being treated as independent index entries.
- Index structure now aligns with PostgreSQL’s character semantics rather than byte-level representation.
UTF-8 & Internationalization Improvements
Enhanced UTF-8 compatibility
- Improved handling of multi-byte UTF-8 characters (e.g., accented Latin characters, non-Latin scripts).
- Index lookups, comparisons, and filtering now operate on logical characters rather than byte fragments.
Correct UTF-8 support for ILIKE
- ILIKE now works reliably with UTF-8 text, including case-insensitive matching on multi-byte characters.
- Fixes previously incorrect matches and missed results in non-ASCII datasets.
CRUD Correctness Fixes
Resolved multiple CRUD-related bugs
- Fixed inconsistencies during INSERT, UPDATE, and DELETE operations that could leave the index in an incorrect state.
- Ensured index entries are properly added, updated, and removed in sync with heap tuples.
- Improved stability under mixed read/write workloads.
Correctness & Planner Consistency
- Improved alignment between Biscuit’s index behavior and PostgreSQL’s text semantics.
- Reduced false positives during pattern matching and eliminated character-splitting artifacts.
- More predictable planner behavior due to improved index consistency.
Internal Refactoring
- Refactored index layout and lookup logic to support character-aware traversal.
- Hardened UTF-8 decoding paths and edge-case handling.
- Simplified internal invariants for better maintainability and debugging.
Version 2.1.5
Improvements
Removed arbitrary limits on multi-column indexes
- Biscuit no longer enforces hard-coded limits when creating indexes over multiple columns, allowing more flexible index definitions.
Safety & Correctness
Restricted indexing to text-based datatypes
- Support for non-text datatypes has been removed. Biscuit now explicitly enforces text-only columns to ensure correct operator semantics, planner behavior, and index consistency.
Explicit error for expression indexing
- Biscuit now raises a clear error when users attempt to create an index on an expression (e.g., lower(col)), which is not currently supported. This prevents silent misconfiguration and enforces Biscuit’s column-based indexing semantics.
Note: Biscuit currently indexes base columns only. This may be revisited in future versions.
Version 2.1.4
Build & Packaging
- Improved Makefile detection logic for CRoaring bitmap support by checking multiple common installation paths, increasing portability across systems and build environments.
New Features
Build and configuration introspection
Added SQL functions to inspect Biscuit build-time configuration, useful for debugging, reproducibility, and deployment verification.
biscuit_version() → text
Returns the Biscuit extension version string.
biscuit_build_info() → table
Returns detailed build-time configuration information.
biscuit_build_info_json() → text
Returns build configuration as a JSON string for automation and scripting.
Roaring Bitmap support introspection
Added built-in SQL functions to inspect CRoaring bitmap support in Biscuit.
biscuit_has_roaring() → boolean
Checks whether the extension was compiled with CRoaring bitmap support.
biscuit_roaring_version() → text
Returns the CRoaring library version if available.
Diagnostic views
Added a built-in diagnostic view for quick inspection of Biscuit status and configuration.
biscuit_status
A single-row view providing an overview of:
- extension version
- CRoaring enablement
- bitmap backend in use
- total number of Biscuit indexes
- combined on-disk index size
Version 2.1.3
New Features
Added Index Memory Introspection Utilities
Added built-in SQL functions and a view to inspect Biscuit index in-memory footprint.
biscuit_index_memory_size(index_oid oid) → bigint
Low-level C-backed function returning the exact memory usage (in bytes) of a Biscuit index currently resident in memory.
biscuit_index_memory_size(index_name text) → bigint
Convenience SQL wrapper accepting an index name instead of an OID.
biscuit_size_pretty(index_name text) → text
Human-readable formatter that reports Biscuit index memory usage in bytes, KB, MB, or GB while preserving the exact byte count.
biscuit_memory_usage view
A consolidated view exposing:
- schema name
- table name
- index name
- Biscuit in-memory size
- human-readable memory size
- on-disk index size (via pg_relation_size)
This allows direct comparison between in-memory Biscuit structures and their persistent disk representation.
SELECT * FROM biscuit_memory_usage;
Notes
- Memory accounting reflects Biscuit’s deliberate cache persistence design, intended to optimize repeated pattern-matching workloads.
- Functions are marked VOLATILE to ensure accurate reporting of live memory state.
- pg_size_pretty(pg_relation_size(...)) reports only the on-disk footprint of the Biscuit index. Since Biscuit maintains its primary structures in memory (cache buffers / AM cache), the reported disk size may significantly underrepresent the index’s effective total footprint during execution. We therefore recommend using biscuit_size_pretty(...) to view the in-memory size of the index.
Performance improvements
Removed redundant bitmaps
The separate length-based filtering bitmaps that were kept for case-insensitive search have been removed. Case-insensitive searches now use the same length-based filtering bitmaps as case-sensitive ones.
Version 2.1.2 (2025-12-11)
New Features
ILIKE Operator Support (Case-Insensitive Matching)
Biscuit now provides full support for the ILIKE operator, enabling efficient case-insensitive wildcard searches directly through the index.
Capabilities:
- Optimized execution path for ILIKE and NOT ILIKE
- Works seamlessly in mixed predicate chains alongside LIKE / NOT LIKE
- Fully compatible with multi-column Biscuit indexes
Examples:
-- Case-insensitive suffix search
SELECT * FROM users WHERE name ILIKE '%son';
-- Combination queries
SELECT * FROM users
WHERE name ILIKE 'a%' AND email NOT ILIKE '%test%';
Removed Length Constraint for Indexing
The previous hardcoded 256-character indexing limit has been removed. Biscuit now indexes values of any length, including very long strings.
Impact:
- All text values—short or arbitrarily long—are now included in bitmap generation
- More consistent query coverage for fields like descriptions, logs, and message bodies
Version 2.1.0 - 2.1.1
These versions contain build issues, which were fixed in version 2.1.2.
Version 2.0.1 (2024-12-06)
Bug Fixes
Fixed Incorrect Results with Multiple Filter Predicates
Issue: Queries with multiple LIKE or NOT LIKE predicates on the same column could return incorrect results.
Root Cause: When executing queries with multiple filter predicates (e.g., name LIKE '%a%' AND name NOT LIKE '%3%'), the bitmap inversion logic for NOT LIKE was being applied globally instead of per-predicate, causing the wrong result set to be returned.
Example of Affected Query:
```sql
-- Query with multiple filters
SELECT COUNT(*) FROM users WHERE name LIKE '%a%' AND name NOT LIKE '%3%';

-- v2.0.0: Returned incorrect count (e.g., 252,167)
-- v2.0.1: Returns correct count (e.g., 251,482) ✅
-- Verified against sequential scan
```
Fix: Implemented per-predicate bitmap inversion logic that correctly handles each filter independently before combining results.
Impact:
- Affected Queries: Any query with 2+ predicates using LIKE and/or NOT LIKE on indexed columns
- Severity: HIGH - Results were incorrect but deterministic
- Data Safety: No data corruption - index structure unchanged
Verification:
```sql
-- All these patterns now return correct results:

-- Pattern 1: LIKE + NOT LIKE
WHERE name LIKE '%abc%' AND name NOT LIKE '%xyz%'

-- Pattern 2: Multiple NOT LIKE
WHERE name NOT LIKE '%a%' AND name NOT LIKE '%b%'

-- Pattern 3: Complex combinations
WHERE col1 LIKE 'A%' AND col2 NOT LIKE '%test%' AND col1 LIKE '%end'
```
NOT LIKE Operator Support
- Full support for NOT LIKE pattern matching (Strategy #2)
- Efficient bitmap negation for exclusion queries
- Example: WHERE name NOT LIKE '%test%'
Upgrade Notes
Compatibility:
- Fully backward compatible with v2.0.0
Recommended Actions:
1. Update extension: ALTER EXTENSION biscuit UPDATE TO '2.0.1';
2. Re-run any critical queries that used multiple predicates to verify corrected results
Version 2.0.0 (2024-11-05)
Major Features
Multi-Column Index Support
- Create Biscuit indexes on multiple columns simultaneously
- Per-column bitmap optimization for efficient filtering
- Example: CREATE INDEX idx ON users USING biscuit(name, email, description);
Query Optimization Engine
- Intelligent predicate reordering based on selectivity analysis
- Executes most selective filters first to minimize candidate set
- Supports exact, prefix, suffix, and substring pattern detection
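Pattern-shape detection and reordering of this kind could be sketched as follows. The enum, the function names, and the use of pattern shape as a crude stand-in for selectivity are all assumptions. The changelog does not describe how Biscuit actually estimates selectivity, and escape sequences are ignored here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pattern shapes, ordered from most to least selective. */
typedef enum {
    DEMO_EXACT,     /* 'abc'   */
    DEMO_PREFIX,    /* 'abc%'  */
    DEMO_SUFFIX,    /* '%abc'  */
    DEMO_SUBSTRING, /* '%abc%' */
    DEMO_COMPLEX    /* anything else, e.g. 'a%b_c' or a bare '%' */
} DemoPatternKind;

/* Classify a LIKE pattern by where its wildcards sit. */
static DemoPatternKind demo_classify(const char *pat)
{
    assert(pat != NULL);

    size_t len = strlen(pat);
    int leading = len > 0 && pat[0] == '%';
    int trailing = len > 1 && pat[len - 1] == '%';
    size_t start = leading ? 1 : 0;
    size_t end = trailing ? len - 1 : len;

    /* A pattern made only of '%' has no literal body to classify. */
    if ((leading || trailing) && start >= end)
        return DEMO_COMPLEX;

    /* Any wildcard inside the body rules out the simple shapes. */
    for (size_t i = start; i < end; i++)
        if (pat[i] == '%' || pat[i] == '_')
            return DEMO_COMPLEX;

    if (leading && trailing) return DEMO_SUBSTRING;
    if (trailing) return DEMO_PREFIX;
    if (leading) return DEMO_SUFFIX;
    return DEMO_EXACT;
}

/* qsort comparator: patterns assumed to be more selective sort first. */
static int demo_cmp_by_kind(const void *a, const void *b)
{
    DemoPatternKind ka = demo_classify(*(const char *const *)a);
    DemoPatternKind kb = demo_classify(*(const char *const *)b);
    return (ka > kb) - (ka < kb);
}
```

Sorting an array of patterns with qsort(pats, n, sizeof pats[0], demo_cmp_by_kind) puts exact matches first and substring patterns last, so the candidate set shrinks as early as possible.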
Performance Enhancements
- TID sorting for sequential heap access (5000+ results)
- Parallel bitmap collection for large result sets (10K+ matches)
- Direct Roaring bitmap iteration without intermediate arrays
- Skip sorting for bitmap scans (COUNT/aggregate queries)
- LIMIT-aware early termination
Memory Management Improvements
- Persistent caching in CacheMemoryContext
- Automatic cache invalidation on index drop/ALTER
- Batch cleanup with configurable threshold (1000 tombstones)
🔧 Technical Improvements
Pattern Matching:
- Fast-path optimizations for pure wildcard patterns (%, _)
- Exact length matching for underscore-only patterns
- Optimized single-part and two-part pattern execution
- Recursive windowed matching for complex multi-part patterns
Type Support:
- Text, VARCHAR, CHAR (native)
- Integer types (INT2, INT4, INT8) with sortable encoding
- Float types (FLOAT4, FLOAT8) with scientific notation
- Date/Timestamp types with microsecond precision
- Boolean type
Index Statistics:
- biscuit_index_stats(index_oid) function for diagnostics
- CRUD operation tracking (inserts, updates, deletes)
- Tombstone and free slot monitoring
Full Documentation: See README.md or visit ReadTheDocs for complete usage guide and examples.