Extensions
README
Contents
- BISCUIT: Bitmap-Indexed String Comparison Using Intelligent Traversal
BISCUIT: Bitmap-Indexed String Comparison Using Intelligent Traversal
A PostgreSQL extension for accelerating wildcard pattern matching queries using bitmap indexing.
Updates in 1.0.6
- Introduced a new bitmap that stores records having length greater than ‘n’, to reduce computational cost.
Technical Overview
Problem Statement
Traditional database indexes (B-tree, GiST, GIN) exhibit suboptimal performance for wildcard pattern matching queries, particularly when wildcards appear at the beginning of patterns. A query such as SELECT * FROM table WHERE column LIKE '%pattern%' typically requires a sequential scan.
Solution Architecture
BISCUIT implements a multi-dimensional bitmap index structure that decomposes strings into positional character mappings. Each character at each position is tracked using compressed bitmaps (Roaring Bitmaps), enabling efficient intersection operations for pattern matching.
Core Principle
Instead of scanning strings sequentially, BISCUIT:
- Pre-indexes all character positions (up to 256 characters)
- Uses bitmap intersections to find candidate records
- Filters results through tombstone bitmap (for deleted records)
- Returns matching records
Reality Check: This trades initial indexing time and memory for faster queries. It’s not magic—you’re paying upfront to make searches faster later.
Core Index Structures
-
Positional Character Index: Maps each character to position-bitmap pairs
- Forward positions:
{0, 1, 2, ..., 255}from string start - Reverse positions:
{-1, -2, -3, ..., -256}from string end
- Forward positions:
-
Character Existence Cache: Union of all positional bitmaps per character for containment queries
-
Length Index: Bitmap array indexed by string length
-
Primary Key Hash Table: O(1) lookup structure mapping primary keys to internal indices
-
Lazy Deletion Layer: Tombstone bitmap tracking deleted records with deferred cleanup
Key Features
- Bitmap-based pattern matching: Optimized for wildcard searches
- Hash table primary key lookup: Fast record access by primary key
- Lazy deletion: Deferred cleanup of deleted records using tombstones
- Incremental updates: Optimized path for similar string updates
- Automatic maintenance: Optional triggers to keep index synchronized
- Roaring Bitmap support: Uses CRoaring library when available
Limitations
- In-memory only: Entire index must fit in RAM
- Single column: Indexes one column at a time
- Session-scoped: Index is rebuilt per session (not persistent)
- String data only: Designed for text/varchar columns
- Memory overhead: Bitmap indices can be memory-intensive for large datasets
Installation and Requirements
System Requirements
- PostgreSQL Version: 11.0 or higher
- Operating System: Linux, macOS, Windows (with MinGW)
- Architecture: x86_64, ARM64
- Memory: Minimum 512MB available RAM per index
- Compiler: GCC 4.8+, Clang 3.9+, or MSVC 2017+
Optional Dependencies
- CRoaring Library: For optimized Roaring Bitmap operations (recommended)
- Installation:
apt-get install libroaring-dev(Debian/Ubuntu) - Falls back to built-in implementation if unavailable
- Installation:
Installation
Prerequisites
- PostgreSQL 16 or higher
- C compiler (gcc/clang)
- (Optional) CRoaring library for optimized bitmap operations
Build and Install
# With Roaring Bitmap support (recommended)
make HAVE_ROARING=1
sudo make install HAVE_ROARING=1
# Without Roaring (uses fallback implementation)
make
sudo make install
Enable Extension
CREATE EXTENSION biscuit;
Quick Start
-- One-command setup: builds index, creates query function, enables triggers
SELECT biscuit_setup('users', 'email', 'id');
-- Query with patterns
SELECT * FROM biscuit_match('%@gmail.com');
SELECT * FROM biscuit_match('admin%');
SELECT * FROM biscuit_match('%test%');
-- Get count only (faster than full query)
SELECT biscuit_match_count('%@gmail.com');
-- Check index status
SELECT biscuit_index_status();
Function Reference
Setup Functions
biscuit_setup(table_name, column_name, pk_column_name)
Complete initialization in one step. Builds the bitmap index, creates a type-safe biscuit_match() function, and enables automatic triggers.
SELECT biscuit_setup('users', 'email', 'id');
Parameters:
table_name(text): Name of the table to indexcolumn_name(text): Name of the column to index (must be text/varchar)pk_column_name(text): Name of the primary key column (default: ‘id’)
Returns: Success message with usage instructions
biscuit_build_index(table_name, column_name, pk_column_name)
Low-level function to build the bitmap index only, without creating query functions or triggers.
SELECT biscuit_build_index('users', 'email', 'id');
Returns: Boolean (true on success)
Query Functions
biscuit_match(pattern)
Returns complete table rows matching the pattern. This function is created by biscuit_setup() and is strongly-typed to match your table schema.
-- Get all matching rows
SELECT * FROM biscuit_match('%@gmail.com');
-- Filter and sort results
SELECT id, email FROM biscuit_match('%test%')
WHERE created_at > '2024-01-01'
ORDER BY id LIMIT 10;
Parameters:
pattern(text): Wildcard pattern using%(zero or more chars) and_(exactly one char)
Returns: TABLE matching the indexed table’s schema
biscuit_match_count(pattern)
Returns only the count of matching records. This is more efficient than counting rows from biscuit_match().
SELECT biscuit_match_count('%@gmail.com');
Parameters:
pattern(text): Wildcard pattern
Returns: INTEGER count of matches
biscuit_match_keys(pattern)
Returns primary key and indexed value pairs as text. Useful for joining or when you only need these fields.
-- Get keys and values
SELECT * FROM biscuit_match_keys('%@gmail.com');
-- Cast primary key back to original type
SELECT pk::integer AS id, value FROM biscuit_match_keys('%pattern%');
Parameters:
pattern(text): Wildcard pattern
Returns: TABLE(pk text, value text)
Monitoring Functions
biscuit_index_status()
Returns detailed status information about the current index including table/column info, record counts, memory usage, CRUD statistics, and lazy deletion status.
SELECT biscuit_index_status();
Returns: TEXT formatted status report
biscuit_get_active_count()
Returns the number of live (non-deleted) records in the index.
SELECT biscuit_get_active_count();
Returns: INTEGER
biscuit_get_free_slots()
Returns the number of reusable slots from deleted records.
SELECT biscuit_get_free_slots();
Returns: INTEGER
biscuit_get_tombstone_count()
Returns the number of soft-deleted records awaiting cleanup.
SELECT biscuit_get_tombstone_count();
Returns: INTEGER
Maintenance Functions
biscuit_cleanup()
Manually triggers tombstone cleanup. Normally cleanup happens automatically when the tombstone threshold is reached, but you can force it with this function.
SELECT biscuit_cleanup();
Returns: TEXT cleanup report
biscuit_enable_triggers()
Activates automatic index maintenance triggers on the indexed table.
SELECT biscuit_enable_triggers();
Returns: TEXT confirmation message
biscuit_disable_triggers()
Disables automatic index updates. Useful for bulk operations to avoid per-row overhead. Remember to rebuild the index afterward.
SELECT biscuit_disable_triggers();
-- Perform bulk operations
INSERT INTO users ...;
UPDATE users ...;
-- Rebuild index
SELECT biscuit_build_index('users', 'email', 'id');
SELECT biscuit_enable_triggers();
Returns: TEXT confirmation message
biscuit_version()
Returns the version of the BISCUIT extension.
SELECT biscuit_version();
Returns: TEXT version string
Pattern Syntax
BISCUIT supports SQL LIKE pattern syntax:
%- Matches zero or more characters_- Matches exactly one character
Pattern Examples
-- Prefix match
SELECT * FROM biscuit_match('john%'); -- Starts with 'john'
-- Suffix match
SELECT * FROM biscuit_match('%@gmail.com'); -- Ends with '@gmail.com'
-- Contains match
SELECT * FROM biscuit_match('%test%'); -- Contains 'test'
-- Exact length
SELECT * FROM biscuit_match('____'); -- Exactly 4 characters
-- Single wildcard
SELECT * FROM biscuit_match('a_c'); -- 'a', any char, 'c'
-- Complex patterns
SELECT * FROM biscuit_match('admin%2024'); -- Starts 'admin', ends '2024'
SELECT * FROM biscuit_match('%user%@%'); -- Contains 'user', then '@'
CRUD Operations
Once triggers are enabled, the index automatically maintains itself:
-- INSERT: New records are automatically indexed
INSERT INTO users (id, email) VALUES (999, 'new@example.com');
-- UPDATE: Index is updated (uses incremental update when possible)
UPDATE users SET email = 'updated@example.com' WHERE id = 1;
-- DELETE: Records are soft-deleted with tombstones (O(1) operation)
DELETE FROM users WHERE id = 2;
-- Queries immediately reflect all changes
SELECT * FROM biscuit_match('%example.com');
Incremental Updates
The extension uses an incremental update optimization when:
- Old and new values have the same length
- String is at least 3 characters
- Fewer than 20% of characters changed
- Maximum 3 characters changed
This provides faster updates for similar strings (e.g., typo corrections, case changes).
Lazy Deletion
Deletions use tombstone marking for O(1) performance:
- Record is marked as tombstoned (bitmap operation)
- Primary key is removed from hash table
- Slot is added to free list for reuse
- Actual cleanup happens in batches when threshold is reached
Performance Considerations
When to Use BISCUIT
Good use cases:
- Frequent wildcard searches on a string column
- Dataset size fits comfortably in available memory
- Pattern matching is a bottleneck in your application
- Mixed CRUD operations with pattern queries
Not recommended for:
- Very large datasets (multi-GB text columns)
- Infrequent pattern matching queries
- Exact match lookups (use regular B-tree index)
- Limited memory environments
Bulk Operations
For bulk inserts/updates/deletes, temporarily disable triggers:
SELECT biscuit_disable_triggers();
-- Perform bulk operations
COPY users FROM 'data.csv';
UPDATE users SET email = lower(email);
DELETE FROM users WHERE inactive = true;
-- Rebuild and re-enable
SELECT biscuit_build_index('users', 'email', 'id');
SELECT biscuit_enable_triggers();
Memory Usage
Monitor memory consumption:
SELECT biscuit_index_status(); -- Shows memory usage
Memory usage depends on:
- Number of records
- String length distribution
- Character diversity
- Number of unique lengths
Tombstone Management
Monitor tombstone accumulation:
-- Check tombstone count
SELECT biscuit_get_tombstone_count();
-- Manual cleanup if needed
SELECT biscuit_cleanup();
Automatic cleanup triggers when tombstone count reaches threshold (default: 1000).
Architecture
Index Structure
BISCUIT maintains several bitmap indices:
- Position index: Character presence at each position (forward)
- Negative position index: Character presence from end (backward)
- Length index: Records grouped by string length
- Character cache: All positions where each character appears
- Primary key hash table: O(1) lookup from PK to index position
Lazy Deletion
The lazy deletion system:
- Marks deleted records in a tombstone bitmap
- Filters tombstones during query execution
- Batches actual cleanup operations
- Reuses deleted slots for new inserts
Query Optimization
Pattern matching is optimized through:
- Fast path for simple patterns (single character, prefix, suffix)
- Length filtering to reduce candidate set
- Windowed position matching for complex patterns
- Bitmap intersection/union operations
Limitations and Caveats
- Memory requirement: Entire index must fit in RAM
- Session scope: Index is not persistent across PostgreSQL restarts
- Single column: One index per column (no composite indices)
- No crash recovery: Index must be rebuilt after crashes
- Character limit: Strings longer than 256 characters are truncated
- No case-insensitive: Pattern matching is case-sensitive
- Primary key required: Must have a unique primary key column
Troubleshooting
Index not found
ERROR: No index found
Solution: Build the index first:
SELECT biscuit_setup('table_name', 'column_name', 'pk_column');
Out of memory
ERROR: out of memory
Solution: Dataset too large for available memory. Consider:
- Increasing PostgreSQL memory limits
- Indexing a subset of data
- Using partitioned tables
Slow queries after many deletes
Solution: High tombstone count may slow queries. Trigger cleanup:
SELECT biscuit_cleanup();
Trigger conflicts
Solution: If you have other triggers, ensure proper ordering or disable BISCUIT triggers during maintenance windows.
License
BISCUIT is released under the PostgreSQL License.
Version
Current version: 1.0.6-Biscuit
Use SELECT biscuit_version(); to check your installed version.
Summary
BISCUIT makes wildcard queries faster by trading memory and build time for query performance.
It’s not magic—it’s a bitmap index that:
- Pre-computes character positions
- Uses fast bitmap operations instead of string scanning
- Still requires O(n) time to return n results
- Requires memory proportional to your data size
Contributors
BISCUIT is developed and maintained by Sivaprasad Murali .
Support and Contact
Issues: https://github.com/crystallinecore/biscuit/issues
Discussions: https://github.com/crystallinecore/biscuit/discussions
When pg_trgm feels half-baked, grab a BISCUIT 🍪