Extensions
README
Contents
- BISCUIT: Bitmap Indexed Searching with Combinatorial Union and Intersection Techniques
        
- PostgreSQL Extension for Wildcard Pattern Matching
 - Executive Summary
 - Technical Overview
 - Installation and Requirements
 - Configuration and Setup
 - Query Interface
 - API Reference
 - Pattern Matching Specification
 - CRUD Operations
 - Monitoring and Diagnostics
 - CRUD Operations and Index Maintenance
 - Limitations and Constraints
 - Troubleshooting
 - Performance Expectations
 - API Reference
 - Performance Expectations
 - License
 - Version History
 - Contributors
 - Honest Summary
 
 
BISCUIT: Bitmap Indexed Searching with Combinatorial Union and Intersection Techniques
PostgreSQL Extension for Wildcard Pattern Matching
Version 1.0.3
Executive Summary
BISCUIT is a PostgreSQL C extension that provides bitmap indexing for wildcard pattern matching operations. The extension addresses performance limitations in traditional database indexes for leading wildcard queries (e.g., LIKE '%pattern'), which typically require full table scans.
Key Capabilities
- Optimized Wildcard Matching: Bitmap operations for pattern matching regardless of wildcard placement
 - Automatic Index Maintenance: Transparent CRUD synchronization via PostgreSQL triggers
 - Memory-Efficient Storage: Compressed bitmap structures using Roaring Bitmap algorithm
 - Lazy Deletion Strategy: Deferred cleanup with batching for optimal throughput
 - Incremental Update Optimization: Minimized reindexing overhead for similar string modifications
 - Universal Primary Key Support: Compatible with any PostgreSQL data type
 - Production-Ready Design: Comprehensive error handling and diagnostic capabilities
 
Honest Performance Characteristics
| Operation | Complexity | Notes | |———–|———–|—––| | Simple Pattern Match (bitmap ops) | O(k) | k = matching records to filter | | Complex Pattern (multi-part) | O(k × m) | k = parts, m = window size | | Result Retrieval | O(n) | n = matching records returned | | Insert Operation | O(L) | L = string length (max 256 indexed) | | Update (Incremental) | O(d) | d = character differences | | Update (Full) | O(L) | L = new string length | | Delete Operation | O(1) | Lazy tombstone marking | | PK Lookup | O(1) | Hash table lookup | | Cleanup Operation | O(t × c) | t = tombstones, c = charset operations |
Important: Query time includes O(n) for returning n matching records. The bitmap operations are fast, but you still need to retrieve and return the actual data.
Technical Overview
Problem Statement
Traditional database indexes (B-tree, GiST, GIN) exhibit suboptimal performance for wildcard pattern matching queries, particularly when wildcards appear at the beginning of patterns. A query such as SELECT * FROM table WHERE column LIKE '%pattern%' typically requires a sequential scan.
Solution Architecture
BISCUIT implements a multi-dimensional bitmap index structure that decomposes strings into positional character mappings. Each character at each position is tracked using compressed bitmaps (Roaring Bitmaps), enabling efficient intersection operations for pattern matching.
Core Principle
Instead of scanning strings sequentially, BISCUIT:
- Pre-indexes all character positions (up to 256 characters)
 - Uses bitmap intersections to find candidate records
 - Filters results through tombstone bitmap (for deleted records)
 - Returns matching records
 
Reality Check: This trades initial indexing time and memory for faster queries. It’s not magic—you’re paying upfront to make searches faster later.
Core Index Structures
- 
Positional Character Index: Maps each character to position-bitmap pairs
- Forward positions: 
{0, 1, 2, ..., 255}from string start - Reverse positions: 
{-1, -2, -3, ..., -256}from string end 
 - Forward positions: 
 - 
Character Existence Cache: Union of all positional bitmaps per character for containment queries
 - 
Length Index: Bitmap array indexed by string length
 - 
Primary Key Hash Table: O(1) lookup structure mapping primary keys to internal indices
 - 
Lazy Deletion Layer: Tombstone bitmap tracking deleted records with deferred cleanup
 
Installation and Requirements
System Requirements
- PostgreSQL Version: 11.0 or higher
 - Operating System: Linux, macOS, Windows (with MinGW)
 - Architecture: x86_64, ARM64
 - Memory: Minimum 512MB available RAM per index
 - Compiler: GCC 4.8+, Clang 3.9+, or MSVC 2017+
 
Optional Dependencies
- CRoaring Library: For optimized Roaring Bitmap operations (recommended)
- Installation: 
apt-get install libroaring-dev(Debian/Ubuntu) - Falls back to built-in implementation if unavailable
 
 - Installation: 
 
Installation Procedure
Method 1: PostgreSQL Extension System
CREATE EXTENSION biscuit;
Method 2: Manual Compilation
git clone https://github.com/crystallinecore/biscuit.git
cd biscuit
make
sudo make install
psql -d your_database -c "CREATE EXTENSION biscuit;"
Verification
SELECT biscuit_version();
-- Expected output: 1.0.3-Biscuit
Configuration and Setup
Quick Start (Recommended)
SELECT biscuit_setup('customer_records', 'email_address', 'customer_id');
This single function:
- Builds the bitmap index
 - Creates type-safe wrapper functions
 - Installs automatic update triggers
 
Manual Configuration (Advanced)
-- Step 1: Build index
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
-- Step 2: Create query functions
SELECT biscuit_create_match_function();
-- Step 3: Enable triggers
SELECT biscuit_enable_triggers();
Important Limitations
- One Index Per Database Instance: Currently only supports one active index
 - In-Memory Only: Index is not persistent across server restarts
 - 256 Character Limit: Only first 256 characters are indexed (though full strings are stored)
 - Case Sensitive: Patterns are case-sensitive by default
 
Query Interface
Primary Query Functions
1. Full Record Retrieval
SELECT * FROM biscuit_match('%@example.com%');
SELECT id, email FROM biscuit_match('user%') WHERE created_at > '2024-01-01';
Returns complete rows from the indexed table.
2. Count Only (Fastest)
SELECT biscuit_match_count('%search_term%');
Returns count without materializing result set.
3. Key-Value Pairs
SELECT * FROM biscuit_match_rows('%pattern%');
Returns (primary_key, indexed_value) pairs.
API Reference
Setup and Configuration Functions
biscuit_setup()
Complete one-step setup function that builds index, creates wrapper functions, and enables triggers.
Signature:
biscuit_setup(
    p_table_name TEXT,
    p_column_name TEXT,
    p_pk_column_name TEXT DEFAULT 'id'
) RETURNS TEXT
Parameters:
p_table_name: Name of the table to indexp_column_name: Name of the text column to indexp_pk_column_name: Name of the primary key column (default: ‘id’)
Returns: Status message describing operations performed
Usage:
-- Basic setup with default primary key
SELECT biscuit_setup('products', 'name');
-- Setup with custom primary key
SELECT biscuit_setup('customers', 'email', 'customer_id');
-- Setup with UUID primary key
SELECT biscuit_setup('users', 'username', 'user_uuid');
Example Output:
Biscuit index built successfully.
Created biscuit_match() and biscuit_match_rows() functions for table: products
Columns: id integer, name text
Successfully created trigger on table: products
The index will now automatically update on INSERT, UPDATE, and DELETE operations.
biscuit_build_index()
Constructs the bitmap index structure from existing table data.
Signature:
biscuit_build_index(
    table_name TEXT,
    column_name TEXT,
    pk_column_name TEXT DEFAULT 'id'
) RETURNS BOOLEAN
Parameters:
table_name: Name of the table to indexcolumn_name: Name of the text column to indexpk_column_name: Name of the primary key column (default: ‘id’)
Returns: TRUE on success
Usage:
-- Build index for products table
SELECT biscuit_build_index('products', 'name', 'id');
-- Rebuild existing index (drops old index first)
SELECT biscuit_build_index('products', 'name', 'id');
-- Build index with custom primary key
SELECT biscuit_build_index('customers', 'email', 'customer_id');
Performance Note: For 1 million records, expect build time of approximately 60-90 seconds depending on string length and complexity.
biscuit_create_match_function()
Generates strongly-typed wrapper functions for querying.
Signature:
biscuit_create_match_function() RETURNS TEXT
Returns: Confirmation message with function signatures
Usage:
-- Must call after building index
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_create_match_function();
Generated Functions:
biscuit_match(pattern TEXT): Returns complete table rowsbiscuit_match_rows(pattern TEXT): Returns (pk, value) tuples
biscuit_enable_triggers()
Activates automatic index maintenance triggers.
Signature:
biscuit_enable_triggers() RETURNS TEXT
Returns: Status message
Usage:
-- Enable automatic updates
SELECT biscuit_enable_triggers();
-- Typical workflow
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_create_match_function();
SELECT biscuit_enable_triggers();
Creates Trigger: biscuit_auto_update (AFTER INSERT OR UPDATE OR DELETE)
biscuit_disable_triggers()
Deactivates automatic index maintenance for bulk operations.
Signature:
biscuit_disable_triggers() RETURNS TEXT
Returns: Status message
Usage:
-- Disable for bulk operation
SELECT biscuit_disable_triggers();
-- Perform bulk insert
COPY products FROM '/data/products.csv' CSV;
-- Rebuild and re-enable
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
Query Functions
biscuit_match()
Primary query interface returning complete table rows with proper type safety.
Signature:
biscuit_match(pattern TEXT) RETURNS SETOF table_type
Parameters:
pattern: SQL LIKE-style wildcard pattern (% and _)
Returns: Complete rows from indexed table matching the pattern
Usage:
-- Simple containment search
SELECT * FROM biscuit_match('%laptop%');
-- Prefix search
SELECT * FROM biscuit_match('PROD-%');
-- Suffix search
SELECT * FROM biscuit_match('%@gmail.com');
-- With additional filters
SELECT * FROM biscuit_match('%electronics%')
WHERE price > 100 AND in_stock = true;
-- With ordering
SELECT * FROM biscuit_match('%widget%')
ORDER BY created_at DESC
LIMIT 10;
-- In joins
SELECT p.*, c.category_name
FROM biscuit_match('%special%') p
JOIN categories c ON p.category_id = c.id;
-- Count matches
SELECT COUNT(*) FROM biscuit_match('%premium%');
-- Aggregate operations
SELECT category, AVG(price)
FROM biscuit_match('%electronics%')
GROUP BY category;
biscuit_match_count()
Returns count of matching records without materializing result set. This is the fastest query method.
Signature:
biscuit_match_count(pattern TEXT) RETURNS INTEGER
Parameters:
pattern: SQL LIKE-style wildcard pattern
Returns: Integer count of matching records
Usage:
-- Get count only
SELECT biscuit_match_count('%@example.com');
-- Compare counts
SELECT 
    biscuit_match_count('%@gmail.com') AS gmail_users,
    biscuit_match_count('%@yahoo.com') AS yahoo_users,
    biscuit_match_count('%@hotmail.com') AS hotmail_users;
-- Conditional logic
DO $
DECLARE
    match_count INTEGER;
BEGIN
    match_count := biscuit_match_count('%error%');
    IF match_count > 1000 THEN
        RAISE NOTICE 'High error count: %', match_count;
    END IF;
END $;
-- Use in WHERE clause
SELECT * FROM summary_table
WHERE error_count = biscuit_match_count('%critical%');
Performance: Performs only bitmap operations without tuple retrieval, making it significantly faster than COUNT(*) on full result set.
biscuit_match_rows()
Returns (primary_key, indexed_value) pairs for matching records.
Signature:
biscuit_match_rows(pattern TEXT) RETURNS TABLE(pk primary_key_type, value TEXT)
Parameters:
pattern: SQL LIKE-style wildcard pattern
Returns: Table with two columns: pk (original type) and value (text)
Usage:
-- Get key-value pairs
SELECT * FROM biscuit_match_rows('%search%');
-- Extract just primary keys
SELECT pk FROM biscuit_match_rows('%pattern%');
-- Join back to original table for specific columns
SELECT t.id, t.name, t.price
FROM products t
WHERE t.id IN (SELECT pk FROM biscuit_match_rows('%laptop%'));
-- Use in subquery
WITH matches AS (
    SELECT pk, value FROM biscuit_match_rows('%error%')
)
SELECT * FROM logs l
JOIN matches m ON l.id = m.pk
WHERE l.timestamp > NOW() - INTERVAL '1 hour';
biscuit_match_keys()
Low-level function returning primary keys and values as text. Used internally by higher-level functions.
Signature:
biscuit_match_keys(pattern TEXT) RETURNS TABLE(pk TEXT, value TEXT)
Parameters:
pattern: SQL LIKE-style wildcard pattern
Returns: Table with two text columns
Usage:
-- Direct usage (returns everything as text)
SELECT * FROM biscuit_match_keys('%pattern%');
-- Cast primary key back to original type
SELECT pk::INTEGER AS id, value
FROM biscuit_match_keys('%search%');
-- For UUID primary keys
SELECT pk::UUID AS user_id, value
FROM biscuit_match_keys('%@domain.com');
Note: Generally prefer biscuit_match_rows() which preserves primary key type.
Monitoring and Status Functions
biscuit_index_status()
Comprehensive status report including table information, memory usage, and CRUD statistics.
Signature:
biscuit_index_status() RETURNS TEXT
Returns: Formatted multi-line status report
Usage:
-- View complete status
SELECT biscuit_index_status();
-- Extract specific metrics using regex
SELECT (regexp_matches(biscuit_index_status(), 'Active Records: ([0-9]+)'))[1]::INTEGER AS active;
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC AS memory_mb;
-- Create monitoring view
CREATE VIEW biscuit_health AS
SELECT 
    NOW() AS checked_at,
    (regexp_matches(biscuit_index_status(), 'Active Records: ([0-9]+)'))[1]::INTEGER AS active_records,
    (regexp_matches(biscuit_index_status(), 'Tombstoned Slots: ([0-9]+)'))[1]::INTEGER AS tombstones,
    (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC AS memory_mb;
-- Check status in script
DO $
DECLARE
    status_text TEXT;
BEGIN
    status_text := biscuit_index_status();
    RAISE NOTICE '%', status_text;
END $;
Sample Output:
========================================
Biscuit Index v3 - FIXED
========================================
Table: products
Column: name
Primary Key: id
Active Records: 998567
Total Slots: 1000000
Free Slots: 1433
Tombstoned Slots: 856
Max length: 128
Memory: 18.3 MB
----------------------------------------
CRUD Statistics:
  Inserts: 1000000
  Deletes: 1433
  Updates: 125678 (incr: 89234, 71.0%)
  Queries: 45892
----------------------------------------
Lazy Deletion Status:
  Pending tombstones: 856 (59.7% of deletes)
  Cleanup threshold: 1000
  Total cleanups: 1
  Items cleaned: 577
  Query/Delete ratio: 32.02
----------------------------------------
biscuit_get_active_count()
Returns the number of active (non-deleted) records in the index.
Signature:
biscuit_get_active_count() RETURNS INTEGER
Returns: Count of active records
Usage:
-- Get active count
SELECT biscuit_get_active_count();
-- Compare with table
SELECT 
    COUNT(*) AS table_count,
    biscuit_get_active_count() AS index_count
FROM products;
-- Monitor growth
CREATE TABLE index_metrics (
    logged_at TIMESTAMPTZ DEFAULT NOW(),
    active_count INTEGER
);
INSERT INTO index_metrics (active_count)
SELECT biscuit_get_active_count();
-- Alert on discrepancy
DO $
DECLARE
    table_count BIGINT;
    index_count INTEGER;
BEGIN
    SELECT COUNT(*) INTO table_count FROM products;
    index_count := biscuit_get_active_count();
    
    IF table_count != index_count THEN
        RAISE WARNING 'Index out of sync: table=%, index=%', table_count, index_count;
    END IF;
END $;
biscuit_get_free_slots()
Returns the number of available slots for reuse (from deleted records).
Signature:
biscuit_get_free_slots() RETURNS INTEGER
Returns: Count of free slots
Usage:
-- Check available slots
SELECT biscuit_get_free_slots();
-- Monitor slot reuse efficiency
SELECT 
    biscuit_get_active_count() AS active,
    biscuit_get_free_slots() AS free,
    ROUND(100.0 * biscuit_get_free_slots() / 
          (biscuit_get_active_count() + biscuit_get_free_slots()), 2) AS free_pct;
-- Capacity planning
WITH stats AS (
    SELECT 
        biscuit_get_active_count() AS active,
        biscuit_get_free_slots() AS free,
        (regexp_matches(biscuit_index_status(), 'Total Slots: ([0-9]+)'))[1]::INTEGER AS capacity
)
SELECT 
    active,
    free,
    capacity,
    capacity - active - free AS used_tombstoned,
    ROUND(100.0 * active / capacity, 2) AS utilization_pct
FROM stats;
biscuit_get_tombstone_count()
Returns the number of deleted records pending cleanup.
Signature:
biscuit_get_tombstone_count() RETURNS INTEGER
Returns: Count of pending tombstones
Usage:
-- Check tombstone count
SELECT biscuit_get_tombstone_count();
-- Conditional cleanup
DO $
BEGIN
    IF biscuit_get_tombstone_count() > 5000 THEN
        PERFORM biscuit_cleanup();
        RAISE NOTICE 'Cleanup completed';
    END IF;
END $;
-- Monitor cleanup efficiency
SELECT 
    biscuit_get_tombstone_count() AS pending,
    (regexp_matches(biscuit_index_status(), 'Total cleanups: ([0-9]+)'))[1]::BIGINT AS total_cleanups,
    (regexp_matches(biscuit_index_status(), 'Items cleaned: ([0-9]+)'))[1]::BIGINT AS items_cleaned;
-- Alert threshold
CREATE OR REPLACE FUNCTION check_tombstone_threshold()
RETURNS VOID AS $
DECLARE
    count INTEGER;
BEGIN
    count := biscuit_get_tombstone_count();
    IF count > 10000 THEN
        RAISE WARNING 'CRITICAL: % tombstones pending cleanup', count;
    ELSIF count > 5000 THEN
        RAISE NOTICE 'WARNING: % tombstones pending cleanup', count;
    END IF;
END;
$ LANGUAGE plpgsql;
biscuit_version()
Returns the version string of the BISCUIT extension.
Signature:
biscuit_version() RETURNS TEXT
Returns: Version string
Usage:
-- Check version
SELECT biscuit_version();
-- Verify installation
DO $
DECLARE
    version TEXT;
BEGIN
    version := biscuit_version();
    RAISE NOTICE 'BISCUIT version: %', version;
END $;
-- Version check in application
SELECT biscuit_version() AS version;
-- Expected: 1.0.3-Biscuit
Maintenance Functions
biscuit_cleanup()
Manually triggers tombstone cleanup process.
Signature:
biscuit_cleanup() RETURNS TEXT
Returns: Summary of cleanup operations
Usage:
-- Manual cleanup
SELECT biscuit_cleanup();
-- Scheduled cleanup (using pg_cron)
SELECT cron.schedule(
    'biscuit-nightly-cleanup',
    '0 2 * * *',  -- 2 AM daily
    $SELECT biscuit_cleanup()$
);
-- Cleanup before performance-critical operation
BEGIN;
    SELECT biscuit_cleanup();
    -- Perform critical queries
COMMIT;
-- Conditional cleanup
DO $
DECLARE
    tombstone_count INTEGER;
    cleanup_result TEXT;
BEGIN
    tombstone_count := biscuit_get_tombstone_count();
    
    IF tombstone_count > 1000 THEN
        cleanup_result := biscuit_cleanup();
        RAISE NOTICE '%', cleanup_result;
    ELSE
        RAISE NOTICE 'Cleanup skipped: only % tombstones', tombstone_count;
    END IF;
END $;
Example Output:
Tombstone cleanup complete:
  Cleaned: 856 tombstones
  Remaining: 0
  Total cleanups: 2
When to Use:
- Before performance-critical queries
 - After bulk delete operations
 - When tombstone count exceeds 5,000
 - During scheduled maintenance windows
 
Pattern Matching Specification
Wildcard Operators
| Operator | Matches | Examples |
|–––––|———|–––––|
| % | Zero or more characters | 'a%' → “a”, “abc” |
| _ | Exactly one character | 'a_c' → “abc”, “axc” |
Pattern Examples
-- Exact match
SELECT * FROM biscuit_match('hello');
-- Prefix
SELECT * FROM biscuit_match('user_%');
-- Suffix
SELECT * FROM biscuit_match('%@gmail.com');
-- Contains
SELECT * FROM biscuit_match('%error%');
-- Complex multi-part
SELECT * FROM biscuit_match('%user_%@%.com');
-- Length-specific
SELECT * FROM biscuit_match('___');  -- Exactly 3 chars
Performance Guidance
Optimal Patterns (simple bitmap operations):
- Single character: 
'%a%' - Prefix: 
'abc%' - Suffix: 
'%xyz' - Exact length: 
'___' 
Acceptable Patterns (windowed matching):
- Two parts: 
'%a%b%' - Three parts: 
'%a%b%c%' 
Avoid When Possible (slower performance):
- Four or more parts: 
'%a%b%c%d%' - Recommendation: Simplify patterns or combine terms
 
CRUD Operations
Automatic Index Maintenance
Once triggers are enabled, all DML operations automatically update the index:
-- Insert: O(L) where L = string length
INSERT INTO products (id, name) VALUES (1, 'New Product');
-- Update: O(d) for incremental, O(L) for full reindex
UPDATE products SET name = 'Updated Name' WHERE id = 1;
-- Delete: O(1) lazy deletion
DELETE FROM products WHERE id = 1;
Incremental Update Optimization
For updates where:
- Same string length
 - String length ≥ 3 characters
 - < 20% character changes
 - Maximum 3 changed characters
 
BISCUIT uses incremental updates (only updating changed positions).
Lazy Deletion
Deletes are O(1) operations that:
- Mark record in tombstone bitmap
 - Remove from hash table
 - Add to free list for reuse
 - Defer cleanup of position bitmaps
 
Cleanup triggers automatically when tombstone count reaches threshold (default: 1000).
Bulk Operations
For large bulk operations, consider:
-- Disable triggers
SELECT biscuit_disable_triggers();
-- Perform bulk operation
COPY products FROM '/data/products.csv' CSV;
-- Rebuild and re-enable
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
Monitoring and Diagnostics
Check Index Status
SELECT biscuit_index_status();
Sample output:
========================================
Biscuit Index v3 - FIXED
========================================
Table: products
Column: name
Primary Key: id
Active Records: 998567
Total Slots: 1000000
Free Slots: 1433
Tombstoned Slots: 856
Max length: 128
Memory: 18.3 MB
----------------------------------------
CRUD Statistics:
  Inserts: 1000000
  Deletes: 1433
  Updates: 125678 (incr: 89234, 71.0%)
  Queries: 45892
----------------------------------------
Key Metrics
-- Active record count
SELECT biscuit_get_active_count();
-- Available free slots
SELECT biscuit_get_free_slots();
-- Pending tombstones
SELECT biscuit_get_tombstone_count();
Manual Cleanup
SELECT biscuit_cleanup();
Triggers immediate tombstone cleanup. Use when:
- Tombstone count exceeds 5000
 - Before performance-critical operations
 - During maintenance windows
 
CRUD Operations and Index Maintenance
Automatic Index Maintenance
Once triggers are enabled via biscuit_setup() or biscuit_enable_triggers(), all DML operations automatically maintain index consistency.
INSERT Operations
Behavior: New records are immediately indexed and available for queries.
Complexity: O(L) where L = string length (up to 256 characters indexed)
Usage:
-- Simple insert
INSERT INTO products (id, name) VALUES (1001, 'New Product');
-- Insert with immediate availability
INSERT INTO products (id, name) VALUES (1002, 'Premium Widget');
SELECT * FROM biscuit_match('%Widget%');  -- Returns new row immediately
-- Bulk insert (keep triggers enabled for small batches)
INSERT INTO products (id, name)
VALUES 
    (1003, 'Product A'),
    (1004, 'Product B'),
    (1005, 'Product C');
-- Bulk insert (disable triggers for large batches)
SELECT biscuit_disable_triggers();
COPY products FROM '/data/products.csv' CSV;
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
UPDATE Operations
Behavior: Updates are optimized based on string similarity.
Incremental Update Criteria:
- Same string length (before and after)
 - String length ≥ 3 characters
 - Character differences < 20% of string length
 - Maximum 3 changed characters
 
Complexity:
- Incremental: O(d) where d = number of different characters
 - Full reindex: O(L) where L = new string length
 
Usage:
-- Incremental update (same length, few changes)
UPDATE products 
SET name = 'Premium Widget Pro'  -- Similar to 'Premium Widget Max'
WHERE id = 1001;
-- Full reindex (different length)
UPDATE products 
SET name = 'Completely Different Name'
WHERE id = 1002;
-- Bulk update (disable triggers)
SELECT biscuit_disable_triggers();
UPDATE products SET name = UPPER(name);
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
-- Update with immediate query
UPDATE products SET name = 'Updated Product' WHERE id = 1003;
SELECT * FROM biscuit_match('%Updated%');  -- Sees updated value
Statistics:
-- Check incremental update efficiency
SELECT biscuit_index_status();
-- Look for: Updates: 125678 (incr: 89234, 71.0%)
DELETE Operations
Behavior: Implements lazy deletion with O(1) tombstone marking.
Complexity: O(1) for delete operation, O(t × c) for cleanup
Usage:
-- Single delete (instant)
DELETE FROM products WHERE id = 1001;
-- Bulk delete
DELETE FROM products WHERE category = 'obsolete';
-- Delete with immediate query
DELETE FROM products WHERE id = 1002;
SELECT * FROM biscuit_match('%Product%');  -- Does not include deleted record
-- Check tombstone accumulation
SELECT biscuit_get_tombstone_count();
-- Manual cleanup
SELECT biscuit_cleanup();
-- Bulk delete with manual cleanup
SELECT biscuit_disable_triggers();
DELETE FROM products WHERE created_at < '2020-01-01';
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
Automatic Cleanup: Triggers when tombstone count reaches threshold (default: 1000)
Transaction Semantics
ACID Compliance: Index updates commit/rollback with transactions.
Usage:
-- Transaction rollback
BEGIN;
    INSERT INTO products (id, name) VALUES (9999, 'Test Product');
    SELECT * FROM biscuit_match('%Test%');  -- Visible in transaction
ROLLBACK;
SELECT * FROM biscuit_match('%Test%');  -- Not visible, index reverted
-- Transaction commit
BEGIN;
    INSERT INTO products (id, name) VALUES (9999, 'Test Product');
    UPDATE products SET name = 'Test Product Updated' WHERE id = 9999;
    DELETE FROM products WHERE id = 9998;
COMMIT;
-- All changes now visible
SELECT * FROM biscuit_match('%Test%');
Limitations and Constraints
Architectural Limitations
1. Single Index Per Database Instance
Limitation: Only one BISCUIT index can be active per PostgreSQL instance.
Impact:
-- This replaces the first index
SELECT biscuit_setup('table1', 'col1', 'id');
SELECT biscuit_setup('table2', 'col2', 'id');  -- Replaces table1 index
Workaround:
-- Use separate databases
CREATE DATABASE db1;
\c db1
SELECT biscuit_setup('table', 'column', 'id');
CREATE DATABASE db2;
\c db2
SELECT biscuit_setup('table', 'column', 'id');
2. Non-Persistent Storage
Limitation: Index resides in shared memory and is lost on server restart.
Impact: Must rebuild after PostgreSQL restart.
Workaround:
-- Create startup script: /etc/postgresql/init_biscuit.sql
\c your_database
SELECT biscuit_setup('products', 'name', 'id');
SELECT biscuit_setup('customers', 'email', 'customer_id');  -- If in separate DB
-- Or use application startup code
-- On application initialization:
-- execute: SELECT biscuit_setup('table', 'column', 'pk');
3. 256 Character Indexing Limit
Limitation: Only first 256 characters are indexed for pattern matching.
Impact:
-- This works (match at position 10)
INSERT INTO products (id, name) VALUES (1, 'Product: Special Widget');
SELECT * FROM biscuit_match('%Special%');  -- Found
-- This fails (match at position 300)
INSERT INTO products (id, description) 
VALUES (2, 'Long description... [300 chars] ...Special keyword');
SELECT * FROM biscuit_match('%Special%');  -- Not found (beyond position 256)
Note: Full strings are still stored and returned, just not indexed beyond 256 characters.
4. Single Column Per Index
Limitation: One index covers one text column per table.
Workaround - Concatenated Search Column:
-- Create combined search column
ALTER TABLE products 
ADD COLUMN search_text TEXT 
GENERATED ALWAYS AS (name || ' ' || description || ' ' || sku) STORED;
-- Index the combined column
SELECT biscuit_setup('products', 'search_text', 'id');
-- Search across all fields
SELECT * FROM biscuit_match('%widget%');
5. Case Sensitivity
Limitation: Patterns are case-sensitive by default.
Impact:
INSERT INTO products (id, name) VALUES (1, 'Premium Widget');
SELECT * FROM biscuit_match('%widget%');   -- Not found
SELECT * FROM biscuit_match('%Widget%');   -- Found
Workaround - Normalized Column:
-- Add normalized column
ALTER TABLE products 
ADD COLUMN name_lower TEXT 
GENERATED ALWAYS AS (LOWER(name)) STORED;
-- Index normalized column
SELECT biscuit_setup('products', 'name_lower', 'id');
-- Case-insensitive search
SELECT * FROM biscuit_match(LOWER('%WIDGET%'));  -- Found
Performance Limitations
1. Memory Usage
Limitation: Memory usage scales with dataset size and characteristics.
Typical Usage:
- 10-30% of indexed data size
 - Example: 1M records × 50 chars avg ≈ 15-25 MB
 
Monitoring:
-- Check memory usage
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC;
-- Monitor over time
CREATE TABLE memory_tracking (
    logged_at TIMESTAMPTZ DEFAULT NOW(),
    memory_mb NUMERIC
);
INSERT INTO memory_tracking (memory_mb)
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC;
2. Pattern Complexity
Limitation: Query performance degrades with pattern complexity.
Performance Tiers:
-- Fast
SELECT * FROM biscuit_match('%single%');
-- Good 
SELECT * FROM biscuit_match('%two%parts%');
-- Acceptable
SELECT * FROM biscuit_match('%three%part%pattern%');
-- Slower 
SELECT * FROM biscuit_match('%four%or%more%parts%here%');
Recommendation: Keep patterns to 2-3 parts for optimal performance.
3. Write Concurrency
Limitation: Trigger-based updates may experience contention under very high write loads.
Impact: Write operations serialize at the row level through PostgreSQL’s trigger mechanism.
Mitigation:
-- For high-volume writes, batch operations
BEGIN;
    INSERT INTO products SELECT * FROM staging_table;
COMMIT;
-- Or disable triggers for bulk operations
SELECT biscuit_disable_triggers();
-- Perform bulk writes
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
Known Issues
- No Regular Expression Support: Only SQL LIKE wildcards (
%and_) - No Full-Text Search Features: No stemming, stop words, or language processing
 - No Cross-Database Queries: Each database has its own index
 - No Partial String Updates: Updates always reprocess entire string
 
Troubleshooting
Common Issues and Solutions
Issue 1: “Index not built” Error
Symptom:
SELECT * FROM biscuit_match('%pattern%');
ERROR: Index not built. Call biscuit_build_index() first.
Diagnosis:
-- Check if index exists
SELECT biscuit_version();  -- Should return version if extension loaded
-- Attempt to view status
SELECT biscuit_index_status();  -- Will error if no index
Solution:
-- Method 1: Complete setup
SELECT biscuit_setup('table_name', 'column_name', 'pk_column');
-- Method 2: Manual build
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
SELECT biscuit_create_match_function();
SELECT biscuit_enable_triggers();
Prevention:
# Add to PostgreSQL startup script
# File: /etc/postgresql/14/main/postgresql.conf
# Or application startup routine
-- Startup SQL script
\c your_database
SELECT biscuit_setup('products', 'name', 'id');
Issue 2: Query Returns No Results
Symptom:
SELECT biscuit_match_count('%known_value%');
-- Returns: 0 (expected > 0)
Diagnosis:
-- Verify value exists in table
SELECT COUNT(*) FROM products WHERE name LIKE '%known_value%';
-- Check index status
SELECT biscuit_index_status();
-- Verify active records
SELECT biscuit_get_active_count();
-- Compare with table count
SELECT 
    COUNT(*) AS table_count,
    biscuit_get_active_count() AS index_count
FROM products;
-- Check trigger status
SELECT tgname, tgenabled 
FROM pg_trigger 
WHERE tgname = 'biscuit_auto_update'
AND tgrelid = 'products'::regclass;
Common Causes:
A. Index Out of Sync
-- Solution: Rebuild index
SELECT biscuit_build_index('products', 'name', 'id');
-- Verify sync
SELECT 
    COUNT(*) AS table_count,
    biscuit_get_active_count() AS index_count
FROM products;
B. Case Sensitivity Mismatch
-- Problem: Pattern case doesn't match data
INSERT INTO products (id, name) VALUES (1, 'Widget');
SELECT * FROM biscuit_match('%widget%');  -- Returns nothing
-- Solution 1: Match case exactly
SELECT * FROM biscuit_match('%Widget%');
-- Solution 2: Use normalized column
ALTER TABLE products 
ADD COLUMN name_lower TEXT 
GENERATED ALWAYS AS (LOWER(name)) STORED;
SELECT biscuit_setup('products', 'name_lower', 'id');
SELECT * FROM biscuit_match(LOWER('%WIDGET%'));
C. Pattern Beyond 256 Characters
-- Diagnosis: Check string length
SELECT id, LENGTH(name) FROM products WHERE id = 123;
-- Problem: Match occurs after position 256
SELECT * FROM biscuit_match('%rare_keyword%');  -- Not found
-- Solution: Ensure critical terms are within first 256 characters
-- Or restructure data to prioritize important terms
D. Triggers Disabled
-- Check trigger status
SELECT tgname, tgenabled FROM pg_trigger WHERE tgname = 'biscuit_auto_update';
-- Solution: Re-enable triggers
SELECT biscuit_enable_triggers();
-- Rebuild to sync
SELECT biscuit_build_index('products', 'name', 'id');
Issue 3: Slow Query Performance
Symptom:
\timing on
SELECT * FROM biscuit_match('%pattern%');
Time: 850.234 ms  -- Expected: < 50ms
\timing off
Diagnosis:
-- Check tombstone count
SELECT biscuit_get_tombstone_count();
-- If > 5000: Cleanup needed
-- Count pattern parts (% symbols)
-- Pattern: '%a%b%c%d%' has 5 parts (too many)
-- Check result set size
SELECT biscuit_match_count('%pattern%');
-- If > 50% of table: Poor selectivity
-- Check memory usage
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC;
-- View full statistics
SELECT biscuit_index_status();
Solutions:
A. High Tombstone Count
-- Problem: Excessive tombstones slow queries
SELECT biscuit_get_tombstone_count();  -- Returns: 8543
-- Solution: Run cleanup
SELECT biscuit_cleanup();
-- Verify improvement
\timing on
SELECT * FROM biscuit_match('%pattern%');
\timing off
B. Complex Pattern
-- Problem: Too many pattern parts
SELECT * FROM biscuit_match('%a%b%c%d%e%');  -- 6 parts, slow
-- Solution: Simplify pattern
SELECT * FROM biscuit_match('%abc%de%');  -- 2 parts, faster
-- Or use multiple simpler queries
SELECT * FROM biscuit_match('%abc%') 
WHERE value LIKE '%de%' AND value LIKE '%f%';
C. Large Result Set
-- Problem: Returning too many results
SELECT biscuit_match_count('%a%');  -- Returns: 450000 (45% of table)
-- Solution: Add filters to reduce result set
SELECT * FROM biscuit_match('%a%')
WHERE created_at > NOW() - INTERVAL '30 days'
AND status = 'active'
LIMIT 1000;
-- Or reconsider if BISCUIT is appropriate for this query
D. Memory Fragmentation
-- Problem: High memory usage affecting performance
SELECT biscuit_index_status();
-- Memory: 250 MB (expected ~50MB for dataset)
-- Solution: Rebuild to compact
SELECT biscuit_build_index('products', 'name', 'id');
Issue 4: High Memory Usage
Symptom:
SELECT biscuit_index_status();
-- Memory: 500 MB (expected ~20 MB for dataset)
Diagnosis:
-- Check record count and characteristics
SELECT 
    COUNT(*) AS records,
    AVG(LENGTH(name)) AS avg_length,
    MAX(LENGTH(name)) AS max_length,
    COUNT(DISTINCT name) AS unique_values,
    COUNT(DISTINCT SUBSTRING(name, 1, 1)) AS unique_first_chars
FROM products;
-- Check tombstone accumulation
SELECT biscuit_get_tombstone_count();
-- Monitor memory over time
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC AS memory_mb;
Solutions:
A. Tombstone Accumulation
-- Problem: Tombstones not cleaned up
SELECT biscuit_get_tombstone_count();  -- Returns: 15000
-- Solution: Manual cleanup
SELECT biscuit_cleanup();
-- Check memory after cleanup
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC;
B. Memory Fragmentation
-- Solution: Rebuild to compact
SELECT biscuit_build_index('products', 'name', 'id');
C. Unexpected Data Characteristics
-- Problem: Very long strings or high diversity
SELECT MAX(LENGTH(name)) FROM products;  -- Returns: 2048
-- Solution: Truncate indexed data (only first 256 chars matter)
ALTER TABLE products ADD COLUMN name_indexed TEXT 
GENERATED ALWAYS AS (LEFT(name, 200)) STORED;
SELECT biscuit_setup('products', 'name_indexed', 'id');
Issue 5: Trigger Not Firing
Symptom:
INSERT INTO products (id, name) VALUES (999, 'test');
SELECT * FROM biscuit_match('%test%');
-- Returns: 0 rows (expected 1 row)
Diagnosis:
-- Check trigger existence and status
SELECT 
    tgname,
    tgenabled,
    tgisinternal,
    pg_get_triggerdef(oid) AS definition
FROM pg_trigger
WHERE tgrelid = 'products'::regclass
AND tgname LIKE '%biscuit%';
-- Check PostgreSQL logs for trigger errors
-- Location: /var/log/postgresql/postgresql-14-main.log
-- Test trigger manually
DO $
BEGIN
    PERFORM biscuit_trigger();
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'Trigger error: %', SQLERRM;
END $;
Solutions:
A. Trigger Disabled
-- Check status
SELECT tgenabled FROM pg_trigger 
WHERE tgname = 'biscuit_auto_update'
AND tgrelid = 'products'::regclass;
-- Returns: 'D' (disabled) or no rows
-- Solution: Enable trigger
SELECT biscuit_enable_triggers();
-- Verify
SELECT tgenabled FROM pg_trigger 
WHERE tgname = 'biscuit_auto_update'
AND tgrelid = 'products'::regclass;
-- Should return: 'O' (enabled)
B. Trigger Missing
-- Check if trigger exists
SELECT COUNT(*) FROM pg_trigger 
WHERE tgname = 'biscuit_auto_update'
AND tgrelid = 'products'::regclass;
-- Returns: 0
-- Solution: Create trigger
SELECT biscuit_enable_triggers();
-- Or complete setup
SELECT biscuit_setup('products', 'name', 'id');
C. Trigger Error
-- Check logs for errors like:
-- "NULL primary key in INSERT"
-- "Column not found"
-- Solution: Verify table structure
\d products
-- Ensure primary key is NOT NULL
ALTER TABLE products ALTER COLUMN id SET NOT NULL;
-- Rebuild with correct configuration
SELECT biscuit_setup('products', 'name', 'id');
Issue 6: “NULL primary key” Error
Symptom:
INSERT INTO products (name) VALUES ('test');
ERROR: NULL primary key in INSERT
Cause: Primary key column contains NULL value
Solution:
A. Add Default Value
-- For SERIAL columns
CREATE SEQUENCE products_id_seq;
ALTER TABLE products 
ALTER COLUMN id SET DEFAULT nextval('products_id_seq');
-- For IDENTITY columns (PostgreSQL 10+)
ALTER TABLE products 
ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY;
B. Ensure NOT NULL Constraint
-- Add NOT NULL constraint
ALTER TABLE products ALTER COLUMN id SET NOT NULL;
-- Verify constraint
\d products
C. Provide Explicit Values
-- Always specify primary key
INSERT INTO products (id, name) VALUES (1, 'test');
-- Or use nextval
INSERT INTO products (id, name) 
VALUES (nextval('products_id_seq'), 'test');
Issue 7: Out of Memory
Symptom:
SELECT biscuit_build_index('large_table', 'text_column', 'id');
ERROR: out of memory
DETAIL: Failed on request of size X
Diagnosis:
-- Check table size
SELECT 
    pg_size_pretty(pg_total_relation_size('large_table')) AS table_size,
    COUNT(*) AS row_count,
    AVG(LENGTH(text_column)) AS avg_length,
    MAX(LENGTH(text_column)) AS max_length
FROM large_table;
-- Check PostgreSQL memory settings
SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;
-- Estimate memory requirement
-- Rough estimate: (row_count × avg_length × 0.2) bytes
Solutions:
A. Increase Memory Allocation
-- Edit postgresql.conf
-- /etc/postgresql/14/main/postgresql.conf
shared_buffers = 2GB          # Increase from default (128MB)
work_mem = 256MB              # Increase from default (4MB)
maintenance_work_mem = 1GB    # Increase from default (64MB)
# Restart PostgreSQL
sudo systemctl restart postgresql
B. Partition Large Tables
-- Create partitioned table
CREATE TABLE large_table_partitioned (
    id SERIAL,
    text_column TEXT,
    created_date DATE
) PARTITION BY RANGE (created_date);
-- Create partitions
CREATE TABLE large_table_2023 
PARTITION OF large_table_partitioned
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE large_table_2024
PARTITION OF large_table_partitioned
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Index each partition separately
SELECT biscuit_setup('large_table_2023', 'text_column', 'id');
-- Note: Only one partition can be indexed at a time
C. Reduce Data Complexity
-- Normalize/simplify data before indexing
UPDATE large_table 
SET text_column = LOWER(LEFT(text_column, 200));
-- Then build index
SELECT biscuit_build_index('large_table', 'text_column', 'id');
Issue 8: Index-Table Mismatch
Symptom:
SELECT COUNT(*) FROM products;
-- Returns: 100000
SELECT biscuit_get_active_count();
-- Returns: 95000  -- Mismatch!
Diagnosis:
-- Compare counts
WITH comparison AS (
    SELECT 
        COUNT(*) AS table_count,
        biscuit_get_active_count() AS index_count
    FROM products
)
SELECT 
    table_count,
    index_count,
    table_count - index_count AS discrepancy,
    ROUND(100.0 * index_count / table_count, 2) AS index_coverage_pct
FROM comparison;
-- Check trigger status
SELECT tgname, tgenabled FROM pg_trigger 
WHERE tgname = 'biscuit_auto_update'
AND tgrelid = 'products'::regclass;
-- Check for recent errors
SELECT biscuit_index_status();
Solutions:
A. Rebuild Index
-- Simple rebuild
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
-- Verify sync
SELECT 
    COUNT(*) AS table_count,
    biscuit_get_active_count() AS index_count
FROM products;
B. Ensure Exclusive Access During Rebuild
-- Lock table during rebuild to prevent concurrent modifications
BEGIN;
    LOCK TABLE products IN EXCLUSIVE MODE;
    SELECT biscuit_build_index('products', 'name', 'id');
COMMIT;
SELECT biscuit_enable_triggers();
C. Investigate Trigger Issues
-- If mismatch persists, check for trigger errors
-- Review PostgreSQL logs
-- Test manual insert
INSERT INTO products (id, name) VALUES (99999, 'test_sync');
-- Verify immediate sync
SELECT biscuit_match_count('%test_sync%');  -- Should return 1
-- If not synced, rebuild and monitor
SELECT biscuit_setup('products', 'name', 'id');
Diagnostic Queries
Comprehensive Health Check
DO $ 
DECLARE
    table_count BIGINT;
    index_count INTEGER;
    tombstone_count INTEGER;
    free_slots INTEGER;
    memory_mb NUMERIC;
    health_status TEXT;
BEGIN
    -- Get metrics
    EXECUTE 'SELECT COUNT(*) FROM products' INTO table_count;
    index_count := biscuit_get_active_count();
    tombstone_count := biscuit_get_tombstone_count();
    free_slots := biscuit_get_free_slots();
    memory_mb := (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC;
    
    -- Display report
    RAISE NOTICE '========================================';
    RAISE NOTICE 'BISCUIT Health Check';
    RAISE NOTICE '========================================';
    RAISE NOTICE 'Table rows: %', table_count;
    RAISE NOTICE 'Index active: %', index_count;
    RAISE NOTICE 'Discrepancy: % rows', table_count - index_count;
    RAISE NOTICE 'Tombstones: %', tombstone_count;
    RAISE NOTICE 'Free slots: %', free_slots;
    RAISE NOTICE 'Memory: % MB', memory_mb;
    RAISE NOTICE '';
    
    -- Health assessment
    IF table_count != index_count THEN
        RAISE WARNING 'Index out of sync! Rebuild recommended.';
    END IF;
    
    IF tombstone_count > 10000 THEN
        RAISE WARNING 'CRITICAL: % tombstones pending cleanup', tombstone_count;
    ELSIF tombstone_count > 5000 THEN
        RAISE WARNING 'WARNING: % tombstones pending cleanup', tombstone_count;
    END IF;
    
    IF memory_mb > (index_count::NUMERIC / 10000) THEN
        RAISE WARNING 'High memory usage detected';
    END IF;
    
    -- Overall status
    IF table_count = index_count AND tombstone_count < 5000 THEN
        RAISE NOTICE 'Status: HEALTHY';
    ELSIF table_count = index_count AND tombstone_count < 10000 THEN
        RAISE NOTICE 'Status: GOOD (cleanup recommended)';
    ELSE
        RAISE NOTICE 'Status: NEEDS ATTENTION';
    END IF;
END $;
Performance Baseline Test
-- Establish performance baselines
\timing on
-- Test 1: All records
SELECT biscuit_match_count('%');
-- Test 2: Common pattern
SELECT biscuit_match_count('%prod%');
-- Test 3: Prefix pattern
SELECT biscuit_match_count('A%');
-- Test 4: Suffix pattern
SELECT biscuit_match_count('%@gmail.com');
-- Test 5: Complex pattern
SELECT biscuit_match_count('%a%b%');
\timing off
-- Save baselines for comparison
CREATE TABLE performance_baseline (
    test_name TEXT,
    pattern TEXT,
    execution_time_ms NUMERIC,
    recorded_at TIMESTAMPTZ DEFAULT NOW()
);
Performance Expectations
E-Commerce Product Search
-- Setup
SELECT biscuit_setup('products', 'name', 'id');
-- Search
SELECT * FROM biscuit_match('%laptop%');
SELECT * FROM biscuit_match('%macbook%') WHERE price < 2000;
-- Performance comparison
\timing on
-- Traditional LIKE: ~1850ms
SELECT COUNT(*) FROM products WHERE name LIKE '%laptop%';
-- BISCUIT: ~3ms (bitmap ops) + result retrieval time
SELECT biscuit_match_count('%laptop%');
\timing off
Customer Email Analysis
-- Setup
SELECT biscuit_setup('customers', 'email', 'customer_id');
-- Find all Gmail users
SELECT * FROM biscuit_match('%@gmail.com');
-- Domain distribution
WITH email_domains AS (
    SELECT 
        customer_id,
        email,
        SUBSTRING(email FROM '@(.*)') AS domain
    FROM biscuit_match('%')
)
SELECT 
    domain,
    COUNT(*) AS customer_count
FROM email_domains
GROUP BY domain
ORDER BY customer_count DESC;
Log Analysis
-- Setup
SELECT biscuit_setup('application_logs', 'message', 'log_id');
-- Find errors
SELECT * FROM biscuit_match('%error%')
WHERE level = 'ERROR'
AND timestamp > NOW() - INTERVAL '1 hour'
ORDER BY timestamp DESC
LIMIT 100;
API Reference
Index Management
biscuit_setup(table, column, pk)- Complete one-step setupbiscuit_build_index(table, column, pk)- Build index structurebiscuit_enable_triggers()- Activate automatic maintenancebiscuit_disable_triggers()- Deactivate triggers
Query Functions
biscuit_match(pattern)- Returns full records (type-safe)biscuit_match_rows(pattern)- Returns (pk, value) pairsbiscuit_match_count(pattern)- Returns count onlybiscuit_match_keys(pattern)- Returns (pk_text, value_text)
Status Functions
biscuit_index_status()- Comprehensive status reportbiscuit_get_active_count()- Active record countbiscuit_get_free_slots()- Available slotsbiscuit_get_tombstone_count()- Pending tombstonesbiscuit_version()- Extension version
Maintenance
biscuit_cleanup()- Manual tombstone cleanup
Performance Expectations
What BISCUIT Is Good At
- Leading Wildcard Queries: 
'%pattern%'(traditionally requires full scan) - Multiple Pattern Searches: Better than repeated LIKE queries
 - Read-Heavy Workloads: Optimized for many queries, fewer updates
 - Moderate Data Sizes: Works well with millions of records
 
What BISCUIT Is Not
- Not a Full-Text Search: Use PostgreSQL’s 
tsvectorfor that - Not a Regex Engine: Only supports 
%and_wildcards - Not Write-Optimized: Updates require index maintenance
 - Not Memory-Free: Requires ~10-30% of data size in RAM
 
When to Use BISCUIT
✅ Good fit:
- Wildcard search on structured text fields (emails, usernames, SKUs)
 - Read-heavy applications
 - Need for fast 
LIKE '%pattern%'queries - Moderate result sets (< 10K matches per query)
 
❌ Poor fit:
- Full-text search with language processing
 - Extremely write-heavy workloads (> 10K updates/sec)
 - Very large result sets (> 50% of table)
 - Need for regex or complex text matching
 
License
PostgreSQL License (similar to MIT/BSD)
Copyright (c) 2024 BISCUIT Contributors
Version History
v1.0.3 (Current)
- Improved documentation accuracy
 - Bug fixes in slot resurrection
 - Enhanced monitoring
 
v1.0.0
- Initial release
 - Roaring Bitmap integration
 - Lazy deletion with tombstones
 - Incremental updates
 - O(1) hash table PK lookup
 
Contributors
Developed and maintained by Sivaprasad Murali
Issues: https://github.com/crystallinecore/biscuit/issues
Honest Summary
BISCUIT makes wildcard queries faster by trading memory and build time for query performance.
It’s not magic—it’s a bitmap index that:
- Pre-computes character positions
 - Uses fast bitmap operations instead of string scanning
 - Still requires O(n) time to return n results
 - Requires memory proportional to your data size
 
Use it when: You have moderate-sized datasets with lots of wildcard queries and don’t mind the memory overhead.
Don’t use it when: You need full-text search, have extreme write loads, or are memory-constrained.
When pg_trgm feels half-baked, grab a BISCUIT 🍪