# CHANGELOG

## 1.1.1
- fix: remove Citus autoconf build artifacts — the root `Makefile` was the Citus 11.1devel top-level Makefile and required `./configure` (a Citus-specific autoconf script) to be run before any build could proceed. This caused `configure: error: C compiler cannot create executables` and other Citus-specific probe failures for users with non-standard toolchains (ccache without a backing compiler, aarch64/ARM Linux, NixOS, etc.). The root `Makefile` is now a simple delegator to `src/backend/engine`. A portable, pre-generated `Makefile.global` is now tracked in the repository and uses `pg_config` from `PATH` — no `./configure` step is needed. The six Citus autoconf artifacts (`configure`, `configure.in`, `autogen.sh`, `aclocal.m4`, `Makefile.global.in`, `src/include/citus_config.h.in`) are removed from the repository. Build is now simply:

  ```bash
  sudo make -j$(nproc) install
  ```

  or, with an explicit pg_config:

  ```bash
  PG_CONFIG=/usr/lib/postgresql/17/bin/pg_config sudo make install
  ```
## 1.1.0

- feat: `RowcompressScan` custom scan node with batch-level min/max pruning — `rowcompress` tables now support a `pruning_column` parameter (`engine.alter_rowcompress_table_set(tbl, pruning_column := 'col')`). When set, `RowcompressScan` records the serialised min/max value of the pruning column per batch during `engine.rowcompress_repack()` or bulk inserts, storing them in `engine.row_batch.batch_min_value`/`batch_max_value`. At scan time, batches whose range does not intersect the query predicate are skipped entirely — no decompression, no I/O. The new GUC `storage_engine.enable_custom_scan` (default `on`) controls whether `RowcompressScan` is injected by the planner hook.
- feat: `engine.rowcompress_repack(tbl)` — utility function that rewrites all batches of a `rowcompress` table in sorted order by the `pruning_column`, maximising pruning efficiency for range queries (e.g. date, timestamp, bigint sequences).
- schema: `engine.row_options.pruning_attnum` — new nullable `int2` column; stores the 1-based attribute number of the pruning column.
- schema: `engine.row_batch.batch_min_value`/`batch_max_value` — new nullable `bytea` columns; store serialised, type-agnostic min/max statistics per batch.
- upgrade: `ALTER EXTENSION storage_engine UPDATE TO '1.1'` applies the schema changes via `storage_engine--1.0--1.1.sql`.
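Putting the 1.1.0 pieces together, a pruning setup might look like this (a minimal sketch: the table and column names are illustrative, and `USING rowcompress` assumes the access method is registered under that name):

```sql
-- Illustrative rowcompress table pruned on an event timestamp.
CREATE TABLE events (event_time timestamptz, payload jsonb) USING rowcompress;

-- Designate the pruning column; batches record min/max for it from now on.
SELECT engine.alter_rowcompress_table_set('events', pruning_column := 'event_time');

-- Rewrite existing batches sorted by event_time to maximise pruning.
SELECT engine.rowcompress_repack('events');

-- Batches whose [min, max] range misses the predicate are skipped entirely.
SELECT count(*) FROM events
WHERE event_time >= now() - interval '1 day';
```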
## 1.0.10

- fix: pg_search (ParadeDB) BM25 transparent compatibility — `IsNotIndexPath` in `engine_customscan.c` now preserves `CustomPath` nodes whose `CustomName` equals `"ParadeDB Base Scan"`. Previously, `RemovePathsByPredicate(rel, IsNotIndexPath)` discarded pg_search's planner path, causing the `@@@` operator to fall through as a `Filter` inside `ColcompressScan`, which then failed with "Unsupported query shape". BM25 full-text search on colcompress tables now works transparently — no need for `SET storage_engine.enable_custom_scan = false`. `pdb.score()`, `pdb.snippet()`, `===`, and multi-field `AND @@@` all work correctly. `ColcompressScan` continues to handle all other query shapes (projection pushdown, stripe pruning, parallel scan) without change.
## 1.0.9

- docs: pg_search 0.23 (ParadeDB) compatibility — colcompress tables are fully compatible with pg_search BM25 full-text search. The BM25 index (`CREATE INDEX USING bm25`) works transparently via `index_fetch_tuple`; `@@@`, `===`, `pdb.score()`, and `pdb.snippet()` all function correctly. To avoid `ColcompressScan` intercepting the planner before pg_search's `ParadeDB Base Scan` path is selected, use `SET storage_engine.enable_custom_scan = false` for queries that use `@@@`. A future release will auto-detect the `@@@` operator in `ColumnarSetRelPathlistHook` and skip the hook transparently.
- docs: native regex alternative to BM25 for analytics — `~*` (POSIX case-insensitive regex) on colcompress tables uses `ColcompressScan` with full parallelism and stripe-level projection pushdown, achieving the same recall as BM25 at 3× lower latency (60 ms vs ~200 ms for 150k rows, 8 parallel workers). Prefer `~*` over `@@@` for counter/aggregation patterns; reserve BM25 for ranked retrieval and fuzzy matching.
- bench: updated serial and parallel benchmark results; added baseline CSV for regression tracking.
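As a sketch of the workaround described above (table, column, and query terms are illustrative; the BM25 index creation syntax follows pg_search's documentation and may differ by version):

```sql
-- Illustrative: BM25 search on a colcompress table under 1.0.9.
CREATE INDEX docs_bm25 ON docs USING bm25 (id, body) WITH (key_field = 'id');

-- Let pg_search's ParadeDB Base Scan path win the planner.
SET storage_engine.enable_custom_scan = false;
SELECT id FROM docs WHERE body @@@ 'postgres';
RESET storage_engine.enable_custom_scan;

-- For count/aggregation patterns, prefer the regex path, which keeps
-- ColcompressScan's parallelism and projection pushdown.
SELECT count(*) FROM docs WHERE body ~* 'postgres';
```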
## 1.0.8

- fix: `UPDATE` duplicate-key error on colcompress tables with unique indexes — `engine_index_fetch_tuple` now consults the in-memory `RowMaskWriteStateMap` bitmask before falling back to `ColumnarReadRowByRowNumber` for flushed stripes. Previously, `engine_tuple_update()` marked the old row deleted (via `UpdateRowMask`) and immediately inserted the new version; the unique-constraint recheck via `index_fetch_tuple` read a stale pre-deletion snapshot from the B-tree entry's old TID and returned "tuple still alive", causing a spurious duplicate-key error on every `UPDATE`.
- fix: deleted rows visible within the same command — `engine_tuple_satisfies_snapshot` now also consults `RowMaskWriteStateMap`, so rows deleted within the current transaction are correctly reported as invisible during the same command, preventing false positives in constraint checks.
- fix: OOM crash in `engine_tuple_update` with large VARLENA columns — `ColumnarWriteRowInternal` adds a memory-based flush guard: if the `stripeWriteContext` exceeds 256 MB (`SE_MAX_STRIPE_MEM_BYTES`), the current stripe is flushed before buffering the next row. This prevents OOM crashes when stripe row-count limits are generous but rows carry large VARLENA columns (XML, JSON, PDF).
## 1.0.7

- fix: GIN `BitmapHeapScan` bypasses `ColcompressScan` with `random_page_cost=1.1` — on NVMe-tuned servers (`random_page_cost=1.1`), the planner preferred a GIN `Bitmap Heap Scan` over `Custom Scan (ColcompressScan)` for analytical queries with JSONB `@>` or array `@>` predicates when `index_scan=false`. This caused a +195–237% regression in serial mode vs baseline (Q6 JSONB: 163 ms → 479 ms; Q8 array: 123 ms → 414 ms). Fixed by adding a `disable_cost` (1e10) penalty to every `BitmapHeapPath` in `CostColumnarPaths` when `index_scan=false`, symmetric with the existing penalty for `IndexPath`. Tables with `index_scan=true` are unaffected. Fix confirmed: serial Q6 175 ms (-63%), Q8 141 ms (-66%).
- fix: `index_scan=false` gate missing in `engine_reader.c` chunk loader — the single-chunk targeted loading optimisation (`ColumnarReadRowByRowNumber`) was activating unconditionally, including on analytics tables where `index_scan=false`. Added an `indexScanEnabled` field to `ColumnarReadState`, populated from `ReadColumnarOptions` in `ColumnarBeginRead`, and gated the single-chunk optimisation on `readState->indexScanEnabled`.
- fix: `BitmapHeapPath` penalty also applied to `partial_pathlist` — parallel bitmap heap paths were not being penalised, allowing GIN scans via parallel workers to bypass `ColcompressScan` even with `index_scan=false`.
- fix: infinite loop in index scan point lookup — `ColumnarReadRowByRowNumber` could loop forever when the requested row number fell beyond the last stripe, producing a hang with no error output.
- fix: index scan cost at chunk granularity — `ColumnarIndexScanAdditionalCost` now computes `perChunkCost` instead of `perStripeCost`, eliminating the ~15× cost inflation that caused the planner to always reject `IndexScan` over `ColcompressScan` for selective point lookups on wide columnar tables.
- fix: use projected column count in `ColumnarIndexScanAdditionalCost` — replaced `RelationIdGetNumberOfAttributes` with `list_length(rel->reltarget->exprs)`, so wide tables with large blob columns (XML/JSON) no longer inflate index scan cost beyond the full-scan cost, restoring planner choice for `index_scan=true` tables.
- fix: remove stray `randomAccessPenalty` from `ColumnarIndexScanAdditionalCost` — the per-row penalty (`estimatedRows * cpu_tuple_cost * 100`) was dead code when `index_scan=false` (the path is already blocked by `disable_cost`) but was still evaluated when `index_scan=true`, causing the planner to always choose `SeqScan` over `IndexScan` regardless of selectivity. Removed entirely.
## 1.0.6

- fix: `index_scan=false` bypassed by `Parallel Index Scan` — `CostColumnarPaths` only iterated `rel->pathlist`, leaving `rel->partial_pathlist` (parallel paths) untouched. When a B-tree index existed on a colcompress table, the planner chose `Parallel Index Scan` even with `index_scan=false`, bypassing stripe pruning entirely. Fixed by iterating `rel->partial_pathlist` in `CostColumnarPaths` and applying `disable_cost` (1e10) to every `IndexPath` found there.
- fix: `disable_cost` for `index_scan=false` serial paths — replaced the proportional penalty (`estimatedRows * cpu_tuple_cost * 100.0`) with PostgreSQL's canonical `disable_cost` constant (1e10), matching the behaviour of `SET enable_indexscan = off`. The old penalty was smaller than the seq-scan cost for low-selectivity queries (~4% of rows), so the planner still preferred `IndexScan` over `ColcompressScan`.
- bench: updated serial and parallel benchmark results and charts (1M rows, PostgreSQL 18, 4 access methods).
## 1.0.5

- fix: EXPLAIN + citus SIGSEGV — `IsCreateTableAs(NULL)` called `strlen(NULL)` when citus passed `query_string=NULL` internally; added a NULL guard. Added an `IsExplainQuery` guard to skip `PlanTreeMutator` for EXPLAIN statements. Fixed the `T_CustomScan` else branch to recurse into `custom_plans` instead of `elog(ERROR)`.
- fix: stripe pruning bypassed by btree indexes — when a btree index existed on a colcompress table, the planner chose `IndexScan` with `randomAccess=true`, which disabled stripe pruning entirely. Fixed by strengthening `ColumnarIndexScanAdditionalCost` with a per-row random-access penalty (`estimatedRows * cpu_tuple_cost * 100.0`), steering the planner back to a seq scan.
- perf: `ColumnarIndexScanAdditionalCost` per-row penalty — discourages index scans on large colcompress tables where full-stripe pruning is more efficient.
- docs: benchmark kit — added `tests/bench/` with setup SQL, serial/parallel run scripts, chart generators, and result PNGs; added `BENCHMARKS.md` with full analysis.
- docs: README — citus load order note, btree/stripe-pruning Known Limitation, Benchmarks section, corrected install path.
## 1.0.4
- chore: bump version to 1.0.4 (PGXN meta).
- docs: benchmark results — heap vs colcompress vs rowcompress vs citus_columnar.
## 1.0.3

- perf: stripe-level min/max pruning for colcompress scans — before reading any stripe, the scan aggregates the per-column min/max statistics from `engine.chunk` across all chunks of the stripe and tests the resulting stripe-wide ranges against the query's WHERE predicates using `predicate_refuted_by`. Any stripe whose range is provably disjoint from the predicate is skipped entirely — no decompression, no I/O. The pruned count is shown in `EXPLAIN`:

  ```
  Engine Stripes Removed by Pruning: N
  ```

  Pruning applies to both the serial scan path and the parallel DSM path (parallel workers only receive stripe IDs that survive the filter). Effectiveness scales directly with data sortedness; combine with `engine.colcompress_merge()` and the `orderby` table option to maximise it.
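A sketch of what this looks like in practice (the table, predicate, and pruned-stripe count are illustrative; the `EXPLAIN` line is the one documented above):

```sql
-- Data sorted on ts (e.g. via the orderby option plus
-- engine.colcompress_merge()) lets predicate_refuted_by
-- discard whole stripes for a range predicate.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*)
FROM metrics
WHERE ts >= '2024-01-01' AND ts < '2024-02-01';
--  Custom Scan (ColcompressScan) on metrics
--    ...
--    Engine Stripes Removed by Pruning: 42
```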
## 1.0.2

- fix: index corruption during `COPY` into colcompress tables — `engine_multi_insert` was calling `ExecInsertIndexTuples()` internally, while COPY's `CopyMultiInsertBufferFlush` also calls it after `table_multi_insert` returns. The double insertion corrupted every B-tree index on tables loaded via `COPY`. Fixed by removing all executor infrastructure from the per-tuple loop; index insertion is the caller's responsibility, matching `heap_multi_insert` semantics.
- fix: index corruption when `orderby` and indexes coexist — when sort-on-write is active, `ColumnarWriteRow()` buffers rows and returns `COLUMNAR_FIRST_ROW_NUMBER` (= 1) as a placeholder for every row. The executor then indexed all rows with TID (0,1), making every index lookup return the first row. Fixed in `engine_init_write_state()`: sort-on-write is disabled when the target relation has `relhasindex = true`. Tables with indexes already have fast key access; sort ordering is redundant and was silently lethal.
- perf: fast `ANALYZE` via chunk-group stride sampling — samples at most `N / stride` chunk groups (stride = `max(1, nchunks / 300)`) instead of reading the entire table, making `ANALYZE` on large colcompress tables take milliseconds instead of minutes.

Migration note (1.0.1 → 1.0.2): any colcompress table that has indexes and was written with `COPY` or `colcompress_merge` using a prior version must be rebuilt: `REINDEX TABLE CONCURRENTLY <table>;`
## 1.0.1

- fix: `multi_insert` now sets `tts_tid` before opening indexes, and explicitly calls `ExecInsertIndexTuples()` — previously B-tree entries received garbage TIDs during `INSERT INTO ... SELECT`, causing index scans to return wrong rows. Tables populated before this fix require `REINDEX TABLE CONCURRENTLY`.
- fix: `orderby` syntax is now validated at `ALTER TABLE SET (orderby=...)` time instead of at merge time, giving an immediate error on bad input.
- fix: CustomScan node names renamed to avoid symbol collision with `columnar.so` when both extensions are loaded simultaneously.
- fix: corrected SQL function names for `se_alter_engine_table_set`/`se_alter_engine_table_reset` (C symbols were mismatched).
- fix: added `safeclib` symlink under `vendor/` so `memcpy_s` resolves correctly at link time.
- add: `META.json` for PGXN publication.
## 1.0.0
Initial release of storage_engine — a PostgreSQL table access method extension derived from Hydra Columnar and extended with two independent access methods:
- colcompress — column-oriented storage with vectorized execution, parallel DSM scan, chunk pruning, and a MergeTree-style per-table sort key (`orderby`).
- rowcompress — row-compressed batch storage with parallel work-stealing scan and full DELETE/UPDATE support via a row-level mask.
Additional features added beyond the upstream:
- per-table `index_scan` option (GUC `storage_engine.enable_index_scan`)
- full DELETE/UPDATE support for colcompress via row mask
- parallel columnar scan wired through DSM
- GUCs under the `storage_engine.*` namespace
- support for PostgreSQL 16, 17, and 18
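A minimal end-to-end sketch of the two access methods (a non-authoritative example: the table names are illustrative, and the option and function names are those listed in this changelog):

```sql
CREATE EXTENSION storage_engine;

-- colcompress: column-oriented storage with a MergeTree-style sort key.
CREATE TABLE analytics (ts timestamptz, metric text, value float8)
  USING colcompress;
ALTER TABLE analytics SET (orderby = 'ts');

-- rowcompress: row-compressed batches with full DELETE/UPDATE support.
CREATE TABLE journal (id bigint, entry text) USING rowcompress;
INSERT INTO journal VALUES (1, 'draft');
UPDATE journal SET entry = 'revised' WHERE id = 1;
DELETE FROM journal WHERE id = 1;
```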