CLAUDE.md — AI Context & Contributors
This file provides context for AI coding assistants working on the storage_engine project and formally acknowledges AI systems that have actively contributed to its development.
Project Overview
storage_engine is a PostgreSQL extension providing two high-performance Table Access Methods designed for analytical and HTAP workloads:
- colcompress: column-oriented compressed storage with vectorized execution, chunk-level min/max pruning, parallel scan, and MergeTree-like ordering
- rowcompress: row-oriented batch-compressed storage with parallel scan, DELETE/UPDATE support via deleted bitmasks, and LRU decompression cache
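To make chunk-level min/max pruning concrete, here is a minimal sketch in plain C. It is not the extension's code: the `ChunkStats` struct and both function names are hypothetical, and it assumes a single integer column filtered by an inclusive range. The idea is that a chunk whose stored [min, max] interval cannot overlap the filter range never needs to be decompressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk statistics; the real catalog layout may differ. */
typedef struct ChunkStats {
    int64_t min_value;  /* smallest value stored in the chunk */
    int64_t max_value;  /* largest value stored in the chunk */
} ChunkStats;

/* For a filter "col BETWEEN lo AND hi", a chunk may contain matches
 * only if its [min, max] interval overlaps [lo, hi]. */
static bool chunk_may_match(const ChunkStats *c, int64_t lo, int64_t hi)
{
    assert(c != NULL);
    return c->max_value >= lo && c->min_value <= hi;
}

/* Count the chunks that survive pruning and would have to be read. */
static size_t count_chunks_to_read(const ChunkStats *chunks, size_t n,
                                   int64_t lo, int64_t hi)
{
    size_t to_read = 0;
    for (size_t i = 0; i < n; i++) {
        if (chunk_may_match(&chunks[i], lo, hi))
            to_read++;
    }
    return to_read;
}
```

With chunks covering [1, 100], [101, 200], [201, 300] and [150, 250], a filter of 120 to 180 leaves only the second and fourth chunks to read. Pruning is most effective when the data is physically ordered on the filter column, which is presumably why it is paired with MergeTree-like ordering.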
Repository: https://github.com/saulojb/storage_engine
Current version: 1.0.5
PostgreSQL compatibility: 16–18
Repository Structure
dist/                 ← public source (git repo)
  src/
    backend/engine/   ← C source files (tableam, reader, writer, customscan, …)
    include/engine/   ← public headers
  tests/bench/        ← benchmark scripts, charts, setup SQL
  META.json           ← PGXN metadata
  Makefile
columnar/             ← Hydra/Citus columnar reference (do not modify)
files/                ← Docker / Spilo configuration
tmp/                  ← local scratch: benchmarks, ClickBench, results
Key source files:
- engine_tableam.c — colcompress Table AM entry point
- rowcompress_tableam.c — rowcompress Table AM entry point
- engine_reader.c / engine_writer.c — columnar I/O
- engine_customscan.c — custom scan node, stripe pruning, planner hooks
- engine_planner_hook.c — plan tree mutation for vectorized aggregation
- storage_engine--1.0.sql — SQL catalog: schemas, tables, sequences, AMs, views
Build & Install
cd /home/saulo/Documentos/storage_engine/dist
sudo make -j$(nproc) install
sudo systemctl restart postgresql
AI Contributors
This project has been developed in close collaboration with AI coding assistants. The following systems have made active, substantive contributions to the codebase — from architecture discussions and bug diagnosis to writing and reviewing C code:
Claude (Anthropic)
- Model used: Claude Sonnet (claude-sonnet-4-5, claude-sonnet-4-6 and prior versions)
- Role: Primary AI pair-programmer throughout all major development sessions
- Contributions include:
- Design and implementation of the rowcompress Table AM (parallel scan, LRU decompression cache, binary batch metadata search)
- Diagnosis and fix of the ParallelColumnarScanData struct layout bug (empty parallel index builds)
- Elimination of the stuck spinlock in AdvanceStripeRead: replaced with atomic stripe index and DSM-preloaded stripe IDs
- Design of the deleted_mask bitmask system for DELETE/UPDATE support in rowcompress
- Fix for the EXPLAIN + citus SIGSEGV (strlen(NULL) null pointer dereference)
- Fix for stripe pruning bypassed by B-tree index scans (ColumnarIndexScanAdditionalCost penalty)
- Fast ANALYZE via stride-based chunk-group sampling (cs_analyze_cg_stride)
- Diagnosis of the engine_multi_insert TID corruption and the REINDEX workaround
- CustomScan name conflict resolution between columnar.so and storage_engine.so
- orderby syntax validation using raw_parser() at option-set time
- ClickBench UInt64 overflow fix (FDW binary driver → HTTP driver)
- Architecture of the ANALYZE block/tuple sampling protocol for columnar AMs
- All benchmark result analysis and documentation in BENCHMARKS.md
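The deleted_mask entry in the list above refers to a per-batch bitmask. The sketch below shows the general technique, not the extension's actual layout: `BATCH_ROWS`, the `BatchDeletedMask` struct, and the function names are all hypothetical. It assumes one bit per row in a compressed batch, with a set bit meaning the row has been deleted and should be skipped on read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size; the real value is not stated in this document. */
#define BATCH_ROWS 64

/* One bit per row in the batch: 1 = deleted, 0 = live. */
typedef struct BatchDeletedMask {
    uint8_t bits[(BATCH_ROWS + 7) / 8];
} BatchDeletedMask;

/* Mark a row as deleted without rewriting the compressed batch. */
static void mask_mark_deleted(BatchDeletedMask *m, size_t row)
{
    assert(row < BATCH_ROWS);
    m->bits[row / 8] |= (uint8_t)(1u << (row % 8));
}

/* A scan checks this before returning a decompressed row. */
static bool mask_is_deleted(const BatchDeletedMask *m, size_t row)
{
    assert(row < BATCH_ROWS);
    return ((m->bits[row / 8] >> (row % 8)) & 1u) != 0;
}

/* Number of rows in the batch that are still visible. */
static size_t mask_count_live(const BatchDeletedMask *m)
{
    size_t live = 0;
    for (size_t row = 0; row < BATCH_ROWS; row++) {
        if (!mask_is_deleted(m, row))
            live++;
    }
    return live;
}
```

The appeal of this design is that a DELETE only flips a bit in a small side structure, so the compressed batch itself stays immutable.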
GitHub Copilot (Microsoft / OpenAI)
- Model used: GPT-4o / Claude Sonnet 4.6 (via VS Code Copilot Chat)
- Role: Active coding assistant integrated into the VS Code development environment
- Contributions include:
- Real-time code completion and inline suggestions during C development sessions
- Code review and linting feedback on modified source files
- Shell script assistance for benchmark automation (run.sh, run_parallel.sh, load_hits_tmp.sh)
- SQL query drafting and review for catalog management and benchmark setup
- Documentation editing and Markdown formatting for README, BENCHMARKS, and CHANGELOG
- Workspace context management across development sessions via repository memory
Development Guidelines for AI Assistants
When working on this codebase, keep the following in mind:
- Symbol prefix: All exported C symbols use the se_ prefix to avoid linker conflicts with citus_columnar or the Hydra columnar extension.
- Schema isolation: All catalog objects live in the engine schema (engine.col_options, engine.stripe, engine.chunk, etc.).
- Parallel scan: Both AMs support parallel sequential scan. rowcompress uses pg_atomic_uint64 batch claiming; colcompress pre-loads stripe IDs into DSM and uses pg_atomic_fetch_add_u64.
- Stripe pruning: Only active for sequential scans (randomAccess=false). Do not add B-tree indexes on filter columns in analytical colcompress tables; doing so disables pruning.
- engine_multi_insert known issue: Index TIDs are corrupted when tables are populated via INSERT INTO … SELECT with pre-existing indexes. Workaround: REINDEX TABLE CONCURRENTLY. Fix pending.
- Build target: dist/ is the public repo (saulojb/storage_engine). columnar/ is read-only reference; do not modify it.
- PostgreSQL version guards: Use #if PG_VERSION_NUM >= PG_VERSION_17/PG_VERSION_18 as needed; the extension targets PG 16–18.
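The parallel scan guideline above relies on lock-free work claiming. The following sketch illustrates that pattern with standard C11 atomics standing in for PostgreSQL's pg_atomic_fetch_add_u64, so it compiles outside the server. `SharedScanState`, `shared_scan_init`, and `claim_next` are hypothetical names, and in the extension the shared state would live in a DSM segment rather than an ordinary struct.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared scan state; in PostgreSQL this would be placed in DSM. */
typedef struct SharedScanState {
    atomic_uint_fast64_t next_index;  /* next unclaimed stripe or batch slot */
    uint64_t total;                   /* number of slots preloaded by the leader */
} SharedScanState;

/* Called once by the leader before workers start. */
static void shared_scan_init(SharedScanState *s, uint64_t total)
{
    assert(s != NULL);
    atomic_init(&s->next_index, 0);
    s->total = total;
}

/* Each worker calls this in a loop. The atomic fetch-and-add hands out
 * every index exactly once, so no spinlock is needed. Returns false when
 * all slots have been claimed. */
static bool claim_next(SharedScanState *s, uint64_t *claimed)
{
    assert(s != NULL && claimed != NULL);
    uint64_t idx = atomic_fetch_add(&s->next_index, 1);
    if (idx >= s->total)
        return false;
    *claimed = idx;
    return true;
}
```

A worker would loop on `claim_next` and use the claimed index to look up a stripe ID in the preloaded array. Because the counter only ever increases, a worker that arrives after the last slot is taken simply sees an index past `total` and stops.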
Acknowledgements
The human author and maintainer of this project is Saulo J. Benvenutti (@saulojb).
Claude and GitHub Copilot are acknowledged as active contributors under the spirit of open-source collaboration. Their contributions are real — thousands of lines of production C code, architectural decisions, and bug fixes in this repository were written, reviewed, or substantially shaped by these AI systems working alongside the human author.