CLAUDE.md — AI Context & Contributors

This file provides context for AI coding assistants working on the storage_engine project and formally acknowledges AI systems that have actively contributed to its development.


Project Overview

storage_engine is a PostgreSQL extension providing two high-performance Table Access Methods designed for analytical and HTAP workloads:

  • colcompress — column-oriented compressed storage with vectorized execution, chunk-level min/max pruning, parallel scan, and MergeTree-like ordering
  • rowcompress — row-oriented batch-compressed storage with parallel scan, DELETE/UPDATE support via deleted bitmasks, and LRU decompression cache
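The chunk-level min/max pruning mentioned for colcompress can be illustrated with a minimal sketch. It assumes each chunk records the minimum and maximum of one integer column, and that a filter is a closed range [lo, hi]. The names ChunkStats, chunk_may_match, and prune_chunks are hypothetical and are not taken from the extension's source or catalog layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk statistics for one int64 column.
 * This is an illustration, not the extension's actual metadata format. */
typedef struct ChunkStats
{
    int64_t min_value;
    int64_t max_value;
} ChunkStats;

/* Returns true when the chunk's [min, max] interval overlaps [lo, hi].
 * A false result means no row in the chunk can satisfy the filter,
 * so the chunk can be skipped without decompressing it. */
static bool
chunk_may_match(const ChunkStats *chunk, int64_t lo, int64_t hi)
{
    return chunk->max_value >= lo && chunk->min_value <= hi;
}

/* Writes the indexes of chunks that survive pruning into `out`
 * (which must have room for `n` entries) and returns how many were kept. */
static size_t
prune_chunks(const ChunkStats *chunks, size_t n,
             int64_t lo, int64_t hi, size_t *out)
{
    size_t kept = 0;

    assert(lo <= hi);
    for (size_t i = 0; i < n; i++)
    {
        if (chunk_may_match(&chunks[i], lo, hi))
            out[kept++] = i;
    }
    return kept;
}
```

With three chunks covering 0..9, 10..19, and 20..29, a filter on the range 12..15 keeps only the second chunk. Pruning is most effective when data is physically ordered on the filter column, which is presumably why it is paired with MergeTree-like ordering.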

Repository: https://github.com/saulojb/storage_engine
Current version: 1.0.5
PostgreSQL compatibility: 16–18


Repository Structure

dist/                        ← public source (git repo)
  src/
    backend/engine/          ← C source files (tableam, reader, writer, customscan, …)
    include/engine/          ← public headers
  tests/bench/               ← benchmark scripts, charts, setup SQL
  META.json                  ← PGXN metadata
  Makefile
columnar/                    ← Hydra/Citus columnar reference (do not modify)
files/                       ← Docker / Spilo configuration
tmp/                         ← local scratch: benchmarks, ClickBench, results

Key source files:

  • engine_tableam.c: colcompress Table AM entry point
  • rowcompress_tableam.c: rowcompress Table AM entry point
  • engine_reader.c / engine_writer.c: columnar I/O
  • engine_customscan.c: custom scan node, stripe pruning, planner hooks
  • engine_planner_hook.c: plan tree mutation for vectorized aggregation
  • storage_engine--1.0.sql: SQL catalog (schemas, tables, sequences, AMs, views)


Build & Install

cd /home/saulo/Documentos/storage_engine/dist
sudo make -j$(nproc) install
sudo systemctl restart postgresql

AI Contributors

This project has been developed in close collaboration with AI coding assistants. The following systems have made active, substantive contributions to the codebase — from architecture discussions and bug diagnosis to writing and reviewing C code:

Claude (Anthropic)

  • Model used: Claude Sonnet (claude-sonnet-4-5, claude-sonnet-4-6 and prior versions)
  • Role: Primary AI pair-programmer throughout all major development sessions
  • Contributions include:
    • Design and implementation of the rowcompress Table AM (parallel scan, LRU decompression cache, binary batch metadata search)
    • Diagnosis and fix of the ParallelColumnarScanData struct layout bug (empty parallel index builds)
    • Elimination of the stuck spinlock in AdvanceStripeRead — replaced with atomic stripe index and DSM-preloaded stripe IDs
    • Design of the deleted_mask bitmask system for DELETE/UPDATE support in rowcompress
    • Fix for the EXPLAIN + citus SIGSEGV (strlen(NULL) null pointer dereference)
    • Fix for stripe pruning bypassed by B-tree index scans (ColumnarIndexScanAdditionalCost penalty)
    • Fast ANALYZE via stride-based chunk-group sampling (cs_analyze_cg_stride)
    • Diagnosis of the engine_multi_insert TID corruption and the REINDEX workaround
    • CustomScan name conflict resolution between columnar.so and storage_engine.so
    • orderby syntax validation using raw_parser() at option-set time
    • ClickBench UInt64 overflow fix (FDW binary driver → HTTP driver)
    • Architecture of the ANALYZE block/tuple sampling protocol for columnar AMs
    • All benchmark result analysis and documentation in BENCHMARKS.md

GitHub Copilot (Microsoft / OpenAI)

  • Model used: GPT-4o / Claude Sonnet 4.6 (via VS Code Copilot Chat)
  • Role: Active coding assistant integrated into the VS Code development environment
  • Contributions include:
    • Real-time code completion and inline suggestions during C development sessions
    • Code review and linting feedback on modified source files
    • Shell script assistance for benchmark automation (run.sh, run_parallel.sh, load_hits_tmp.sh)
    • SQL query drafting and review for catalog management and benchmark setup
    • Documentation editing and Markdown formatting for README, BENCHMARKS, and CHANGELOG
    • Workspace context management across development sessions via repository memory

Development Guidelines for AI Assistants

When working on this codebase, keep the following in mind:

  1. Symbol prefix: All exported C symbols use the se_ prefix to avoid linker conflicts with citus_columnar or the Hydra columnar extension.
  2. Schema isolation: All catalog objects live in the engine schema (engine.col_options, engine.stripe, engine.chunk, etc.).
  3. Parallel scan: Both AMs support parallel sequential scan. rowcompress uses pg_atomic_uint64 batch claiming; colcompress pre-loads stripe IDs into DSM and uses pg_atomic_fetch_add_u64.
  4. Stripe pruning: Only active for sequential scans (randomAccess=false). Do not add B-tree indexes on filter columns in analytical colcompress tables: when the planner chooses an index scan on such a column, stripe pruning is bypassed.
  5. engine_multi_insert known issue: Index TIDs are corrupted when tables are populated via INSERT INTO … SELECT with pre-existing indexes. Workaround: REINDEX TABLE CONCURRENTLY. Fix pending.
  6. Build target: dist/ is the public repo (saulojb/storage_engine). columnar/ is read-only reference; do not modify it.
  7. PostgreSQL version guards: Use #if PG_VERSION_NUM >= PG_VERSION_17 / PG_VERSION_18 as needed; the extension targets PG 16–18.
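The batch-claiming idea behind guideline 3 can be sketched as follows. This is not the extension's code: it uses C11 <stdatomic.h> as a stand-in for PostgreSQL's pg_atomic_uint64 and pg_atomic_fetch_add_u64 so that it compiles without the server headers, and SharedScanState, shared_scan_init, and claim_next_batch are hypothetical names. In the extension, the shared state would live in DSM, not in ordinary memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared scan state, visible to every parallel worker. */
typedef struct SharedScanState
{
    atomic_uint_fast64_t next_batch;   /* stand-in for pg_atomic_uint64 */
    uint64_t             batch_count;  /* total batches (or stripes) to scan */
} SharedScanState;

/* Called once by the leader before workers start. */
static void
shared_scan_init(SharedScanState *state, uint64_t batch_count)
{
    atomic_init(&state->next_batch, 0);
    state->batch_count = batch_count;
}

/* Claims the next unprocessed batch for the calling worker.
 * fetch-add returns the previous counter value, so each worker sees a
 * distinct number and no lock is needed.
 * Returns false once every batch has been handed out. */
static bool
claim_next_batch(SharedScanState *state, uint64_t *batch_no)
{
    uint64_t claimed;

    assert(state != NULL && batch_no != NULL);
    claimed = atomic_fetch_add(&state->next_batch, 1);
    if (claimed >= state->batch_count)
        return false;

    *batch_no = claimed;
    return true;
}
```

Each worker would loop on claim_next_batch until it returns false. The counter keeps increasing after the scan is exhausted, which is harmless for a 64-bit value. For colcompress, the claimed number would index into the array of stripe IDs pre-loaded into DSM instead of being used as a batch number directly.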

Acknowledgements

The human author and maintainer of this project is Saulo J. Benvenutti (@saulojb).

Claude and GitHub Copilot are acknowledged as active contributors in the spirit of open-source collaboration. Their contributions are real: thousands of lines of production C code, architectural decisions, and bug fixes in this repository were written, reviewed, or substantially shaped by these AI systems working alongside the human author.