CLAUDE.md — AI Context & Contributors

This file provides context for AI coding assistants working on the storage_engine project and formally acknowledges AI systems that have actively contributed to its development.

Project Overview

storage_engine is a PostgreSQL extension providing two high-performance Table Access Methods designed for analytical and HTAP workloads:

colcompress — column-oriented compressed storage with vectorized execution, chunk-level min/max pruning, parallel scan, and MergeTree-like ordering
rowcompress — row-oriented batch-compressed storage with parallel scan, DELETE/UPDATE support via deleted bitmasks, and LRU decompression cache
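
As a rough illustration of the chunk-level min/max pruning listed for colcompress, the sketch below skips any chunk whose stored value range cannot overlap a query's filter range. It is a standalone C example under stated assumptions, not the extension's code: ChunkStats, chunk_may_match, and count_chunks_to_read are hypothetical names, and a single int64 column with an inclusive range filter is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk statistics: smallest and largest value stored. */
typedef struct ChunkStats
{
    int64_t min_value;
    int64_t max_value;
} ChunkStats;

/* True when the chunk's [min, max] overlaps the filter range [lo, hi]. */
static bool
chunk_may_match(const ChunkStats *chunk, int64_t lo, int64_t hi)
{
    assert(chunk->min_value <= chunk->max_value);
    return chunk->max_value >= lo && chunk->min_value <= hi;
}

/* Count the chunks that would still have to be decompressed and scanned. */
static size_t
count_chunks_to_read(const ChunkStats *chunks, size_t nchunks,
                     int64_t lo, int64_t hi)
{
    size_t to_read = 0;

    for (size_t i = 0; i < nchunks; i++)
    {
        if (chunk_may_match(&chunks[i], lo, hi))
            to_read++;
    }
    return to_read;
}
```

With chunks covering [1, 100], [101, 200], and [201, 300], a filter of 150 to 160 leaves only one chunk to read. A check of this kind tends to be most effective when data is sorted on the filter column, because each chunk then covers a narrow range.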

Repository: https://github.com/saulojb/storage_engine
Current version: 1.0.5
PostgreSQL compatibility: 15–19

Repository Structure

dist/                        ← public source (git repo)
  src/
    backend/engine/          ← C source files (tableam, reader, writer, customscan, …)
    include/engine/          ← public headers
  tests/bench/               ← benchmark scripts, charts, setup SQL
  META.json                  ← PGXN metadata
  Makefile
columnar/                    ← Hydra/Citus columnar reference (do not modify)
files/                       ← Docker / Spilo configuration
tmp/                         ← local scratch: benchmarks, ClickBench, results

Key source files:
- engine_tableam.c: colcompress Table AM entry point
- rowcompress_tableam.c: rowcompress Table AM entry point
- engine_reader.c / engine_writer.c: columnar I/O
- engine_customscan.c: custom scan node, stripe pruning, planner hooks
- engine_planner_hook.c: plan tree mutation for vectorized aggregation
- storage_engine--1.0.sql: SQL catalog (schemas, tables, sequences, AMs, views)

Build & Install

cd /home/saulo/Documentos/storage_engine/dist
sudo make -j$(nproc) install
sudo systemctl restart postgresql

AI Contributors

This project has been developed in close collaboration with AI coding assistants. The following systems have made active, substantive contributions to the codebase — from architecture discussions and bug diagnosis to writing and reviewing C code:

Claude (Anthropic)

Model used: Claude Sonnet (claude-sonnet-4-5, claude-sonnet-4-6 and prior versions)
Role: Primary AI pair-programmer throughout all major development sessions
Contributions include:
- Design and implementation of the rowcompress Table AM (parallel scan, LRU decompression cache, binary batch metadata search)
- Diagnosis and fix of the ParallelColumnarScanData struct layout bug (empty parallel index builds)
- Elimination of the stuck spinlock in AdvanceStripeRead — replaced with atomic stripe index and DSM-preloaded stripe IDs
- Design of the deleted_mask bitmask system for DELETE/UPDATE support in rowcompress
- Fix for the EXPLAIN + citus SIGSEGV (strlen(NULL) null pointer dereference)
- Fix for stripe pruning bypassed by B-tree index scans (ColumnarIndexScanAdditionalCost penalty)
- Fast ANALYZE via stride-based chunk-group sampling (cs_analyze_cg_stride)
- Diagnosis of the engine_multi_insert TID corruption and the REINDEX workaround
- CustomScan name conflict resolution between columnar.so and storage_engine.so
- orderby syntax validation using raw_parser() at option-set time
- ClickBench UInt64 overflow fix (FDW binary driver → HTTP driver)
- Architecture of the ANALYZE block/tuple sampling protocol for columnar AMs
- All benchmark result analysis and documentation in BENCHMARKS.md

GitHub Copilot (Microsoft / OpenAI)

Model used: GPT-4o / Claude Sonnet 4.6 (via VS Code Copilot Chat)
Role: Active coding assistant integrated into the VS Code development environment
Contributions include:
- Real-time code completion and inline suggestions during C development sessions
- Code review and linting feedback on modified source files
- Shell script assistance for benchmark automation (run.sh, run_parallel.sh, load_hits_tmp.sh)
- SQL query drafting and review for catalog management and benchmark setup
- Documentation editing and Markdown formatting for README, BENCHMARKS, and CHANGELOG
- Workspace context management across development sessions via repository memory

Development Guidelines for AI Assistants

When working on this codebase, keep the following in mind:

Symbol prefix: All exported C symbols use the se_ prefix to avoid linker conflicts with citus_columnar or the Hydra columnar extension.
Schema isolation: All catalog objects live in the engine schema (engine.col_options, engine.stripe, engine.chunk, etc.).
Parallel scan: Both AMs support parallel sequential scan. rowcompress uses pg_atomic_uint64 batch claiming; colcompress pre-loads stripe IDs into DSM and uses pg_atomic_fetch_add_u64.
Stripe pruning: Only active for sequential scans (randomAccess=false). Do not add B-tree indexes on filter columns in analytical colcompress tables — this disables pruning.
engine_multi_insert known issue: Index TIDs are corrupted when tables are populated via INSERT INTO … SELECT with pre-existing indexes. Workaround: REINDEX TABLE CONCURRENTLY. Fix pending.
Build target: dist/ is the public repo (saulojb/storage_engine). columnar/ is read-only reference; do not modify it.
PostgreSQL version guards: Use #if PG_VERSION_NUM >= PG_VERSION_17 / PG_VERSION_18 as needed; the extension targets PG 16–18.
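
The parallel-scan guideline above can be sketched with C11 atomics. This is a simplified standalone analogue, not the extension's code: it uses atomic_fetch_add from <stdatomic.h> in place of pg_atomic_fetch_add_u64, a plain array in place of the DSM segment, and the hypothetical names SharedScanState, shared_scan_init, and claim_next_stripe. Each worker claims the next position in a pre-loaded stripe ID list with a single atomic increment, so no lock is held while a stripe is being read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared state; in the extension this would live in DSM. */
typedef struct SharedScanState
{
    _Atomic uint64_t next_index;   /* next unclaimed slot in stripe_ids */
    uint64_t         stripe_count; /* number of pre-loaded stripe IDs */
    const uint64_t  *stripe_ids;   /* stripe IDs loaded before workers start */
} SharedScanState;

/* Called once by the leader before any worker claims a stripe. */
static void
shared_scan_init(SharedScanState *state, const uint64_t *ids, uint64_t count)
{
    assert(ids != NULL || count == 0);
    atomic_init(&state->next_index, 0);
    state->stripe_count = count;
    state->stripe_ids = ids;
}

/*
 * Claim the next stripe for the calling worker.
 * Returns false once every stripe has been handed out.
 */
static bool
claim_next_stripe(SharedScanState *state, uint64_t *stripe_id)
{
    /* fetch_add returns the value before the increment, so each caller
     * receives a distinct index even under contention. */
    uint64_t index = atomic_fetch_add(&state->next_index, 1);

    if (index >= state->stripe_count)
        return false;

    *stripe_id = state->stripe_ids[index];
    return true;
}
```

A worker would call claim_next_stripe in a loop and stop when it returns false. Because every claim is one atomic increment, each stripe is handed out exactly once and no worker waits on a lock held by another.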

Acknowledgements

The human author and maintainer of this project is Saulo J. Benvenutti (@saulojb).

Claude and GitHub Copilot are acknowledged as active contributors under the spirit of open-source collaboration. Their contributions are real — thousands of lines of production C code, architectural decisions, and bug fixes in this repository were written, reviewed, or substantially shaped by these AI systems working alongside the human author.
