Developer Guide

This section is for anyone who wants to understand or contribute to ProvSQL's internals. It complements the Doxygen API reference with explanations of design decisions, data flow, and extension points that are not apparent from the code alone.

The primary reference for the system design is the ICDE 2026 paper sen2026provsql.

Chapters

postgresql-primer
The PostgreSQL extension concepts the rest of this guide assumes you know: shared_preload_libraries, planner hooks, the Query node tree, background workers, palloc, ereport, SQL-callable C functions, OIDs, and GUCs. Read this first if you have not written a PostgreSQL extension before.
architecture
High-level overview: how the extension is loaded, how its C and |cpp| components fit together, and how data flows from an SQL query to a provenance evaluation result. Covers the constants_t OID cache and the gate_type enum.
query-rewriting
Detailed walkthrough of provsql.c -- the planner hook, the process_query function, and every phase of query rewriting (CTE inlining, set operations, aggregation, expression building, splicing, HAVING, where-provenance, INSERT ... SELECT).
memory
How provenance circuits are persisted across transactions via memory-mapped files, the background worker architecture, IPC via anonymous pipes, and the shared-memory coordination layer.
where-provenance
The where-provenance subsystem: how WhereCircuit differs from the semiring world, how project and eq gates are built and interpreted, and how the column map produced by build_column_map ties query rewriting to per-cell locator output.
data-modification
Tracking the provenance of INSERT / UPDATE / DELETE statements: the provsql.update_provenance GUC, the trigger machinery, the gate_update gate type, the update_provenance housekeeping table, and the temporal features (undo, timetravel, timeslice, history) built on top.
aggregation
The semimodule data model for aggregate provenance, the role of the agg_token type, the agg / semimod / value / delta gates, the aggregation-specific phases of process_query, and a step-by-step guide for adding a new compiled aggregate.
semiring-evaluation
The Semiring interface, walkthroughs of the Boolean and Counting semirings, a step-by-step guide for adding a new compiled semiring, and notes on symbolic-representation semirings such as sr_formula.
probability-evaluation
The probability-evaluation dispatcher in probability_evaluate.cpp, the d-DNNF data structure, the Tseytin encoding, knowledge compilation through external compilers, weighted model counting, the tree-decomposition path, and a step-by-step guide for plugging in a new method.
coding-conventions
Naming, error reporting, memory management, the C/|cpp| boundary, and the small set of project-specific conventions that reviewers will ask new contributors to follow.
testing
How ProvSQL's pg_regress-based test suite works, how to write new tests, the external-tool skip pattern, and how to read test failures.
debugging
Debug builds, the provsql.verbose_level GUC, circuit inspection SQL functions, circuit visualization, and GDB tips.
build-system
The two-Makefile structure, PGXS integration, PostgreSQL version guards, generated SQL files, make website / make deploy, release.sh, and CI workflows.