Plain-language companion: v0.17.0.md

v0.17.0 — Query Intelligence & Stability

Status: Released (2026-04-08).

Goal: Make the refresh engine smarter, prove correctness through automated fuzzing, harden for scale, and prepare for adoption. Cost-based strategy selection replaces the fixed DIFF/FULL threshold, columnar change tracking skips irrelevant columns in wide-table UPDATEs, SQLancer integration provides automated semantic proving, incremental DAG rebuild supports 1000+ stream table deployments, and unsafe block reduction continues the safety hardening toward 1.0. On the adoption side: api.rs modularization improves code maintainability, a pg_ivm migration guide targets the largest potential adopter audience, a failure mode runbook equips production teams, and a Docker Compose playground provides a 60-second tryout experience.

Cost-Based Refresh Strategy Selection (B-4)

In plain terms: The current adaptive FULL/DIFFERENTIAL threshold is a fixed ratio (differential_max_change_ratio default 0.5). A join-heavy query may be better off with FULL at 5% change rate, while a scan-only query benefits from DIFFERENTIAL up to 80%. This replaces the fixed threshold with a cost model trained on each stream table’s own refresh history — selecting the cheapest strategy per cycle automatically.

Item Description Effort Ref
B-4 Cost-based refresh strategy selection. Collect per-ST statistics (delta_row_count, merge_duration_ms, full_refresh_duration_ms, query_complexity_class) from pgt_refresh_history. Fit a simple linear cost model. Before each refresh, compare estimated_diff_cost(Δ) vs estimated_full_cost × safety_margin and select the cheaper path. Cold-start heuristic (< 10 refreshes) falls back to existing fixed threshold. Gate behind pg_trickle.refresh_strategy = 'auto'|'differential'|'full' GUC. 2–3 wk plans/performance/PLAN_NEW_STUFF.md §B-4

B-4 subtotal: ~2–3 weeks

Columnar Change Tracking (A-2-COL)

In plain terms: When a source table UPDATE changes only 1 of 50 columns, the current CDC captures the entire row (old + new) and the delta query processes all columns. If the changed column is not referenced by the stream table’s defining query, the entire refresh is wasted work. Columnar change tracking adds a per-column bitmask to CDC events so the delta query can skip irrelevant rows at scan time — a 50–90% reduction in delta volume for wide-table OLTP workloads.

Item Description Effort Ref
A-2-COL-1 CDC trigger bitmask. Compute changed_columns bitmask (old.col IS DISTINCT FROM new.col) in the CDC trigger; store as int8 or bit(n) alongside the change row. 1–2 wk plans/performance/PLAN_NEW_STUFF.md §A-2
A-2-COL-2 Delta-scan column filtering. At delta-query build time, consult the bitmask: skip rows where no referenced column changed; use lightweight UPDATE-only path when only projected columns changed (no join keys, no filter predicates, no aggregate keys). 1–2 wk plans/performance/PLAN_NEW_STUFF.md §A-2
A-2-COL-3 Aggregate correction optimization. For aggregates where only the aggregated value column changed (not GROUP BY key), emit a single correction row instead of delete-old + insert-new. 3–5d plans/performance/PLAN_NEW_STUFF.md §A-2

A-2-COL subtotal: ~3–4 weeks

Transactional IVM Phase 4 Remaining (A2)

In plain terms: IMMEDIATE mode (same-transaction refresh) shipped in v0.2.0 using SQL-level statement triggers. Phase 4 completes the transition to lower-overhead C-level triggers and ENR-based transition tables — sharing the transition tuplestore directly between the trigger and the refresh engine instead of copying through a temp table. Also adds prepared statement reuse to eliminate repeated parse/plan overhead for the delta query.

Item Description Effort Ref
A2-ENR ENR-based transition tables. 🚫 Deferred post-1.0 — requires raw pg_sys ENR tuplestore FFI not surfaced by pgrx; carries memory-corruption and pg_upgrade compatibility risk. Revisit after 1.0 stabilisation. 12–18h PLAN_TRANSACTIONAL_IVM.md §Phase 4
A2-CTR C-level triggers. 🚫 Deferred post-1.0 — requires raw CreateTrigger() FFI not surfaced by pgrx; carries memory-corruption and pg_upgrade compatibility risk. Revisit after 1.0 stabilisation. 12–18h PLAN_TRANSACTIONAL_IVM.md §Phase 4
A2-PS Prepared statement reuse.Already shippedpg_trickle.use_prepared_statements GUC (default true) implemented and wired in refresh.rs; parse/plan overhead eliminated on steady-state workloads. 8–12h PLAN_TRANSACTIONAL_IVM.md §Phase 4

A2 subtotal: 0h remaining (A2-PS shipped; A2-ENR + A2-CTR deferred post-1.0)

ROWS FROM() Support (A8)

In plain terms: ROWS FROM() with multiple set-returning functions is a rarely-used SQL feature, but supporting it closes a coverage gap in the parser and DVM pipeline.

Item Description Effort Ref
A8 ROWS FROM() with multiple SRF functions. Parser + DVM support for ROWS FROM(generate_series(...), unnest(...)) in defining queries. Very low demand. ~1–2d PLAN_TRANSACTIONAL_IVM_PART_2.md Task 2.3

A8 subtotal: ~1–2 days

SQLancer Fuzzing Integration (SQLANCER)

In plain terms: pg_trickle’s tests were written by the pg_trickle team, which means they share the same assumptions as the code. SQLancer is an automated database testing tool that generates random SQL queries and checks whether the results are correct — it has found hundreds of bugs in PostgreSQL, SQLite, CockroachDB, and TiDB. Integrating SQLancer gives pg_trickle a crash-test oracle (does the parser panic on fuzzed input?), an equivalence oracle (does DIFFERENTIAL mode produce the same answer as FULL?), and stateful DML fuzzing (do random INSERT/UPDATE/DELETE sequences corrupt stream table data?). This is the single highest-value testing investment for finding unknown correctness bugs.

Item Description Effort Ref
SQLANCER-1 Fuzzing environment.Done — Docker-based harness (just sqlancer), Rust LCG query generator, SQLANCER_CASES/SQLANCER_SEED controls, weekly-sqlancer CI job. 2–3d PLAN_SQLANCER.md §1
SQLANCER-2 Crash-test oracle.Donetest_sqlancer_crash_oracle / run_crash_oracle() verifies zero backend crashes over 200–2000 fuzzed queries. 3–5d PLAN_SQLANCER.md §2
SQLANCER-3 Equivalence oracle.Donetest_sqlancer_diff_vs_full_oracle / run_diff_vs_full_oracle() creates DIFFERENTIAL + FULL stream tables, applies 4 DML mutations, and asserts count parity. Integrated into test_sqlancer_ci_combined. 3–5d PLAN_SQLANCER.md §3
SQLANCER-4 Stateful DML fuzzing.Donetest_sqlancer_stateful_dml / run_stateful_dml_fuzzing() runs SQLANCER_MUTATIONS (default 100, nightly 10 000) random INSERT/UPDATE/DELETE mutations with checkpoints every 50. CI: weekly-sqlancer-stateful job (SQLANCER_MUTATIONS=10000). 3–5d PLAN_SQLANCER.md §4

SQLANCER subtotal: 0 remaining (all four items shipped in v0.17.0)

Incremental DAG Rebuild (C-2)

In plain terms: When any DDL change occurs (e.g. ALTER STREAM TABLE, DROP STREAM TABLE), the entire dependency graph is rebuilt from scratch by querying pgt_dependencies. For 1000+ stream tables this becomes expensive — O(V+E) SPI queries. Incremental DAG maintenance records which specific stream table was affected and only re-sorts the affected subgraph, reducing the scheduler latency spike from ~50ms to ~1ms at scale.

Item Description Effort Ref
C-2-1 Delta-based rebuild. Record affected pgt_id in a bounded ring buffer in shared memory alongside DAG_REBUILD_SIGNAL. On overflow, fall back to full rebuild. 1 wk plans/performance/PLAN_NEW_STUFF.md §C-2
C-2-2 Incremental topological sort. Add/remove only affected edges and vertices; re-run topological sort on the affected subgraph only. Cache the sorted schedule in shared memory. 1–2 wk plans/performance/PLAN_NEW_STUFF.md §C-2

C-2 subtotal: ~2–3 weeks

Unsafe Block Reduction — Phase 6 (UNSAFE-R1/R2)

In plain terms: pg_trickle achieved a 51% reduction in unsafe blocks (from ~1,300 to 641) in earlier releases. The remaining blocks are concentrated in well-documented field-accessor macros and standalone is_a type checks. Converting these to safe wrappers removes another 150–250 unsafe blocks with minimal risk — a meaningful safety improvement before 1.0.

Item Description Effort Ref
UNSAFE-R1 Safe field-accessor macros. Replace unsafe { (*node).field } patterns with safe accessor functions. Estimated reduction: ~100–150 unsafe blocks. 2–4h PLAN_REDUCED_UNSAFE.md §R1
UNSAFE-R2 Safe is_a checks. Convert standalone unsafe { is_a(node, T_Foo) } calls to safe wrapper functions. Estimated reduction: ~50–99 unsafe blocks. 2–4h PLAN_REDUCED_UNSAFE.md §R2

UNSAFE-R1/R2 subtotal: ~4–8 hours

api.rs Modularization (API-MOD)

In plain terms: api.rs is 9,413 lines — the largest file in the codebase. It contains stream table CRUD, ALTER QUERY, CDC management, bulk operations, diagnostics, and monitoring functions all in one file. The same treatment that parser.rs received in v0.15.0 (split from 21K lines into 5 sub-modules) is needed here. Zero behavior change — purely structural.

Item Description Effort Ref
API-MOD Split src/api.rs into sub-modules. Proposed split: api/create.rs (create/drop/alter), api/refresh.rs (refresh entry points), api/cdc.rs (CDC management), api/diagnostics.rs (explain_st, health_check), api/bulk.rs (bulk_create), api/mod.rs (re-exports). Zero behavior change. 1–2 wk

API-MOD subtotal: ~1–2 weeks

pg_ivm Migration Guide (MIG-IVM)

In plain terms: pg_ivm is the incumbent IVM extension with 1,400+ GitHub stars and 4 years of production use. Many potential pg_trickle adopters are currently using pg_ivm. A step-by-step migration guide — mapping pg_ivm concepts to pg_trickle equivalents, with concrete SQL examples — removes the biggest adoption friction for this audience.

Item Description Effort Ref
MIG-IVM pg_ivm → pg_trickle migration guide. Map: create_immv()create_stream_table(); refresh_immv()refresh_stream_table(); IMMEDIATE mode equivalence; aggregate coverage differences (5 vs 60+); GUC mapping; worked example migrating a real pg_ivm deployment. Publish as docs/tutorials/MIGRATING_FROM_PG_IVM.md. 2–3d docs/research/PG_IVM_COMPARISON.md

MIG-IVM subtotal: ~2–3 days

Failure Mode Runbook (RUNBOOK)

In plain terms: Production teams need to know what happens when things go wrong — and what to do about it. This documents every failure mode pg_trickle can encounter (scheduler crash, WAL slot lag, OOM during refresh, disk full, replication slot conflict, stuck watermarks, circular convergence failure) with symptoms, diagnosis steps, and resolution procedures. Essential for on-call engineers.

Item Description Effort Ref
RUNBOOK Failure mode runbook. Document: scheduler crash recovery, WAL decoder failures, OOM during refresh, disk-full behavior, replication slot conflicts, stuck watermarks, circular convergence timeout, CDC trigger failures, SUSPENDED state recovery, lock contention diagnosis. Include health_check() output interpretation and explain_st() troubleshooting. Publish as docs/TROUBLESHOOTING.md. 3–5d docs/PRE_DEPLOYMENT.md

RUNBOOK subtotal: ~3–5 days

Docker Quickstart Playground (PLAYGROUND)

In plain terms: The fastest way to evaluate any database extension is to run it locally in 60 seconds. A docker-compose.yml with PostgreSQL + pg_trickle pre-installed, sample data (e.g. the org-chart from GETTING_STARTED.md), and a Jupyter notebook or pgAdmin web UI gives potential users a zero-friction tryout experience. This is the single most impactful thing for driving initial adoption.

Item Description Effort Ref
PLAYGROUND Docker Compose quickstart. docker-compose.yml with: PG 18 + pg_trickle, seed SQL script (org-chart example from GETTING_STARTED.md + TPC-H SF=0.01), pgAdmin web UI (optional). Single docker compose up command. README with guided walkthrough. 2–3d docs/GETTING_STARTED.md

PLAYGROUND subtotal: ~2–3 days

Documentation Polish (DOC-POLISH)

In plain terms: The existing documentation is comprehensive and technically excellent, but it’s optimized for users already familiar with IVM and PostgreSQL internals. These items restructure the docs for a better “first hour” experience — simpler getting-started examples, a refresh mode decision guide, a condensed new-user FAQ, and a setup verification checklist. The goal is to reduce cognitive overload for new users without losing the depth that experienced users need.

Item Description Effort Ref
DOC-HELLO Simplified “Hello Stream Table” in GETTING_STARTED. Add a Chapter 0 with a single-table, single-aggregate stream table (e.g. SELECT department, count(*) FROM employees GROUP BY department). Create it, insert a row, verify the refresh. Build confidence before the multi-table org-chart example. 2–4h docs/GETTING_STARTED.md
DOC-DECIDE Refresh mode decision guide. Flowchart: “Need transactional consistency? → IMMEDIATE. Volatile functions? → FULL. Otherwise → AUTO (DIFFERENTIAL with FULL fallback).” Include when-to-use guidance for each mode with concrete examples. Publish as a section in GETTING_STARTED or as a standalone tutorial. 2–4h docs/tutorials/tuning-refresh-mode.md
DOC-FAQ-NEW New User FAQ (top 15 questions). Extract the 15 most common new-user questions from the 3,000-line FAQ into a prominent “New User FAQ” section at the top. Keyword-rich headings for searchability. Link to deep FAQ for details. 2–3h docs/FAQ.md
DOC-VERIFY Post-install verification checklist. SQL script that verifies: extension loaded, shared_preload_libraries configured, GUCs set, CDC triggers installable, first stream table creates and refreshes successfully. Runnable as psql -f verify_install.sql. 2–4h docs/GETTING_STARTED.md
DOC-STUBS Fill or remove research stubs. PG_IVM_COMPARISON.md (60 bytes) and CUSTOM_SQL_SYNTAX.md (57 bytes) are empty stubs. Either flesh them out (PG_IVM_COMPARISON can draw from the existing comparison data) or remove from SUMMARY.md. 2–4h docs/research/

DOC-POLISH subtotal: ~2–3 days

v0.17.0 total: ~2–3 weeks (cost-based strategy) + ~3–4 weeks (columnar tracking) + ~32–48 hours (TIVM Phase 4) + ~1–2 days (ROWS FROM) + ~2–3 weeks (SQLancer) + ~2–3 weeks (incremental DAG) + ~4–8 hours (unsafe reduction) + ~1–2 weeks (api.rs modularization) + ~2–3 days (pg_ivm migration) + ~3–5 days (failure runbook) + ~2–3 days (Docker playground) + ~2–3 days (doc polish)

Exit criteria: - [x] B-4: Cost-based strategy selector trained on per-ST history; cold-start fallback to fixed threshold; QueryComplexityClass cost model (scan/filter/aggregate/join/join_agg); refresh_strategy + cost_model_safety_margin GUCs; pre-refresh predictive comparison; 10 unit tests - [x] A-2-COL: CDC trigger emits changed_cols VARBIT bitmask (COL-1); delta-scan filters irrelevant rows via changed_cols & mask (COL-2); aggregate value-only correction ‘V’ path halves row volume (COL-3) - [ ] A2-ENR: 🚫 Deferred post-1.0 — requires raw pg_sys ENR tuplestore FFI (memory-corruption risk); revisit after 1.0 stabilisation - [ ] A2-CTR: 🚫 Deferred post-1.0 — requires raw CreateTrigger() C FFI (memory-corruption risk); revisit after 1.0 stabilisation - [x] A2-PS: ✅ Already shipped — pg_trickle.use_prepared_statements GUC (default true) wired in refresh.rs; parse/plan overhead eliminated on steady-state workloads - [x] A8: ROWS FROM() with multiple SRFs accepted in defining queries; E2E tests cover INSERT/UPDATE/DELETE propagation - [x] SQLANCER: ✅ SQLANCER-½ crash + equivalence oracles shipped in v0.12.0; SQLANCER-3 diff-vs-full oracle and SQLANCER-4 stateful DML soak (10K mutations) added in v0.17.0; weekly-sqlancer-stateful CI job wired - [x] C-2: Incremental DAG rebuild reduces DDL-triggered latency spike to < 5ms at 100+ STs; ring buffer overflow falls back to full rebuild; no correctness regressions - [x] UNSAFE-R1/R2: Unsafe block count reduced by 249 (690→441 in parser); is_node_type! and pg_deref! macros; all 1,700 unit tests pass - [x] API-MOD: api.rs split into 3 sub-modules (mod.rs 5,624 + diagnostics.rs 1,377 + helpers.rs 2,461); zero behavior change; all 1,700 unit tests pass - [x] MIG-IVM: docs/tutorials/MIGRATING_FROM_PG_IVM.md published with step-by-step migration, API mapping, behavioral differences, SQL upgrade examples, and verification checklist - [x] RUNBOOK: docs/TROUBLESHOOTING.md covers 13 failure scenarios (scheduler, SUSPENDED, CDC triggers, WAL slots, INITIALIZING, buffer growth, lock contention, OOM, disk full, circular convergence, schema changes, worker pool, fuse) with symptoms, diagnosis, and resolution - [x] PLAYGROUND: playground/ with docker-compose.yml, seed.sql (3 base tables, 5 stream tables), and README walkthrough - [x] DOC-HELLO: Chapter 1 “Hello World” in GETTING_STARTED already provides the single-table aggregate example (products/category_summary) - [x] DOC-DECIDE: Refresh mode decision guide already published as tutorials/tuning-refresh-mode.md with recommend_refresh_mode() and signal breakdown - [x] DOC-FAQ-NEW: New User FAQ section with 15 keyword-rich entries added at top of FAQ.md - [x] DOC-VERIFY: scripts/verify_install.sql checks shared_preload_libraries, extension, scheduler, GUCs, and runs end-to-end stream table cycle - [x] DOC-STUBS: Research stubs already use {{#include}} directives pointing to substantial content (923 + 1232 lines) - [x] Extension upgrade path tested (0.16.0 → 0.17.0)