Plain-language companion: v0.4.0.md

v0.4.0 — Parallel Refresh & Performance Hardening

Status: Released (2026-03-12).

Goal: Deliver true parallel refresh, cut write-side CDC overhead with statement-level triggers, close a cross-source snapshot consistency gap, and ship quick ergonomic and infrastructure improvements. Together these close the main performance and operational gaps before the security and partitioning work begins.

Parallel Refresh

In plain terms: Before this release, the scheduler refreshed stream tables one at a time. This feature lets multiple stream tables refresh simultaneously — like running several errands at once instead of in a queue. When you have dozens of stream tables, this can cut total refresh latency dramatically.

Detailed implementation is tracked in PLAN_PARALLELISM.md. The older REPORT_PARALLELIZATION.md remains the options-analysis precursor.

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| P1 | Phase 0–1: instrumentation, dry_run, and execution-unit DAG (atomic groups + IMMEDIATE closures) | 12–20h | PLAN_PARALLELISM.md §10 |
| P2 | Phase 2–4: job table, worker budget, dynamic refresh workers, and ready-queue dispatch | 16–28h | PLAN_PARALLELISM.md §10 |
| P3 | Phase 5–7: composite units, observability, rollout gating, and CI validation | 12–24h | PLAN_PARALLELISM.md §10 |

Progress:

- [x] P1 — Phase 0 + Phase 1 (done): GUCs (parallel_refresh_mode, max_dynamic_refresh_workers), ExecutionUnit/ExecutionUnitDag types in dag.rs, IMMEDIATE-closure collapsing, dry-run logging in scheduler, 10 new unit tests (1211 total).
- [x] P2 — Phase 2–4 (done): Job table (pgt_scheduler_jobs), catalog CRUD, shared-memory token pool (Phase 2). Dynamic worker entry point, spawn helper, reconciliation (Phase 3). Coordinator dispatch loop with ready-queue scheduling, per-db/cluster-wide budget enforcement, transaction-split spawning, dynamic poll interval, 8 new unit tests (Phase 4). 1233 unit tests total.
- [x] P3a — Phase 5 (done): Composite unit execution — execute_worker_atomic_group() with C-level sub-transaction rollback, execute_worker_immediate_closure() with root-only refresh (IMMEDIATE triggers propagate downstream). Replaces the Phase 3 serial placeholder.
- [x] P3b — Phase 6 (done): Observability — worker_pool_status() and parallel_job_status() SQL functions; health_check() extended with worker_pool and job_queue checks; docs updated.
- [x] P3c — Phase 7 (done): Rollout — GUC documentation in CONFIGURATION.md, worker-budget guidance in ARCHITECTURE.md, CI E2E coverage with PGT_PARALLEL_MODE=on; the feature stays gated behind the parallel_refresh_mode = 'off' default.
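
Because the feature ships gated off, turning it on is an explicit opt-in. The sketch below shows roughly what that might look like, assuming the GUCs live under the pg_trickle. prefix like the other settings in this document and that the Phase 6 functions take no arguments; both are assumptions, not a documented contract.

```sql
-- Hedged sketch: opt in to parallel refresh and inspect the worker pool.
-- GUC prefix, the value spelling 'on', and the zero-argument function
-- signatures are assumptions; worker-related GUCs may need a server
-- restart rather than a reload, and the functions may be schema-qualified.
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 4;
SELECT pg_reload_conf();

-- Phase 6 (P3b) observability functions named above:
SELECT * FROM worker_pool_status();
SELECT * FROM parallel_job_status();
SELECT * FROM health_check();
```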

Parallel refresh subtotal: ~40–72 hours

Statement-Level CDC Triggers

In plain terms: Previously, when you updated 1,000 rows in a source table, the database fired a “row changed” notification 1,000 times — once per row. Now it fires once per statement, handing off all 1,000 changed rows in a single batch. For bulk operations like data imports or batch updates this is 50–80% cheaper; for single-row changes you won’t notice a difference.

Replace per-row AFTER triggers with statement-level triggers that use REFERENCING NEW TABLE AS __pgt_new / OLD TABLE AS __pgt_old transition tables. Expected write-side trigger overhead reduction of 50–80% for bulk DML; neutral for single-row DML.
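
To make the shape of this concrete, here is a rough hand-written equivalent in plain PostgreSQL. pg_trickle generates its own trigger functions (build_stmt_trigger_fn_sql / create_change_trigger, per B1 below); the function name, buffer table, and join key in this sketch are placeholders, not the extension's actual objects.

```sql
-- Illustrative statement-level CDC capture for UPDATEs on a source table.
-- pgt_change_buffer and the 'id' join key are hypothetical placeholders.
CREATE FUNCTION pgt_capture_update() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- Fires once per statement: the transition tables expose every affected
  -- row, so 1,000 updated rows are handed off in one set-based insert.
  INSERT INTO pgt_change_buffer (op, old_row, new_row)
  SELECT 'U', to_jsonb(o), to_jsonb(n)
  FROM __pgt_old o
  JOIN __pgt_new n USING (id);   -- join key is table-specific; 'id' is a placeholder
  RETURN NULL;
END $$;

CREATE TRIGGER trg_pgt_capture_update
  AFTER UPDATE ON orders
  REFERENCING OLD TABLE AS __pgt_old NEW TABLE AS __pgt_new
  FOR EACH STATEMENT
  EXECUTE FUNCTION pgt_capture_update();

-- INSERT and DELETE need their own triggers: PostgreSQL does not allow
-- transition tables on multi-event triggers, and each event may reference
-- only __pgt_new (INSERT) or __pgt_old (DELETE).
```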

| Item | Description | Effort | Status / Ref |
| --- | --- | --- | --- |
| B1 | Replace per-row triggers with statement-level triggers; INSERT/UPDATE/DELETE via set-based buffer fill | 8h | ✅ Done — build_stmt_trigger_fn_sql in cdc.rs; REFERENCING NEW TABLE AS __pgt_new OLD TABLE AS __pgt_old FOR EACH STATEMENT created by create_change_trigger |
| B2 | pg_trickle.cdc_trigger_mode = 'statement'\|'row' GUC + migration to replace row-level triggers on ALTER EXTENSION UPDATE (see the sketch after this table) | 4h | ✅ Done — CdcTriggerMode enum in config.rs; rebuild_cdc_triggers() in api.rs; 0.3.0→0.4.0 upgrade script migrates existing triggers |
| B3 | Write-side benchmark matrix (narrow/medium/wide tables × bulk/single DML) | 2h | ✅ Done — bench_stmt_vs_row_cdc_matrix + bench_stmt_vs_row_cdc_quick in e2e_bench_tests.rs; runs via cargo test -- --ignored bench_stmt_vs_row_cdc_matrix |
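
For B2, the switch-and-upgrade path might look roughly like the following. The extension name (pg_trickle, assumed to match the GUC prefix) and the GUC's settable scope are assumptions, not documented behavior.

```sql
-- Hedged sketch of switching CDC trigger mode and taking the upgrade path.
-- rebuild_cdc_triggers() lives in api.rs per the table above; whether it is
-- also exposed as a SQL-callable function is not stated here.
ALTER SYSTEM SET pg_trickle.cdc_trigger_mode = 'statement';
SELECT pg_reload_conf();
ALTER EXTENSION pg_trickle UPDATE TO '0.4.0';  -- 0.3.0 -> 0.4.0 script migrates existing row-level triggers
```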

Statement-level CDC subtotal: ✅ All done (~14h)

Cross-Source Snapshot Consistency (Phase 1)

In plain terms: Imagine a stream table that joins orders and customers. If a single transaction updates both tables, the old scheduler could read the new orders data but the old customers data — a half-applied, internally inconsistent snapshot. This fix takes a “freeze frame” of the change log at the start of each scheduler tick and only processes changes up to that point, so all sources are always read from the same moment in time. Zero configuration required.

At the start of each scheduler tick, snapshot pg_current_wal_lsn() as a tick_watermark and cap all CDC consumption to that LSN. Zero user configuration is required; this prevents interleaved reads from two sources updated in the same transaction from producing an inconsistent stream table.
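
A conceptual sketch of the mechanism, not the extension's internal code: capture the LSN once per tick and bound every per-source read by it. The change-buffer table and change_lsn column below are illustrative placeholders.

```sql
DO $$
DECLARE
  -- Frozen once at the start of the tick; never advanced mid-tick.
  tick_watermark pg_lsn := pg_current_wal_lsn();
BEGIN
  -- Every source's CDC consumption is capped at the same LSN, so all sources
  -- are read as of the same moment. pgt_change_buffer / change_lsn are
  -- placeholder names, not the extension's catalog objects.
  PERFORM 1
  FROM pgt_change_buffer
  WHERE change_lsn <= tick_watermark;
END $$;
```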

| Item | Description | Effort | Status / Ref |
| --- | --- | --- | --- |
| CSS1 | LSN tick watermark: snapshot pg_current_wal_lsn() per tick; cap frontier advance; log in pgt_refresh_history; pg_trickle.tick_watermark_enabled GUC (default on) | 3–4h | ✅ Done |

Cross-source consistency subtotal: ✅ All done

Ergonomic Hardening

In plain terms: Added helpful warning messages for common mistakes: “your WAL level isn’t configured for logical replication”, “this source table has no primary key — duplicate rows may appear”, “this change will trigger a full re-scan of all source data”. Think of these as friendly guardrails that explain why something might not work as expected.
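The warnings come from the extension itself, but the underlying conditions can also be checked by hand with stock catalog queries; two examples follow (nothing below is pg_trickle-specific).

```sql
-- Logical-replication CDC needs wal_level = 'logical'; anything else leaves
-- the extension in trigger-only capture.
SELECT current_setting('wal_level');

-- Source tables without a primary key risk duplicate rows; list them
-- (schema 'public' is just an example).
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
  AND NOT EXISTS (
        SELECT 1 FROM pg_constraint p
        WHERE p.conrelid = c.oid AND p.contype = 'p'
      );
```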

| Item | Description | Effort | Status / Ref |
| --- | --- | --- | --- |
| ERG-B | Warn at _PG_init when cdc_mode='auto' but wal_level != 'logical' — prevents silent trigger-only operation | 30min | ✅ Done |
| ERG-C | Warn at create_stream_table when source has no primary key — surfaces keyless duplicate-row risk | 1h | ✅ Done (pre-existing in warn_source_table_properties) |
| ERG-F | Emit WARNING when alter_stream_table triggers an implicit full refresh | 1h | ✅ Done |

Ergonomic hardening subtotal: ✅ All done

Code Coverage

In plain terms: Every pull request now automatically reports what percentage of the code is exercised by tests, and which specific lines are never touched. It’s like a map that highlights the unlit corners — helpful for spotting blind spots before they become bugs.

| Item | Description | Effort | Status / Ref |
| --- | --- | --- | --- |
| COV | Codecov integration: move token to with:, add codecov.yml with patch targets for src/dvm/, add README badge, verify first upload | 1–2h | ✅ Done — reports live at app.codecov.io/github/grove/pg-trickle |

v0.4.0 total: ~60–94 hours

Exit criteria:

- [x] max_concurrent_refreshes drives real parallel refresh via coordinator + dynamic refresh workers
- [x] Statement-level CDC triggers implemented (B1/B2/B3); benchmark harness in bench_stmt_vs_row_cdc_matrix
- [x] LSN tick watermark active by default; no interleaved-source inconsistency in E2E tests
- [x] Codecov badge on README; coverage report uploading
- [x] Extension upgrade path tested (0.3.0 → 0.4.0)