Plain-language companion: v0.21.0.md

v0.21.0 — Correctness, Safety & Test Hardening

Status: ✅ Released (2026-07-16). Driven by findings in PLAN_OVERALL_ASSESSMENT.md.

Release Theme

This release closes the last known data-correctness gap (EC-01 JOIN delta phantom rows), reduces the unsafe code surface, expands unit-test coverage in three large untested modules, adds a parser fuzz target, and adds a crash-recovery test for the bgworker. A shadow/canary mode for alter_stream_table makes migrations of critical stream tables safer, and splitting refresh.rs into focused sub-modules reduces change risk. A Performance Tuning Cookbook consolidates scattered advice into a single operator reference.

EC-01 Fix — JOIN Delta Phantom Rows

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| EC01-0 | Q15 IMMEDIATE-mode stop-gap. Add Q15 to IMMEDIATE_SKIP_ALLOWLIST pending the EC-01 fix; superseded by EC01-3, which removes it once the fix lands. | XS (1h) | PLAN_OVERALL_ASSESSMENT.md §6.4 |
| EC01-1 | Fix Part 1 row-id hash collision. When EC-01 splits Part 1 into 1a (ΔQ⋈R₁) and 1b (ΔQ⋈R₀), hash only the left-side PK on Part 1b so both halves produce the same __pgt_row_id and weight aggregation cancels them correctly. | 3–5d | PLAN_EDGE_CASES.md §EC-01; src/dvm/operators/join.rs L234–245 |
| EC01-2 | PH-D1 phantom cleanup. Verify PH-D1 DELETE+INSERT handles converged row ids from EC01-1; extend to cover prior-cycle phantom rows already in the stream table. | 1–2d | src/refresh.rs L4991–5005 |
| EC01-3 | TPC-H Q07 + Q15 regression gate. Remove Q07 from DIFFERENTIAL skip list; remove Q15 from IMMEDIATE skip list. Add test_tpch_q07_ec01b_combined_delete deterministic pass assertion. | 1d | tests/e2e_tpch_tests.rs L92–104 |
| EC01-4 | Multi-cycle phantom property test. 5,000-iteration proptest: delete right-side row while left changes; verify zero phantom accumulation after N cycles. | 1d | src/dvm/operators/join.rs |
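
The cancellation that EC01-1 relies on can be illustrated with a minimal, self-contained sketch. This is not the code in join.rs: `DeltaRow`, `row_id_from_left_pk`, and `consolidate` are hypothetical names, and the standard library's `DefaultHasher` stands in for whatever hash pg_trickle actually uses. It only shows that when both halves of Part 1 derive the row id from the same left-side key, a +1 and a -1 for that key meet under one id and sum to zero.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical delta row: a row id plus a signed weight
/// (+1 for an insert, -1 for a delete).
#[derive(Debug, Clone, PartialEq)]
pub struct DeltaRow {
    pub row_id: u64,
    pub weight: i64,
}

/// Derive the row id from the left-side primary key only.
/// Assumption: DefaultHasher is a stand-in for the real hash function.
pub fn row_id_from_left_pk(left_pk: &[i64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    left_pk.hash(&mut hasher);
    hasher.finish()
}

/// Sum weights per row id and drop rows whose net weight is zero.
/// Rows that survive with a non-zero weight are real changes.
pub fn consolidate(rows: &[DeltaRow]) -> Vec<DeltaRow> {
    let mut net: HashMap<u64, i64> = HashMap::new();
    for row in rows {
        *net.entry(row.row_id).or_insert(0) += row.weight;
    }
    let mut out: Vec<DeltaRow> = net
        .into_iter()
        .filter(|(_, weight)| *weight != 0)
        .map(|(row_id, weight)| DeltaRow { row_id, weight })
        .collect();
    // Sort for a deterministic result; HashMap iteration order is unspecified.
    out.sort_by_key(|row| row.row_id);
    out
}
```

If the two halves produced different ids for the same left-side row, `consolidate` would keep both entries, which is the phantom-row accumulation that the EC01-4 property test is meant to rule out.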

Safety & Code Quality

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| SAF-1 | Convert production .unwrap() in sublinks.rs to ?. 28 sites in src/dvm/parser/sublinks.rs (e.g. from_list.head().unwrap(), get_ptr(i).unwrap()) converted to ok_or(PgTrickleError::UnsupportedPattern). | 2d | PLAN_OVERALL_ASSESSMENT.md §2.2 |
| SAF-2 | Unsafe reduction half-pass. Add list_nth_safe<T>() helper returning Option<PgBox<T>>; group repeated pg_sys::* FFI calls into safe façades in src/dvm/parser/types.rs. Target: ≥40% reduction in unsafe block count. | 1wk | plans/safety/PLAN_REDUCED_UNSAFE.md |
| SAF-3 | clippy::unwrap_used lint gate. Add #![deny(clippy::unwrap_used)] in lib.rs outside #[cfg(test)], with #[allow] on justified invariant sites in dag.rs. | 1d | PLAN_OVERALL_ASSESSMENT.md §2.2 |
| OP-6 | Non-deterministic function warning / rejection. Reject or warn at create_stream_table time if query uses now(), random(), volatile UDFs without explicit non_deterministic => true. Pre-v1.0 safety gate. | S (2d) | PLAN_OVERALL_ASSESSMENT.md §2.6 |
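
The SAF-1 conversion follows a common Rust pattern, sketched below under stated assumptions. The `PgTrickleError` enum here is a stand-in that models only the variant named in the plan; its `String` payload, the `first_from_item` function, and the use of a plain slice instead of a PostgreSQL list are all hypothetical simplifications.

```rust
use std::fmt;

/// Stand-in for the crate's error type. Only the variant named in the
/// plan is modelled; the String payload is an assumption.
#[derive(Debug, PartialEq)]
pub enum PgTrickleError {
    UnsupportedPattern(String),
}

impl fmt::Display for PgTrickleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PgTrickleError::UnsupportedPattern(msg) => {
                write!(f, "unsupported pattern: {msg}")
            }
        }
    }
}

impl std::error::Error for PgTrickleError {}

/// Before: `from_list.first().unwrap()` would panic on an empty list.
/// After: the empty case becomes an error that `?` propagates to the caller.
pub fn first_from_item(from_list: &[String]) -> Result<&str, PgTrickleError> {
    let head = from_list
        .first()
        .ok_or_else(|| PgTrickleError::UnsupportedPattern("empty FROM list".to_string()))?;
    Ok(head.as_str())
}
```

Once every production site has this shape, a crate-level `#![deny(clippy::unwrap_used)]` as described in SAF-3 can keep new `.unwrap()` calls from slipping back in.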

Test Coverage

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| TEST-1 | Unit tests for src/api/helpers.rs (2.5k LOC). 25+ unit tests covering query validation, schema helpers, and CDC orchestration utilities. | 3d | PLAN_OVERALL_ASSESSMENT.md §6.1 |
| TEST-2 | Unit tests for src/api/diagnostics.rs (1.5k LOC). 15+ unit tests covering explain_st, health_summary, and cache_stats formatting logic. | 2d | PLAN_OVERALL_ASSESSMENT.md §6.1 |
| TEST-3 | Unit tests for src/dvm/parser/rewrites.rs (5.9k LOC). 30+ unit tests covering each of the 7 rewrite passes: view inlining, DISTINCT ON, GROUPING SETS, scalar SSQ in WHERE, correlated SSQ in SELECT, SubLinks in OR, multi-PARTITION BY windows. | 3d | PLAN_OVERALL_ASSESSMENT.md §6.1 |
| TEST-4 | Parser fuzz target (cargo-fuzz). Differential fuzz: feed random SQL to the pg_trickle parser and verify it never panics; compare accepted/rejected decisions against plain SELECT. Target: 1h of fuzzing with zero panics. | 1wk | PLAN_OVERALL_ASSESSMENT.md §6.2 |
| TEST-5 | Crash-recovery bgworker resilience test. pg_ctl stop -m immediate mid-refresh; verify: no unfinalised pgt_refresh_history entries, WAL decoder resumes from confirmed_lsn, change buffer is consistent. | 3d | PLAN_OVERALL_ASSESSMENT.md §6.3 |
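
The "never panics" half of TEST-4 can be expressed as a small property, sketched here in standard-library Rust only. A real cargo-fuzz target would wrap a similar check in a fuzz harness and call the actual parser; `parse_stub` below is a hypothetical placeholder with made-up accept/reject rules, and the differential comparison against plain SELECT is not modelled.

```rust
use std::panic;

/// Hypothetical placeholder for the parser entry point. The real parser
/// is not shown in this document; this stub accepts anything that starts
/// with SELECT and rejects everything else.
pub fn parse_stub(sql: &str) -> Result<(), String> {
    if sql.trim_start().to_ascii_uppercase().starts_with("SELECT") {
        Ok(())
    } else {
        Err("unsupported statement".to_string())
    }
}

/// The property under test: for arbitrary bytes, parsing returns
/// Ok or Err but never unwinds with a panic.
pub fn never_panics(input: &[u8]) -> bool {
    // Fuzzers produce arbitrary bytes; invalid UTF-8 is replaced lossily
    // so that every input reaches the parser.
    let sql = String::from_utf8_lossy(input).into_owned();
    panic::catch_unwind(|| {
        let _ = parse_stub(&sql);
    })
    .is_ok()
}
```

A rejected query counts as a pass for this property; only a panic is a failure.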

Architecture

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ARCH-1 | Split src/refresh.rs (8.4k LOC) into 4 sub-modules. refresh/orchestrator.rs (dispatch, status), refresh/codegen.rs (delta SQL generation), refresh/phd1.rs (PH-D1 phantom delete), refresh/merge.rs (MERGE strategy). Zero behaviour change — pure reorganisation. | 1wk | PLAN_OVERALL_ASSESSMENT.md §2.4 |
| ARCH-2 | Recursive CTE fallback observability. Log NOTICE: falling back to FULL refresh — defining query contains WITH RECURSIVE; expose refresh_reason = 'recursive_cte_fallback' tag in Prometheus metrics and pgt_refresh_history. | 1d | PLAN_OVERALL_ASSESSMENT.md §2.5 |
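
One way to picture the ARCH-2 tagging is a small enum that carries the reason alongside the effective refresh mode. The sketch below is illustrative only: `RefreshMode`, `RefreshReason`, `plan_refresh`, and the `"scheduled"` tag are hypothetical, and only the `recursive_cte_fallback` string comes from the plan.

```rust
/// Hypothetical refresh modes, named after the modes mentioned in this plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefreshMode {
    Differential,
    Full,
}

/// Hypothetical reason attached to each refresh for metrics and history.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefreshReason {
    Scheduled,
    RecursiveCteFallback,
}

impl RefreshReason {
    /// Tag value exposed to observers. Only "recursive_cte_fallback" is
    /// taken from the plan; "scheduled" is an assumed placeholder.
    pub fn as_tag(self) -> &'static str {
        match self {
            RefreshReason::Scheduled => "scheduled",
            RefreshReason::RecursiveCteFallback => "recursive_cte_fallback",
        }
    }
}

/// Decide the effective mode: a differential refresh of a query that
/// contains WITH RECURSIVE falls back to FULL and records why.
pub fn plan_refresh(
    requested: RefreshMode,
    has_recursive_cte: bool,
) -> (RefreshMode, RefreshReason) {
    if requested == RefreshMode::Differential && has_recursive_cte {
        (RefreshMode::Full, RefreshReason::RecursiveCteFallback)
    } else {
        (requested, RefreshReason::Scheduled)
    }
}
```

Returning the reason together with the mode means the log line, the metric label, and the history row can all be derived from one value instead of being set in three places.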

Operational Features

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| OPS-1 | Shadow/canary mode for alter_stream_table. Optional dry_run_shadow => true parameter: materialises new query into pgt_shadow_<name> on the same schedule; pgtrickle.canary_diff(name) diffs against the live table. pgtrickle.canary_promote(name) atomically swaps. | 1wk | PLAN_OVERALL_ASSESSMENT.md §9.5 |
| OP-2 | Prometheus HTTP endpoint in bgworker. Tiny HTTP server (port configurable via pg_trickle.metrics_port) emitting all monitoring metrics in OpenMetrics format. Removes “bring your own exporter” hurdle. | S (1wk) | PLAN_OVERALL_ASSESSMENT.md §9.6 |
| OP-3 | pgtrickle.pause_all() / resume_all() helpers. Idempotent SQL wrappers for suspending all stream tables during maintenance (e.g. pg_dump of source tables). | XS (1d) | PLAN_OVERALL_ASSESSMENT.md §9.1 |
| OP-4 | pgtrickle.refresh_if_stale(name, max_age) convenience wrapper. Application-level staleness gating without custom procedural code. | XS (1d) | PLAN_OVERALL_ASSESSMENT.md §9.1 |
| OP-5 | pgtrickle.stream_table_definition(name) helper. Single-row fetch of original query, refresh mode, schedule, and status for auditing / blue-green migrations. | XS (1d) | PLAN_OVERALL_ASSESSMENT.md §9.1 |
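
The staleness gating behind OP-4 reduces to a single comparison, sketched below. This is not the SQL function itself: the Rust `refresh_if_stale` here takes the table's current age directly rather than looking it up by name, the refresh action is passed in as a closure, and treating an age exactly equal to `max_age` as still fresh is an assumption.

```rust
use std::time::Duration;

/// Run `refresh` only when the data is older than `max_age`.
/// Returns true if a refresh was triggered, false if it was skipped.
/// Assumption: an age exactly equal to `max_age` counts as fresh.
pub fn refresh_if_stale<F: FnOnce()>(age: Duration, max_age: Duration, refresh: F) -> bool {
    if age > max_age {
        refresh();
        true
    } else {
        false
    }
}
```

Returning whether a refresh ran lets the caller distinguish "served fresh data" from "served after a refresh" without a second query.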

Documentation

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| DOC-1 | Performance Tuning Cookbook. New docs/PERFORMANCE_COOKBOOK.md: symptom → likely cause → GUC to tune → measurement rows. Consolidates advice from FAQ, TROUBLESHOOTING, SCALING, and BENCHMARK. | 3d | PLAN_OVERALL_ASSESSMENT.md §7.2 |

Implementation Phases

| Phase | Description | Duration |
|-------|-------------|----------|
| EC01 | EC-01 fix: Q15 stop-gap + join.rs hash + refresh.rs PH-D1 + TPC-H validation | Days 1–8 |
| SAF | Safety pass: unwrap → ?, unsafe reduction, lint gate, volatile-fn warning | Days 9–15 |
| TEST | Unit test campaign (3 files) + fuzz target + resilience test | Days 16–24 |
| ARCH | refresh.rs split + recursive CTE observability | Days 25–29 |
| OPS+DOC | Shadow/canary mode + Prometheus endpoint + API helpers + Performance Cookbook | Days 30–40 |

v0.21.0 total: ~6–8 weeks (EC-01 fix + safety hardening + API ergonomics + Prometheus endpoint + test coverage + module refactor + shadow mode + docs)

Exit criteria:

- [x] EC01-0: Q15 added to IMMEDIATE_SKIP_ALLOWLIST as stop-gap
- [x] EC01-1/EC01-2: test_tpch_q07_ec01b_combined_delete passes deterministically
- [x] EC01-3: Q07 removed from the DIFFERENTIAL skip list and Q15 removed from the IMMEDIATE skip list
- [x] EC01-4: Multi-cycle phantom proptest passes 5,000 iterations
- [x] SAF-1: All 28 production .unwrap() sites in sublinks.rs converted to ?
- [x] SAF-2: unsafe block count reduced by ≥40%
- [x] SAF-3: clippy::unwrap_used lint gate passes with zero violations in non-test code
- [x] OP-6: create_stream_table warns or rejects queries using now(), random(), volatile UDFs without non_deterministic => true
- [x] TEST-1/2/3: ≥70 new unit tests across 3 previously-untested files
- [x] TEST-4: Fuzz target runs 1h with zero panics
- [x] TEST-5: Crash-recovery test passes deterministically
- [x] ARCH-1: refresh.rs split into 4 sub-modules; all existing tests pass unchanged
- [x] ARCH-2: refresh_reason = 'recursive_cte_fallback' visible in Prometheus/NOTIFY
- [x] OPS-1: canary_diff() / canary_promote() API functional with E2E tests
- [x] OP-2: Prometheus HTTP endpoint accessible at pg_trickle.metrics_port; all monitoring metrics present
- [x] OP-3: pgtrickle.pause_all() / resume_all() work idempotently; E2E test passes
- [x] OP-4: pgtrickle.refresh_if_stale(name, max_age) correctly gates refresh by age
- [x] OP-5: pgtrickle.stream_table_definition(name) returns accurate single-row result
- [x] DOC-1: docs/PERFORMANCE_COOKBOOK.md published
- [x] Extension upgrade path tested (0.20.0 → 0.21.0)
- [x] just check-version-sync passes