Plain-language companion: v0.21.0.md
v0.21.0 — Correctness, Safety & Test Hardening
Status: ✅ Released (2026-07-16). Driven by findings in PLAN_OVERALL_ASSESSMENT.md.
Release Theme
This release closes the last known data-correctness gap (EC-01 JOIN delta
phantom rows), reduces the unsafe code surface, expands unit-test coverage
in three large untested modules, adds a parser fuzz target, and adds a
crash-recovery test for the bgworker. A shadow/canary mode for
alter_stream_table makes migrations of critical stream tables safer, and a
refresh.rs module split into focused sub-modules reduces change risk.
A Performance Tuning Cookbook consolidates scattered advice into an operator
reference.
EC-01 Fix — JOIN Delta Phantom Rows
| Item |
Description |
Effort |
Ref |
|---|
| EC01-0 |
Q15 IMMEDIATE-mode stop-gap. Add Q15 to IMMEDIATE_SKIP_ALLOWLIST pending the EC-01 fix; superseded by EC01-3 which removes it once the fix lands. |
XS (1h) |
PLAN_OVERALL_ASSESSMENT.md §6.4 |
| EC01-1 |
Fix Part 1 row-id hash collision. When EC-01 splits Part 1 into 1a (ΔQ⋈R₁) and 1b (ΔQ⋈R₀), hash only the left-side PK on Part 1b so both halves produce the same __pgt_row_id and weight aggregation cancels them correctly. |
3–5d |
PLAN_EDGE_CASES.md §EC-01; src/dvm/operators/join.rs L234–245 |
| EC01-2 |
PH-D1 phantom cleanup. Verify PH-D1 DELETE+INSERT handles converged row ids from EC01-1; extend to cover prior-cycle phantom rows already in the stream table. |
1–2d |
src/refresh.rs L4991–5005 |
| EC01-3 |
TPC-H Q07 + Q15 regression gate. Remove Q07 from DIFFERENTIAL skip list; remove Q15 from IMMEDIATE skip list. Add test_tpch_q07_ec01b_combined_delete deterministic pass assertion. |
1d |
tests/e2e_tpch_tests.rs L92–104 |
| EC01-4 |
Multi-cycle phantom property test. 5,000-iteration proptest: delete right-side row while left changes; verify zero phantom accumulation after N cycles. |
1d |
src/dvm/operators/join.rs |
Safety & Code Quality
| Item |
Description |
Effort |
Ref |
|---|
| SAF-1 |
Convert production .unwrap() in sublinks.rs to ?. 28 sites in src/dvm/parser/sublinks.rs (e.g. from_list.head().unwrap(), get_ptr(i).unwrap()) converted to ok_or(PgTrickleError::UnsupportedPattern). |
2d |
PLAN_OVERALL_ASSESSMENT.md §2.2 |
| SAF-2 |
Unsafe reduction half-pass. Add list_nth_safe<T>() helper returning Option<PgBox<T>>; group repeated pg_sys::* FFI calls into safe façades in src/dvm/parser/types.rs. Target: ≥40% reduction in unsafe block count. |
1wk |
plans/safety/PLAN_REDUCED_UNSAFE.md |
| SAF-3 |
clippy::unwrap_used lint gate. Add #![deny(clippy::unwrap_used)] in lib.rs outside #[cfg(test)], with #[allow] on justified invariant sites in dag.rs. |
1d |
PLAN_OVERALL_ASSESSMENT.md §2.2 |
| OP-6 |
Non-deterministic function warning / rejection. Reject or warn at create_stream_table time if query uses now(), random(), volatile UDFs without explicit non_deterministic => true. Pre-v1.0 safety gate. |
S (2d) |
PLAN_OVERALL_ASSESSMENT.md §2.6 |
Test Coverage
| Item |
Description |
Effort |
Ref |
|---|
| TEST-1 |
Unit tests for src/api/helpers.rs (2.5k LOC). 25+ unit tests covering query validation, schema helpers, and CDC orchestration utilities. |
3d |
PLAN_OVERALL_ASSESSMENT.md §6.1 |
| TEST-2 |
Unit tests for src/api/diagnostics.rs (1.5k LOC). 15+ unit tests covering explain_st, health_summary, and cache_stats formatting logic. |
2d |
PLAN_OVERALL_ASSESSMENT.md §6.1 |
| TEST-3 |
Unit tests for src/dvm/parser/rewrites.rs (5.9k LOC). 30+ unit tests covering each of the 7 rewrite passes: view inlining, DISTINCT ON, GROUPING SETS, scalar SSQ in WHERE, correlated SSQ in SELECT, SubLinks in OR, multi-PARTITION BY windows. |
3d |
PLAN_OVERALL_ASSESSMENT.md §6.1 |
| TEST-4 |
Parser fuzz target (cargo-fuzz). Differential fuzz: feed random SQL to the pg_trickle parser and verify it never panics; compare accepted/rejected decisions against plain SELECT. Target: 1h of fuzzing with zero panics. |
1wk |
PLAN_OVERALL_ASSESSMENT.md §6.2 |
| TEST-5 |
Crash-recovery bgworker resilience test. pg_ctl stop -m immediate mid-refresh; verify: no unfinalised pgt_refresh_history entries, WAL decoder resumes from confirmed_lsn, change buffer is consistent. |
3d |
PLAN_OVERALL_ASSESSMENT.md §6.3 |
Architecture
| Item |
Description |
Effort |
Ref |
|---|
| ARCH-1 |
Split src/refresh.rs (8.4k LOC) into 4 sub-modules. refresh/orchestrator.rs (dispatch, status), refresh/codegen.rs (delta SQL generation), refresh/phd1.rs (PH-D1 phantom delete), refresh/merge.rs (MERGE strategy). Zero behaviour change — pure reorganisation. |
1wk |
PLAN_OVERALL_ASSESSMENT.md §2.4 |
| ARCH-2 |
Recursive CTE fallback observability. Log NOTICE: falling back to FULL refresh — defining query contains WITH RECURSIVE; expose refresh_reason = 'recursive_cte_fallback' tag in Prometheus metrics and pgt_refresh_history. |
1d |
PLAN_OVERALL_ASSESSMENT.md §2.5 |
Operational Features
| Item |
Description |
Effort |
Ref |
|---|
| OPS-1 |
Shadow/canary mode for alter_stream_table. Optional dry_run_shadow => true parameter: materialises new query into pgt_shadow_<name> on the same schedule; pgtrickle.canary_diff(name) diffs against the live table. pgtrickle.canary_promote(name) atomically swaps. |
1wk |
PLAN_OVERALL_ASSESSMENT.md §9.5 |
| OP-2 |
Prometheus HTTP endpoint in bgworker. Tiny HTTP server (port configurable via pg_trickle.metrics_port) emitting all monitoring metrics in OpenMetrics format. Removes “bring your own exporter” hurdle. |
S (1w) |
PLAN_OVERALL_ASSESSMENT.md §9.6 |
| OP-3 |
pgtrickle.pause_all() / resume_all() helpers. Idempotent SQL wrappers for suspending all stream tables during maintenance (e.g. pg_dump of source tables). |
XS (1d) |
PLAN_OVERALL_ASSESSMENT.md §9.1 |
| OP-4 |
pgtrickle.refresh_if_stale(name, max_age) convenience wrapper. Application-level staleness gating without custom procedural code. |
XS (1d) |
PLAN_OVERALL_ASSESSMENT.md §9.1 |
| OP-5 |
pgtrickle.stream_table_definition(name) helper. Single-row fetch of original query, refresh mode, schedule, and status for auditing / blue-green migrations. |
XS (1d) |
PLAN_OVERALL_ASSESSMENT.md §9.1 |
Documentation
| Item |
Description |
Effort |
Ref |
|---|
| DOC-1 |
Performance Tuning Cookbook. New docs/PERFORMANCE_COOKBOOK.md: symptom → likely cause → GUC to tune → measurement rows. Consolidates advice from FAQ, TROUBLESHOOTING, SCALING, and BENCHMARK. |
3d |
PLAN_OVERALL_ASSESSMENT.md §7.2 |
Implementation Phases
| Phase |
Description |
Duration |
|---|
| EC01 |
EC-01 fix: Q15 stop-gap + join.rs hash + refresh.rs PH-D1 + TPC-H validation |
Days 1–8 |
| SAF |
Safety pass: unwrap→?, unsafe reduction, lint gate, volatile-fn warning |
Days 9–15 |
| TEST |
Unit test campaign (3 files) + fuzz target + resilience test |
Days 16–24 |
| ARCH |
refresh.rs split + recursive CTE observability |
Days 25–29 |
| OPS+DOC |
Shadow/canary mode + Prometheus endpoint + API helpers + Performance Cookbook |
Days 30–40 |
v0.21.0 total: ~6–8 weeks (EC-01 fix + safety hardening + API ergonomics + Prometheus endpoint + test coverage + module refactor + shadow mode + docs)
Exit criteria:
- [x] EC01-0: Q15 added to IMMEDIATE_SKIP_ALLOWLIST as stop-gap
- [x] EC01-1/EC01-2: test_tpch_q07_ec01b_combined_delete passes deterministically
- [x] EC01-3: Q07 and Q15 removed from IMMEDIATE/DIFFERENTIAL skip allowlists
- [x] EC01-4: Multi-cycle phantom proptest passes 5,000 iterations
- [x] SAF-1: All 28 production .unwrap() sites in sublinks.rs converted to ?
- [x] SAF-2: unsafe block count reduced by ≥40%
- [x] SAF-3: clippy::unwrap_used lint gate passes with zero violations in non-test code
- [x] OP-6: create_stream_table warns or rejects queries using now(), random(), volatile UDFs without non_deterministic => true
- [x] TEST-½/3: ≥70 new unit tests across 3 previously-untested files
- [x] TEST-4: Fuzz target runs 1h with zero panics
- [x] TEST-5: Crash-recovery test passes deterministically
- [x] ARCH-1: refresh.rs split into 4 sub-modules; all existing tests pass unchanged
- [x] ARCH-2: refresh_reason = 'recursive_cte_fallback' visible in Prometheus/NOTIFY
- [x] OPS-1: canary_diff() / canary_promote() API functional with E2E tests
- [x] OP-2: Prometheus HTTP endpoint accessible at pg_trickle.metrics_port; all monitoring metrics present
- [x] OP-3: pgtrickle.pause_all() / resume_all() work idempotently; E2E test passes
- [x] OP-4: pgtrickle.refresh_if_stale(name, max_age) correctly gates refresh by age
- [x] OP-5: pgtrickle.stream_table_definition(name) returns accurate single-row result
- [x] DOC-1: docs/PERFORMANCE_COOKBOOK.md published
- [x] Extension upgrade path tested (0.20.0 → 0.21.0)
- [x] just check-version-sync passes