Plain-language companion: v0.24.0.md

v0.24.0 — Join Correctness & Durability Hardening

Status: Released (2026-04-20). Sourced from PLAN_OVERALL_ASSESSMENT_2.md §3, §4, §6.

Release Theme This release closes the remaining critical correctness bugs and data-durability gaps identified in the v0.23.0 deep assessment. The EC-01 join phantom-row bug — deferred since v0.21.0 — is finally resolved, restoring full DIFFERENTIAL correctness for multi-table LEFT/RIGHT/FULL JOINs under mixed DML. Change-buffer durability becomes configurable, and a two-phase frontier commit eliminates the crash-replay window. Supporting work includes TOAST-aware CDC hashing, partitioned-source publication health checks, history retention, and a unit-test campaign for the v0.21–v0.23 surface area.

EC-01 Join Correctness Fix

Item Description Effort Ref
EC01-1 Row-id hash convergence for Part 1b. Modify src/dvm/operators/join.rs so the Part 1b arm (Δ⋈R₀) hashes only the left-side PK, ensuring both Part 1a and 1b emit the same __pgt_row_id for a given logical row. 3d PLAN_OVERALL_ASSESSMENT_2.md §3 #1
EC01-2 PH-D1 cross-cycle phantom cleanup. Extend the PH-D1 delete path in src/refresh/phd1.rs to reconcile orphaned row ids from prior cycles, not just the current delta. 2d PLAN_OVERALL_ASSESSMENT_2.md §3 #1
EC01-3 Remove Q15 from IMMEDIATE_SKIP_ALLOWLIST. Re-enable TPC-H Q15 in IMMEDIATE mode correctness tests after EC01-½ land. 0.5d PLAN_OVERALL_ASSESSMENT_2.md §3 #1
EC01-4 Proptest harness for join cross-cycle convergence. 5,000-iteration property test asserting INSERT/UPDATE/DELETE sequences on multi-table JOINs converge to the same result as a full refresh. 2d PLAN_OVERALL_ASSESSMENT_2.md §3 #1

Durability & Frontier Atomicity

Item Description Effort Ref
DUR-1 Two-phase frontier commit. Write a tentative frontier to a side column before TRUNCATE; finalise after MERGE commits; reconcile on startup. Unifies the manual-refresh and scheduler code paths. 5d PLAN_OVERALL_ASSESSMENT_2.md §3 #2
DUR-2 pg_trickle.change_buffer_durability GUC. New GUC with values unlogged (default, current behaviour), logged (WAL-logged change buffers), sync (logged + synchronous commit). 3d PLAN_OVERALL_ASSESSMENT_2.md §3 #2
DUR-3 Crash-recovery E2E test for frontier consistency. Kill bgworker between TRUNCATE and frontier-store; assert no phantom replays or lost rows on restart. 2d PLAN_OVERALL_ASSESSMENT_2.md §3 #2

CDC Hardening

Item Description Effort Ref
CDC-1 Eliminate two unwrap() sites in src/cdc.rs. Convert build_changed_cols_bitmask_expr().unwrap() to ? with a new PgTrickleError::ChangedColsBitmaskFailed variant. 0.5d PLAN_OVERALL_ASSESSMENT_2.md §3 #3
CDC-2 Partitioned-source publication rebuild. On scheduler tick, compare pg_publication_tables.pubviaroot against source relkind = 'p'; rebuild publication with publish_via_partition_root = true if mismatched. Emit refresh_reason = 'publication_rebuild'. 3d PLAN_OVERALL_ASSESSMENT_2.md §3 #4
CDC-3 TOAST-aware CDC hashing. Include pg_column_size() for TOASTable columns (attstorage IN ('e', 'x')) in the row-id hash to detect in-place TOAST rewrites. 3d PLAN_OVERALL_ASSESSMENT_2.md §3 #5
CDC-4 TOAST workload E2E tests. Add jsonb-update and bytea-update scenarios to tests/e2e_cdc_edge_case_tests.rs. 1d PLAN_OVERALL_ASSESSMENT_2.md §6

Operational Improvements

Item Description Effort Ref
OPS-1 pg_trickle.refresh_history_retention_days GUC. Default 7 days. Bgworker prunes stale rows in 1k-row batches during idle ticks. 2d PLAN_OVERALL_ASSESSMENT_2.md §4
OPS-2 Frozen-stream-table detector. New self-monitoring view df_frozen_stream_tables that flags any ST whose last_refresh_at < now() - 5 × refresh_interval with recent CDC activity. Alert via pgtrickle_alert NOTIFY. 2d PLAN_OVERALL_ASSESSMENT_2.md §4
OPS-3 Missing internal catalog indexes. Add composite indexes on pgt_stream_tables(status, scc_id), pgt_refresh_history(pgt_id, action, data_timestamp), pgt_change_tracking(source_relid), and a partial index on changes_<oid>(__pgt_action). 1d PLAN_OVERALL_ASSESSMENT_2.md §5

Test Coverage (TEST-6/7/8)

Item Description Effort Ref
TEST-6 Unit tests for src/api/publication.rs. Cover fit_linear_regression, predict_diff_duration_ms, should_preempt_to_full, assign_tier_for_sla, maybe_adjust_tier_for_sla, boundary cases (0, negative, NaN). 25+ tests. 3d PLAN_OVERALL_ASSESSMENT_2.md §6
TEST-7 Unit tests for src/api/diagnostics.rs. Cover explain_query_rewrite, diagnose_errors, validate_query, 5 gather_* helpers. 20+ tests. 2d PLAN_OVERALL_ASSESSMENT_2.md §6
TEST-8 Unit tests for src/metrics_server.rs. Cover port-conflict handling, timeout behaviour, malformed HTTP request, OpenMetrics format conformance. 10+ tests. 1d PLAN_OVERALL_ASSESSMENT_2.md §6

Implementation Phases

Phase Description Duration
Phase 1 EC-01 fix: row-id hash convergence + PH-D1 cleanup + proptest Days 1–8
Phase 2 Durability: two-phase frontier + change_buffer_durability GUC + crash test Days 8–18
Phase 3 CDC hardening: unwrap removal, publication rebuild, TOAST hashing + tests Days 18–26
Phase 4 Operational: history retention, frozen-ST detector, catalog indexes Days 26–31
Phase 5 Test campaign: TEST-6/7/8 unit tests for publication, diagnostics, metrics Days 31–37
Phase 6 Integration testing, documentation, upgrade script Days 37–42

v0.24.0 total: ~8–9 weeks (~42 person-days solo)

Exit criteria: - [x] EC01-1: Part 1b arm hashes left-side PK only; TPC-H Q07 passes multi-cycle correctness - [x] EC01-2: PH-D1 cleans up prior-cycle phantoms; no residual rows after 10 cycles - [x] EC01-3: Q15 removed from IMMEDIATE_SKIP_ALLOWLIST; TPC-H Q15 passes IMMEDIATE mode - [x] EC01-4: 5,000-iteration proptest passes for JOIN convergence - [x] DUR-1: Two-phase frontier commit implemented; manual and scheduler paths unified - [x] DUR-2: change_buffer_durability = 'logged' creates WAL-logged change buffers; 'unlogged' preserves current behaviour - [x] DUR-3: Crash-recovery E2E: kill bgworker mid-refresh → restart → zero lost/duplicated rows - [x] CDC-1: Zero unwrap() calls in src/cdc.rs production paths - [x] CDC-2: Converting a source table to partitioned triggers automatic publication rebuild - [x] CDC-3: TOAST-only column update detected and propagated in DIFFERENTIAL mode - [x] CDC-4: jsonb + bytea TOAST E2E tests pass - [x] OPS-1: History older than retention_days is pruned automatically; GUC documented - [x] OPS-2: Frozen-ST detector fires alert when ST stalls with active CDC source - [x] OPS-3: Internal catalog indexes exist; scheduler tick time reduced at 100+ STs - [x] TEST-6: 25+ publication.rs unit tests pass (predictive model boundary cases) - [x] TEST-7: 20+ diagnostics.rs unit tests pass - [x] TEST-8: 10+ metrics_server.rs unit tests pass (port conflict, timeout, format) - [x] Extension upgrade path tested (0.23.0 → 0.24.0) - [x] just check-version-sync passes