Plain-language companion: v0.14.0.md

v0.14.0 — Tiered Scheduling, UNLOGGED Buffers & Diagnostics

Status: Released (2026-04-02).

Tiered refresh scheduling, UNLOGGED change buffers, refresh mode diagnostics, error-state circuit breaker, a full-featured TUI dashboard, security hardening (SECURITY DEFINER triggers with explicit search_path), GHCR Docker image, pre-deployment checklist, best-practice patterns guide, and comprehensive E2E test coverage. See CHANGELOG.md for the full feature list.

Quick Polish & Error State Circuit Breaker (Phase 1 + 1b) — ✅ Done

  • C4: pg_trickle.planner_aggressive GUC consolidates merge_planner_hints + merge_work_mem_mb. Old GUCs deprecated.
  • DIAG-2: Creation-time WARNING for group-rescan and low-cardinality algebraic aggregates. agg_diff_cardinality_threshold GUC added.
  • DOC-OPM: Operator support matrix summary table linked from SQL_REFERENCE.md.
  • ERR-1: Permanent failures immediately set ERROR status with last_error_message/last_error_at. API calls clear error state. E2E test pending.

Manual Tiered Scheduling (Phase 2 — C-1) — ✅ Done

Tiered scheduling infrastructure was already in place since v0.11/v0.12 (refresh_tier column, RefreshTier enum, ALTER ... SET (tier=...), scheduler multipliers). Phase 2 verified completeness and added:

  • C-1b: NOTICE on tier demotion from Hot to Cold/Frozen, alerting operators to the effective interval change.
  • C-1c: Scheduler tier-aware multipliers confirmed: Hot ×1, Warm ×2, Cold ×10, Frozen = skip. Gated by pg_trickle.tiered_scheduling (default true since v0.12.0).

UNLOGGED Change Buffers (Phase 3 — D-1) — ✅ Done

  • D-1a: pg_trickle.unlogged_buffers GUC (default false). New change buffer tables created as UNLOGGED when enabled, reducing WAL amplification by ~30%.
  • D-1b: Crash recovery detection — scheduler detects UNLOGGED buffers emptied by crash (postmaster restart after last refresh) and auto-enqueues FULL refresh.
  • D-1c: pgtrickle.convert_buffers_to_unlogged() utility function for converting existing logged buffers. Documents lock-window warning.
  • D-1e: Documentation in CONFIGURATION.md and SQL_REFERENCE.md.

Documentation: Best-Practice Patterns Guide (G16-PAT) — ✅ Done

Item Description Effort Ref
G16-PAT Best-practice patterns guide. docs/PATTERNS.md: 6 patterns (Bronze/Silver/Gold, event sourcing, SCD type-½, high-fan-out, real-time dashboards, tiered refresh) with SQL examples, anti-patterns, and refresh mode recommendations. ✅ Done

Patterns guide subtotal: ✅ Done

Long-Running Stability & Multi-Database Testing (G17-SOAK, G17-MDB) — ✅ Done

Soak test validates zero worker crashes, zero ERROR states, and stable RSS under sustained mixed DML. Multi-database test validates catalog isolation, shared-memory independence, and concurrent correctness.

Item Description Effort Ref
G17-SOAK Long-running stability soak test. tests/e2e_soak_tests.rs with configurable duration, 5 source tables, mixed DML, health checks, RSS monitoring, correctness verification. just test-soak / just test-soak-short. CI job: schedule + manual dispatch. ✅ Done
G17-MDB Multi-database scheduler isolation test. tests/e2e_mdb_tests.rs with two databases, catalog isolation assertion, concurrent mutation cycles, correctness verification per database. just test-mdb. CI job: schedule + manual dispatch. ✅ Done

Stability & multi-database testing subtotal: ✅ Done

Container Infrastructure (INFRA-GHCR)

Item Description Effort Ref
INFRA-GHCR GHCR Docker image. Dockerfile.ghcr (pinned to postgres:18.3-bookworm) + .github/workflows/ghcr.yml workflow that builds a multi-arch (linux/amd64 + linux/arm64) PostgreSQL 18.3 server image with pg_trickle pre-installed and all sensible GUC defaults baked in. Smoke-tests on amd64 before push. Published to ghcr.io/grove/pg_trickle on every v* tag with immutable (<version>-pg18.3), floating (pg18), and latest tags. Uses GITHUB_TOKEN — no extra secrets. 4h — | ✅ Done

Container infrastructure subtotal: ✅ Done

Refresh Mode Diagnostics (DIAG-1) — ✅ Done

Analyzes stream table workload characteristics and recommends the optimal refresh mode. Seven weighted signals (change ratio, empirical timing, query complexity, target size, index coverage, latency variance) produce a composite score with confidence level and human-readable explanation.

Item Description Effort Ref
DIAG-1a src/diagnostics.rs — pure signal-scoring functions + unit tests ✅ Done
DIAG-1b SPI data-gathering layer ✅ Done
DIAG-1c pgtrickle.recommend_refresh_mode() SQL function ✅ Done
DIAG-1d pgtrickle.refresh_efficiency() function ✅ Done
DIAG-1e E2E integration tests; upgrade migration ✅ Done
DIAG-1f Documentation: SQL_REFERENCE.md additions ✅ Done

The function synthesises 7 weighted signals (historical change ratio 0.30, empirical timing 0.35, current change ratio 0.25, query complexity 0.10, target size 0.10, index coverage 0.05, P95/P50 variance 0.05) into a composite score. Confidence degrades gracefully when history is sparse.

Diagnostics subtotal: ~3.5–7 days

Export Definition API (G15-EX) — ✅ Done

Item Description Effort Ref
G15-EX export_definition(name TEXT) — export a stream table configuration as reproducible DDL ✅ Done

G15-EX subtotal: ~1–2 days

TUI Tool (E3-TUI)

In plain terms: A full-featured terminal user interface (TUI) for managing, monitoring, and diagnosing pg_trickle stream tables without touching SQL. Built with ratatui in Rust, it provides a real-time dashboard (think htop for stream tables), interactive dependency graph visualization, live refresh log, diagnostics with signal breakdown charts, CDC health monitoring, a GUC configuration editor, and a real-time alert feed — all navigable with keyboard shortcuts and a command palette. It also supports every original CLI command as one-shot subcommands for scripting and CI.

Item Description Effort Ref
E3-TUI TUI tool (pgtrickle) for interactive management and monitoring 8–10d PLAN_TUI.md

E3-TUI subtotal: ~8–10 days (T1–T8 implemented: CLI skeleton with 18 subcommands, interactive dashboard with 15 views, watch mode with --filter, LISTEN/NOTIFY alerts with JSON parsing, async polling with force-poll, cascade staleness detection, DAG issue detection, sparklines, fuse detail panel, trigger inventory, context-sensitive help, docs/TUI.md)

GUC Surface Consolidation (C4)

Item Description Effort Ref
C4 Consolidate merge_planner_hints + merge_work_mem_mb into single planner_aggressive boolean. Reduces GUC surface area; existing two GUCs become aliases that emit a deprecation notice. ~1–2h PLAN_FEATURE_CLEANUP.md §C4

C4 subtotal: ~1–2 hours

Documentation: Pre-Deployment Checklist (DOC-PDC) — ✅ Done

Item Description Effort Ref
DOC-PDC Pre-deployment checklist page. docs/PRE_DEPLOYMENT.md: 10-point checklist covering PG version, shared_preload_libraries, WAL configuration, PgBouncer compatibility, recommended GUCs, resource planning, monitoring, validation script. Cross-linked from GETTING_STARTED.md and INSTALL.md. ✅ Done

DOC-PDC subtotal: ✅ Done

Documentation: Operator Mode Support Matrix Cross-Link (DOC-OPM)

Item Description Effort Ref
DOC-OPM Cross-link operator support matrix from SQL_REFERENCE.md. The 60+ operator × FULL/DIFFERENTIAL/IMMEDIATE matrix in DVM_OPERATORS.md is not discoverable from the page users actually read. Add a summary table and prominent link in SQL_REFERENCE.md §Supported SQL Constructs. ~2–4h docs/DVM_OPERATORS.md · docs/SQL_REFERENCE.md

DOC-OPM subtotal: ~2–4 hours

Aggregate Mode Warning at Creation Time (DIAG-2)

In plain terms: Queries with very few distinct GROUP BY groups (e.g. 5 regions from 100K rows) are always faster with FULL refresh — differential overhead exceeds the cost of re-aggregating a tiny result set. Today users discover this only after benchmarking. A creation-time WARNING with an explicit recommendation prevents the surprise. The classification logic is already present in the DVM parser (aggregate strategy classification from is_algebraically_invertible, is_group_rescan); this item exposes it at the SQL boundary.

Item Description Effort Ref
DIAG-2 Aggregate mode warning at create_stream_table time. After parsing the defining query, inspect the top-level operator: if it is an Aggregate node containing non-algebraic (group-rescan) functions such as MIN, MAX, STRING_AGG, ARRAY_AGG, BOOL_AND/OR, emit a WARNING recommending refresh_mode='full' or 'auto' and citing the group-rescan cost. For algebraic aggregates (SUM/COUNT/AVG), emit the warning only when the estimated group cardinality (from pg_stats.n_distinct on the GROUP BY columns) is below pg_trickle.agg_diff_cardinality_threshold (default: 1000 distinct groups), since below this threshold FULL is reliably faster. No behavior change — warning only. ~2–4h plans/performance/REPORT_OVERALL_STATUS.md §12.3

DIAG-2 subtotal: ~2–4 hours

DIFFERENTIAL Refresh for Manual ST-on-ST Path (FIX-STST-DIFF)

Background: When a stream table reads from another stream table (calculated schedule), the scheduler propagates changes via a per-ST change buffer (pgtrickle_changes.changes_pgt_{id}) and performs a true DIFFERENTIAL DVM refresh against that buffer. The manual pgtrickle.refresh_stream_table() path does not: it currently falls back to an unconditional TRUNCATE + INSERT (FULL refresh) for every call.

This was introduced as a correctness fix in v0.13.0 (PR #371) to close a scheduler race where the previous no-op guard could leave stale data in place. The FULL fallback is correct but inefficient — it pays a full table scan of all upstream STs even when only a small delta is present.

What needs to happen: Wire execute_manual_differential_refresh to use the same changes_pgt_ change buffers the scheduler already writes. When a manual refresh is requested for a calculated ST that has a stored frontier, check each upstream ST’s change buffer for rows with lsn > frontier.get_st_lsn(upstream_pgt_id). If new rows exist, apply the DVM delta SQL (same as execute_differential_refresh). If no rows exist beyond the frontier, return a true no-op. This also fixes the pre-existing test_st_on_st_uses_differential_not_full E2E failure.

Item Description Effort Ref
FIX-STST-DIFF DIFFERENTIAL manual refresh for ST-on-ST. In execute_manual_differential_refresh (src/api.rs), replace the unconditional FULL fallback for has_st_source with a proper change-buffer delta path: read rows from changes_pgt_{upstream_pgt_id} beyond the stored frontier LSN, run DVM differential SQL, advance the frontier. Matches the scheduler path exactly. Fixes test_st_on_st_uses_differential_not_full. ✅ Done

FIX-STST-DIFF subtotal: ~1–2 days

v0.14.0 total: ~2–6 weeks + ~1wk patterns guide + ~2–4 days stability tests + ~3.5–7 days diagnostics + ~1–2d export API + ~8–10d TUI + ~0.5d docs + ~2–4h aggregate warning + ~1–2d ST-on-ST diff manual path

Exit criteria: - [x] C-1: Tier classification with manual assignment; Cold STs skip refresh correctly; E2E tested ✅ Done - [x] D-1: UNLOGGED change buffers opt-in (unlogged_buffers = false by default); crash-recovery FULL-refresh path tested; E2E tested ✅ Done - [x] G16-PAT: Patterns guide published in docs/PATTERNS.md covering 6 patterns ✅ Done - [x] G17-SOAK: Soak test passes with zero worker crashes, zero zombie stream tables, stable memory ✅ Done - [x] G17-MDB: Multi-database scheduler isolation verified ✅ Done - [x] DIAG-1: recommend_refresh_mode() + refresh_efficiency() implemented with 7 signals; E2E tested; tutorial published ✅ Done - [x] DIAG-2: WARNING emitted at creation time for group-rescan and low-cardinality aggregates; threshold configurable ✅ Done - [x] G15-EX: export_definition(name TEXT) returns valid reproducible DDL; round-trip tested ✅ Done - [x] E3-TUI: pgtrickle TUI binary builds as workspace member; one-shot CLI commands functional with --format json; interactive dashboard launches with no subcommand; 15 views with cascade staleness, issue detection, sparklines, force-poll, NOTIFY, and context-sensitive help; documented in docs/TUI.md ✅ Done - [x] C4: merge_planner_hints and merge_work_mem_mb consolidated into planner_aggressive ✅ Done - [x] DOC-PDC: Pre-deployment checklist published in docs/PRE_DEPLOYMENT.md ✅ Done - [x] DOC-OPM: Operator mode support matrix summary and link added to SQL_REFERENCE.md ✅ Done - [x] FIX-STST-DIFF: Manual DIFFERENTIAL refresh for ST-on-ST path ✅ Done - [x] INFRA-GHCR: ghcr.io/grove/pg_trickle multi-arch image builds, smoke-tests, and pushes on v* tags ✅ Done - [x] ERR-1: Error-state circuit breaker with E2E test coverage ✅ Done - [x] Extension upgrade path tested (0.13.0 → 0.14.0) ✅ Done