Contents
-
- v0.14.0 — Tiered Scheduling, UNLOGGED Buffers & Diagnostics
- Quick Polish & Error State Circuit Breaker (Phase 1 + 1b) — ✅ Done
- Manual Tiered Scheduling (Phase 2 — C-1) — ✅ Done
- UNLOGGED Change Buffers (Phase 3 — D-1) — ✅ Done
- Documentation: Best-Practice Patterns Guide (G16-PAT) — ✅ Done
- Long-Running Stability & Multi-Database Testing (G17-SOAK, G17-MDB) — ✅ Done
- Container Infrastructure (INFRA-GHCR)
- Refresh Mode Diagnostics (DIAG-1) — ✅ Done
- Export Definition API (G15-EX) — ✅ Done
- TUI Tool (E3-TUI)
- GUC Surface Consolidation (C4)
- Documentation: Pre-Deployment Checklist (DOC-PDC) — ✅ Done
- Documentation: Operator Mode Support Matrix Cross-Link (DOC-OPM)
- Aggregate Mode Warning at Creation Time (DIAG-2)
- DIFFERENTIAL Refresh for Manual ST-on-ST Path (FIX-STST-DIFF)
- v0.14.0 — Tiered Scheduling, UNLOGGED Buffers & Diagnostics
Plain-language companion: v0.14.0.md
v0.14.0 — Tiered Scheduling, UNLOGGED Buffers & Diagnostics
Status: Released (2026-04-02).
Tiered refresh scheduling, UNLOGGED change buffers, refresh mode diagnostics, error-state circuit breaker, a full-featured TUI dashboard, security hardening (SECURITY DEFINER triggers with explicit search_path), GHCR Docker image, pre-deployment checklist, best-practice patterns guide, and comprehensive E2E test coverage. See CHANGELOG.md for the full feature list.
Quick Polish & Error State Circuit Breaker (Phase 1 + 1b) — ✅ Done
- C4:
pg_trickle.planner_aggressiveGUC consolidatesmerge_planner_hints+merge_work_mem_mb. Old GUCs deprecated. - DIAG-2: Creation-time WARNING for group-rescan and low-cardinality algebraic aggregates.
agg_diff_cardinality_thresholdGUC added. - DOC-OPM: Operator support matrix summary table linked from
SQL_REFERENCE.md. - ERR-1: Permanent failures immediately set
ERRORstatus withlast_error_message/last_error_at. API calls clear error state. E2E test pending.
Manual Tiered Scheduling (Phase 2 — C-1) — ✅ Done
Tiered scheduling infrastructure was already in place since v0.11/v0.12 (refresh_tier column, RefreshTier enum, ALTER ... SET (tier=...), scheduler multipliers). Phase 2 verified completeness and added:
- C-1b: NOTICE on tier demotion from Hot to Cold/Frozen, alerting operators to the effective interval change.
- C-1c: Scheduler tier-aware multipliers confirmed: Hot ×1, Warm ×2, Cold ×10, Frozen = skip. Gated by
pg_trickle.tiered_scheduling(defaulttruesince v0.12.0).
UNLOGGED Change Buffers (Phase 3 — D-1) — ✅ Done
- D-1a:
pg_trickle.unlogged_buffersGUC (defaultfalse). New change buffer tables created asUNLOGGEDwhen enabled, reducing WAL amplification by ~30%. - D-1b: Crash recovery detection — scheduler detects UNLOGGED buffers emptied by crash (postmaster restart after last refresh) and auto-enqueues FULL refresh.
- D-1c:
pgtrickle.convert_buffers_to_unlogged()utility function for converting existing logged buffers. Documents lock-window warning. - D-1e: Documentation in
CONFIGURATION.mdandSQL_REFERENCE.md.
Documentation: Best-Practice Patterns Guide (G16-PAT) — ✅ Done
| Item | Description | Effort | Ref |
|---|---|---|---|
| |
docs/PATTERNS.md: 6 patterns (Bronze/Silver/Gold, event sourcing, SCD type-½, high-fan-out, real-time dashboards, tiered refresh) with SQL examples, anti-patterns, and refresh mode recommendations. |
— | ✅ Done |
Patterns guide subtotal: ✅ Done
Long-Running Stability & Multi-Database Testing (G17-SOAK, G17-MDB) — ✅ Done
Soak test validates zero worker crashes, zero ERROR states, and stable RSS under sustained mixed DML. Multi-database test validates catalog isolation, shared-memory independence, and concurrent correctness.
| Item | Description | Effort | Ref |
|---|---|---|---|
| |
tests/e2e_soak_tests.rs with configurable duration, 5 source tables, mixed DML, health checks, RSS monitoring, correctness verification. just test-soak / just test-soak-short. CI job: schedule + manual dispatch. |
— | ✅ Done |
| |
tests/e2e_mdb_tests.rs with two databases, catalog isolation assertion, concurrent mutation cycles, correctness verification per database. just test-mdb. CI job: schedule + manual dispatch. |
— | ✅ Done |
Stability & multi-database testing subtotal: ✅ Done
Container Infrastructure (INFRA-GHCR)
| Item | Description | Effort | Ref |
|---|---|---|---|
| INFRA-GHCR | GHCR Docker image. Dockerfile.ghcr (pinned to postgres:18.3-bookworm) + .github/workflows/ghcr.yml workflow that builds a multi-arch (linux/amd64 + linux/arm64) PostgreSQL 18.3 server image with pg_trickle pre-installed and all sensible GUC defaults baked in. Smoke-tests on amd64 before push. Published to ghcr.io/grove/pg_trickle on every v* tag with immutable (<version>-pg18.3), floating (pg18), and latest tags. Uses GITHUB_TOKEN — no extra secrets. |
4h | — | ✅ Done |
Container infrastructure subtotal: ✅ Done
Refresh Mode Diagnostics (DIAG-1) — ✅ Done
Analyzes stream table workload characteristics and recommends the optimal refresh mode. Seven weighted signals (change ratio, empirical timing, query complexity, target size, index coverage, latency variance) produce a composite score with confidence level and human-readable explanation.
| Item | Description | Effort | Ref |
|---|---|---|---|
| |
src/diagnostics.rs — pure signal-scoring functions + unit tests |
— | ✅ Done |
| |
|
— | ✅ Done |
| |
pgtrickle.recommend_refresh_mode() SQL function |
— | ✅ Done |
| |
pgtrickle.refresh_efficiency() function |
— | ✅ Done |
| |
|
— | ✅ Done |
| |
|
— | ✅ Done |
The function synthesises 7 weighted signals (historical change ratio 0.30, empirical timing 0.35, current change ratio 0.25, query complexity 0.10, target size 0.10, index coverage 0.05, P95/P50 variance 0.05) into a composite score. Confidence degrades gracefully when history is sparse.
Diagnostics subtotal: ~3.5–7 days
Export Definition API (G15-EX) — ✅ Done
| Item | Description | Effort | Ref |
|---|---|---|---|
| |
export_definition(name TEXT) — export a stream table configuration as reproducible DDL |
— | ✅ Done |
G15-EX subtotal: ~1–2 days
TUI Tool (E3-TUI)
In plain terms: A full-featured terminal user interface (TUI) for managing, monitoring, and diagnosing pg_trickle stream tables without touching SQL. Built with ratatui in Rust, it provides a real-time dashboard (think
htopfor stream tables), interactive dependency graph visualization, live refresh log, diagnostics with signal breakdown charts, CDC health monitoring, a GUC configuration editor, and a real-time alert feed — all navigable with keyboard shortcuts and a command palette. It also supports every original CLI command as one-shot subcommands for scripting and CI.
| Item | Description | Effort | Ref |
|---|---|---|---|
| E3-TUI | TUI tool (pgtrickle) for interactive management and monitoring |
8–10d | PLAN_TUI.md |
E3-TUI subtotal: ~8–10 days (T1–T8 implemented: CLI skeleton with 18 subcommands, interactive dashboard with 15 views, watch mode with
--filter, LISTEN/NOTIFY alerts with JSON parsing, async polling with force-poll, cascade staleness detection, DAG issue detection, sparklines, fuse detail panel, trigger inventory, context-sensitive help, docs/TUI.md)
GUC Surface Consolidation (C4)
| Item | Description | Effort | Ref |
|---|---|---|---|
| C4 | Consolidate merge_planner_hints + merge_work_mem_mb into single planner_aggressive boolean. Reduces GUC surface area; existing two GUCs become aliases that emit a deprecation notice. |
~1–2h | PLAN_FEATURE_CLEANUP.md §C4 |
C4 subtotal: ~1–2 hours
Documentation: Pre-Deployment Checklist (DOC-PDC) — ✅ Done
| Item | Description | Effort | Ref |
|---|---|---|---|
| |
docs/PRE_DEPLOYMENT.md: 10-point checklist covering PG version, shared_preload_libraries, WAL configuration, PgBouncer compatibility, recommended GUCs, resource planning, monitoring, validation script. Cross-linked from GETTING_STARTED.md and INSTALL.md. |
— | ✅ Done |
DOC-PDC subtotal: ✅ Done
Documentation: Operator Mode Support Matrix Cross-Link (DOC-OPM)
| Item | Description | Effort | Ref |
|---|---|---|---|
| DOC-OPM | Cross-link operator support matrix from SQL_REFERENCE.md. The 60+ operator × FULL/DIFFERENTIAL/IMMEDIATE matrix in DVM_OPERATORS.md is not discoverable from the page users actually read. Add a summary table and prominent link in SQL_REFERENCE.md §Supported SQL Constructs. | ~2–4h | docs/DVM_OPERATORS.md · docs/SQL_REFERENCE.md |
DOC-OPM subtotal: ~2–4 hours
Aggregate Mode Warning at Creation Time (DIAG-2)
In plain terms: Queries with very few distinct GROUP BY groups (e.g. 5 regions from 100K rows) are always faster with FULL refresh — differential overhead exceeds the cost of re-aggregating a tiny result set. Today users discover this only after benchmarking. A creation-time WARNING with an explicit recommendation prevents the surprise. The classification logic is already present in the DVM parser (aggregate strategy classification from
is_algebraically_invertible,is_group_rescan); this item exposes it at the SQL boundary.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DIAG-2 | Aggregate mode warning at create_stream_table time. After parsing the defining query, inspect the top-level operator: if it is an Aggregate node containing non-algebraic (group-rescan) functions such as MIN, MAX, STRING_AGG, ARRAY_AGG, BOOL_AND/OR, emit a WARNING recommending refresh_mode='full' or 'auto' and citing the group-rescan cost. For algebraic aggregates (SUM/COUNT/AVG), emit the warning only when the estimated group cardinality (from pg_stats.n_distinct on the GROUP BY columns) is below pg_trickle.agg_diff_cardinality_threshold (default: 1000 distinct groups), since below this threshold FULL is reliably faster. No behavior change — warning only. |
~2–4h | plans/performance/REPORT_OVERALL_STATUS.md §12.3 |
DIAG-2 subtotal: ~2–4 hours
DIFFERENTIAL Refresh for Manual ST-on-ST Path (FIX-STST-DIFF)
Background: When a stream table reads from another stream table (
calculatedschedule), the scheduler propagates changes via a per-ST change buffer (pgtrickle_changes.changes_pgt_{id}) and performs a true DIFFERENTIAL DVM refresh against that buffer. The manualpgtrickle.refresh_stream_table()path does not: it currently falls back to an unconditionalTRUNCATE + INSERT(FULL refresh) for every call.This was introduced as a correctness fix in v0.13.0 (PR #371) to close a scheduler race where the previous no-op guard could leave stale data in place. The FULL fallback is correct but inefficient — it pays a full table scan of all upstream STs even when only a small delta is present.
What needs to happen: Wire
execute_manual_differential_refreshto use the samechanges_pgt_change buffers the scheduler already writes. When a manual refresh is requested for acalculatedST that has a stored frontier, check each upstream ST’s change buffer for rows withlsn > frontier.get_st_lsn(upstream_pgt_id). If new rows exist, apply the DVM delta SQL (same asexecute_differential_refresh). If no rows exist beyond the frontier, return a true no-op. This also fixes the pre-existingtest_st_on_st_uses_differential_not_fullE2E failure.
| Item | Description | Effort | Ref |
|---|---|---|---|
| |
execute_manual_differential_refresh (src/api.rs), replace the unconditional FULL fallback for has_st_source with a proper change-buffer delta path: read rows from changes_pgt_{upstream_pgt_id} beyond the stored frontier LSN, run DVM differential SQL, advance the frontier. Matches the scheduler path exactly. Fixes test_st_on_st_uses_differential_not_full. |
— | ✅ Done |
FIX-STST-DIFF subtotal: ~1–2 days
v0.14.0 total: ~2–6 weeks + ~1wk patterns guide + ~2–4 days stability tests + ~3.5–7 days diagnostics + ~1–2d export API + ~8–10d TUI + ~0.5d docs + ~2–4h aggregate warning + ~1–2d ST-on-ST diff manual path
Exit criteria:
- [x] C-1: Tier classification with manual assignment; Cold STs skip refresh correctly; E2E tested ✅ Done
- [x] D-1: UNLOGGED change buffers opt-in (unlogged_buffers = false by default); crash-recovery FULL-refresh path tested; E2E tested ✅ Done
- [x] G16-PAT: Patterns guide published in docs/PATTERNS.md covering 6 patterns ✅ Done
- [x] G17-SOAK: Soak test passes with zero worker crashes, zero zombie stream tables, stable memory ✅ Done
- [x] G17-MDB: Multi-database scheduler isolation verified ✅ Done
- [x] DIAG-1: recommend_refresh_mode() + refresh_efficiency() implemented with 7 signals; E2E tested; tutorial published ✅ Done
- [x] DIAG-2: WARNING emitted at creation time for group-rescan and low-cardinality aggregates; threshold configurable ✅ Done
- [x] G15-EX: export_definition(name TEXT) returns valid reproducible DDL; round-trip tested ✅ Done
- [x] E3-TUI: pgtrickle TUI binary builds as workspace member; one-shot CLI commands functional with --format json; interactive dashboard launches with no subcommand; 15 views with cascade staleness, issue detection, sparklines, force-poll, NOTIFY, and context-sensitive help; documented in docs/TUI.md ✅ Done
- [x] C4: merge_planner_hints and merge_work_mem_mb consolidated into planner_aggressive ✅ Done
- [x] DOC-PDC: Pre-deployment checklist published in docs/PRE_DEPLOYMENT.md ✅ Done
- [x] DOC-OPM: Operator mode support matrix summary and link added to SQL_REFERENCE.md ✅ Done
- [x] FIX-STST-DIFF: Manual DIFFERENTIAL refresh for ST-on-ST path ✅ Done
- [x] INFRA-GHCR: ghcr.io/grove/pg_trickle multi-arch image builds, smoke-tests, and pushes on v* tags ✅ Done
- [x] ERR-1: Error-state circuit breaker with E2E test coverage ✅ Done
- [x] Extension upgrade path tested (0.13.0 → 0.14.0) ✅ Done