v0.20.0 — Self-Monitoring (pg_trickle Monitors Itself)

Full technical details: v0.20.0.md-full.md

Status: ✅ Released | Scope: Large (~6 weeks)

pg_trickle uses its own stream table technology to maintain reactive analytics about its own behaviour — detecting anomalies, tuning thresholds automatically, and demonstrating incremental view maintenance on a non-trivial real workload.


What problem does this solve?

Operators running pg_trickle in production needed answers to questions like: “Is my differential refresh consistently faster than a full refresh?” and “Is my change buffer growing unexpectedly?” Answering these required running diagnostic queries that scanned potentially large history tables on demand.

The self-monitoring release turns these diagnostics into continuously maintained stream tables: pg_trickle watches its own behaviour incrementally, the same way it watches your application data. Answers are available immediately, without re-scanning history.


Five Self-Monitoring Stream Tables

pgtrickle.setup_self_monitoring() creates five stream tables that monitor pg_trickle’s own internal catalog and refresh history:

df_efficiency_rolling — Rolling averages of differential and full refresh times per stream table. Replaces expensive full-scan calls to refresh_efficiency() with a continuously-maintained result.

In plain terms: “How fast are my stream tables refreshing, on average over the last hour?” — answered instantly.
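The difference between re-scanning history and maintaining a result continuously can be sketched in a few lines. This is only an illustration of the general idea, not pg_trickle's implementation: the class name, the one-hour default window, and the millisecond unit are assumptions made for the example.

```python
from collections import deque


class RollingAverage:
    """Keep a running mean of refresh durations over a sliding time window.

    Hypothetical helper for illustration; not part of pg_trickle.
    """

    def __init__(self, window_seconds: float = 3600.0):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, duration_ms) pairs, oldest first
        self._total = 0.0        # sum of the durations currently in the window

    def add(self, timestamp: float, duration_ms: float) -> None:
        """Record one refresh and drop samples that fell out of the window."""
        self._samples.append((timestamp, duration_ms))
        self._total += duration_ms
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            _, old_duration = self._samples.popleft()
            self._total -= old_duration

    def mean(self):
        """Return the current average, or None when no samples are stored."""
        if not self._samples:
            return None
        return self._total / len(self._samples)
```

Each new refresh costs only the work of adding one sample and evicting the expired ones, so reading the average never requires scanning the full history.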

df_anomaly_signals — Detects duration spikes (a refresh that took 3× longer than average), error bursts, and mode oscillation (rapid switching between DIFFERENTIAL and FULL).

In plain terms: “Did anything unusual happen in the last refresh cycle?”
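As a rough sketch of two of these signals, the functions below flag a duration spike and count mode switches. They assume that "3× longer than average" means the duration exceeds three times the rolling average, and both function names are hypothetical; the actual rules used by df_anomaly_signals may differ.

```python
def is_duration_spike(duration_ms, average_ms, factor=3.0):
    """Return True when a refresh took more than `factor` times the average."""
    if average_ms is None or average_ms <= 0:
        return False  # no usable baseline yet, so nothing to compare against
    return duration_ms > factor * average_ms


def count_mode_switches(modes):
    """Count adjacent refreshes whose mode differs (e.g. DIFFERENTIAL -> FULL)."""
    return sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)
```

A high switch count over a short run of refreshes is one plausible way to express "mode oscillation"; the cut-off for what counts as rapid is left to the caller here.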

df_threshold_advice — Multi-cycle threshold recommendations for AUTO mode. Rather than computing a recommendation from a single data point, this analyses trends across many cycles and assigns a confidence level.

In plain terms: “Should I adjust the AUTO mode threshold for this stream table?” — with a specific recommendation and confidence rating.

df_cdc_buffer_trends — Tracks how fast the change buffer is growing for each source table. Alerts when the buffer is growing toward a size that will cause a slow refresh.

In plain terms: “Is any source table accumulating changes faster than pg_trickle can process them?”
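One simple way to reason about buffer growth is to estimate a growth rate from recent samples and project when a size limit would be reached. The sketch below assumes samples of (timestamp in seconds, pending rows) and a caller-supplied row limit; the function names and the first-to-last rate estimate are illustrative choices, not the method df_cdc_buffer_trends is known to use.

```python
def growth_rate(samples):
    """Average rows-per-second growth between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0  # timestamps must advance for a rate to be meaningful
    return (n1 - n0) / (t1 - t0)


def seconds_until(samples, limit_rows):
    """Estimate seconds until the buffer reaches limit_rows.

    Returns None when the buffer is flat or shrinking.
    """
    rate = growth_rate(samples)
    if rate <= 0:
        return None
    current_rows = samples[-1][1]
    return max(0.0, (limit_rows - current_rows) / rate)
```

For example, a buffer that grows from 1000 to 2200 rows over 120 seconds is growing at 10 rows per second and would reach 10000 rows in about 780 seconds if nothing drains it.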

df_scheduling_interference — Detects when multiple stream table refreshes are competing with each other, causing latency spikes due to contention.

In plain terms: “Are my stream tables interfering with each other’s refresh schedules?”


Auto-Apply Threshold Tuning

When the pg_trickle.self_monitoring_auto_apply configuration option is set to threshold_only, pg_trickle automatically adjusts the AUTO mode cost threshold of each stream table to follow the recommendations in df_threshold_advice. A recommendation is applied only when its confidence is HIGH and the proposed change is significant.

Changes are rate-limited (at most once per stream table per 10 minutes) and recorded in the refresh history with initiated_by = 'SELF_MONITOR' for auditability.

In plain terms: pg_trickle can tune itself — finding the optimal balance between differential and full refresh for your specific workload, automatically.
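The gating conditions described above can be sketched as a single decision function. The 10-minute rate limit, the threshold_only setting, and the HIGH confidence requirement come from the text; the 10% cut-off for a "significant" change, the function name, and the parameter names are assumptions, since the text does not define them.

```python
RATE_LIMIT_SECONDS = 10 * 60  # from the text: at most once per 10 minutes
MIN_RELATIVE_CHANGE = 0.10    # assumption: "significant" is not defined in the text


def should_auto_apply(setting, confidence, current, recommended,
                      last_applied_at, now):
    """Decide whether a recommended threshold should be applied automatically.

    Hypothetical sketch; times are in seconds, last_applied_at may be None.
    """
    if setting != "threshold_only":
        return False  # auto-apply is not enabled
    if confidence != "HIGH":
        return False  # only high-confidence advice is acted on
    if last_applied_at is not None and now - last_applied_at < RATE_LIMIT_SECONDS:
        return False  # this stream table was adjusted too recently
    if current == 0:
        return recommended != 0  # avoid dividing by zero
    return abs(recommended - current) / abs(current) >= MIN_RELATIVE_CHANGE
```

Checking the cheap conditions first (setting, confidence, rate limit) keeps the function easy to audit: each early return corresponds to one rule stated in the text.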


Grafana Self-Monitoring Dashboard

A new Grafana dashboard in monitoring/grafana/ visualises all five self-monitoring stream tables with live panels:

  • Refresh throughput timeline
  • Anomaly heatmap by stream table
  • Threshold calibration scatter plot
  • CDC buffer growth sparklines
  • Scheduling interference matrix

dbt Integration

A new pgtrickle_enable_monitoring dbt post-hook macro calls setup_self_monitoring() automatically after a successful dbt run, so dbt users get self-monitoring activated as part of their normal workflow.


Operational Helpers

  • pgtrickle.recommend_refresh_mode(name) — returns an instant recommendation based on df_threshold_advice, rather than computing one on demand
  • pgtrickle.explain_dag() — returns the full stream table dependency graph as a Mermaid diagram, with colour-coding for user vs self-monitoring tables
  • pgtrickle.scheduler_overhead() — shows the fraction of scheduler time spent on self-monitoring tables vs user tables (target: < 1%)

Scope

v0.20.0 applies pg_trickle to itself as a demonstration of its value proposition: a non-trivial incremental analytics workload maintained continuously with negligible overhead. The five self-monitoring stream tables replace on-demand diagnostic queries with always-current reactive analytics, and the auto-apply feature closes the feedback loop from monitoring to optimisation.