Plan: Fuse — Anomalous Change Volume Protection

Date: 2026-03-02 Status: EXPLORATION Last Updated: 2026-03-02


1. Problem Statement

Catastrophic Data Changes from External Failures

An external ETL process fails and blanks out a source table — issuing a DELETE FROM or TRUNCATE followed by an incomplete reload. The change buffer records hundreds of thousands of deletions. On the next scheduler tick, pg_trickle faithfully propagates these deletions through the DAG, wiping out all downstream stream tables.

By the time the ETL team notices the bug, every dependent view is empty or severely degraded. Restoring the correct state requires a full re-run of the ETL pipeline plus manual refreshes of every affected stream table.

The Problem is Not Limited to ETL

Similar patterns occur with:

  • Schema migrations gone wrong — a poorly written migration deletes/ updates a large fraction of rows.
  • Runaway application bugs — a micro-service enters a loop that mass-deletes records.
  • Accidental bulk operationsUPDATE orders SET status = 'cancelled' without a WHERE clause.
  • Faulty CDC replay — a logical replication slot replays old events, causing a spike of phantom changes.

In all cases, the change volume is grossly abnormal compared to the table’s historical pattern. A human looking at the change buffer would say “this doesn’t look right” — but the current system blindly applies the delta.

What Exists Today

pg_trickle already has a change ratio check that triggers an adaptive fallback from DIFFERENTIAL to FULL refresh when the delta exceeds a configurable fraction of the table size (default 15%). This is a performance optimization — not a safety mechanism. The FULL refresh still applies the changes; it just does so via TRUNCATE + INSERT rather than MERGE.

The system also has auto-suspension after consecutive errors (default 3), but this only fires on refresh failures — not on successful refreshes that happen to apply anomalous deltas.

Neither mechanism prevents the propagation of anomalous changes.

Relationship to Watermark Gating

PLAN_WATERMARK_GATING.md addresses the case where external data is incomplete — the fuse addresses the case where external data is wrong. The two features are complementary:

  • Watermark gating answers: “Has enough data arrived?”
  • Fuse answers: “Does this data look right?”

A stream table can have both. Watermark alignment may pass (both sources have advanced their watermarks) but the fuse may blow because one source had an anomalous volume of deletions. Conversely, the fuse may be intact but watermark gating may block because one source hasn’t reported completeness yet.


2. Proposed Mechanism: Fuse

2.1 Core Concept

A fuse is a per-ST protective mechanism that monitors change volume and halts refresh when it detects an anomalous spike. Like a physical fuse, it blows when current (change volume) exceeds its rating (threshold), and must be manually replaced/reset before normal operation can resume.

The fuse has two states:

                  ┌──────────┐
                  │  INTACT  │  ← normal operation
                  └────┬─────┘
                       │ anomalous change detected
                       ▼
                  ┌──────────┐
                  │  BLOWN   │  ← refresh halted
                  └────┬─────┘
                       │ user resets via SQL function
                       ▼
                  ┌──────────┐
                  │  INTACT  │  ← normal operation resumes
                  └──────────┘

When BLOWN, the stream table’s refresh is skipped on every scheduler tick. The change buffer continues to accumulate (no data is lost), but no delta is applied to the stream table’s storage. The ST retains its last known-good state.

The user must explicitly reset the fuse after investigating and resolving the root cause:

SELECT pgtrickle.reset_fuse('order_summary');

2.2 Why Manual Reset?

Unlike a circuit breaker (which auto-resets after a cooldown), a blown fuse requires human judgment — just as replacing a physical fuse requires someone to investigate why it blew:

  • The anomalous changes may still be in the change buffer — auto-resetting would just blow the fuse again.
  • The user needs to decide: should the system apply the buffered changes (they were legitimate), discard them (they were erroneous), or reinitialize the ST from scratch?
  • Automatic recovery could mask the original problem, leading to silent data quality issues.

3. Trip Conditions

The key design question: how does the fuse decide that change volume is “anomalous”?

3.1 Option A — Fixed Threshold (Absolute)

The user configures a hard limit on the number or fraction of changes per refresh cycle.

SELECT pgtrickle.alter_stream_table('order_summary',
    fuse_max_changes => 10000    -- absolute row count
);
-- or
SELECT pgtrickle.alter_stream_table('order_summary',
    fuse_max_ratio => 0.50       -- 50% of table size
);
Pros Cons
Simple to understand and configure Requires manual tuning per ST
Predictable — no surprises Doesn’t adapt to seasonal patterns or organic growth
Easy to implement Too-tight thresholds cause false blows; too-loose miss real anomalies
User must know what “normal” looks like before setting the threshold

3.2 Option B — Adaptive (Statistical)

The system maintains a rolling baseline of historical change volumes and blows the fuse when the current delta deviates significantly from the baseline.

The fuse keeps a short history of recent refresh delta sizes and computes:

$$\text{blow if } \Delta_{\text{current}} > \mu + k \cdot \sigma$$

Where: - $\mu$ = mean delta size over the last $N$ refreshes - $\sigma$ = standard deviation - $k$ = sensitivity multiplier (configurable, default e.g. 3.0) - $N$ = window size (configurable, default e.g. 20 refreshes)

Cold-start behavior: Until $N$ refreshes have been recorded, the fuse is either disabled or uses a conservative fixed threshold as a backstop.

Pros Cons
Self-tuning — adapts to organic growth and seasonal patterns More complex to implement and explain
No per-ST manual tuning needed Can be fooled by gradual drift (boiling frog)
Statistical foundation — well-understood anomaly detection Cold-start period where the fuse is blind
Catches anomalies relative to this ST’s normal behavior The distribution of delta sizes may not be Gaussian — outliers may need different models
Window size is a hidden tuning parameter

3.3 Option C — Combined (Adaptive with Hard Ceiling)

Use the adaptive model (Option B) for normal operation, but add a hard ceiling that always blows the fuse regardless of the statistical baseline. This handles the “gradual drift followed by catastrophe” case.

SELECT pgtrickle.alter_stream_table('order_summary',
    fuse              => 'adaptive',
    fuse_ceiling      => 100000,    -- absolute hard limit
    fuse_sensitivity  => 3.0        -- std dev multiplier
);
Pros Cons
Best of both worlds — adaptive for subtle anomalies, ceiling for catastrophes Most complex configuration surface
The ceiling acts as a safety net against gradual baseline drift Three parameters to understand
Adaptive handles normal variation without false positives

3.4 Option D — Per-Source Monitoring

Instead of (or in addition to) monitoring the delta applied to the ST, monitor the change buffer growth per source table. Blow the fuse on any ST that depends on a source with anomalous change buffer activity.

This catches the problem earlier — before the delta query is even planned.

Pros Cons
Catches problems at the source, protecting all downstream STs A source anomaly affects all STs, even those that might handle it fine
No per-ST configuration needed for the monitoring itself Harder to tie to specific STs — requires DAG traversal
Reduces wasted work (no planning/executing a delta that will be rejected) The change buffer count isn’t the same as the delta size (CDC may compress changes)

3.5 Discussion

Options B and C (adaptive, optionally with a ceiling) are the most appealing for a “set it and forget it” user experience. Option A is the simplest MVP.

Key open questions:

  1. Which metric to monitor? Candidates:

    • delta_row_count (from change buffer — number of raw CDC events)
    • rows_inserted + rows_deleted (from MERGE — actual applied changes)
    • Change buffer row count per source (pre-delta)
    • Each tells a slightly different story. Which is the right signal?
  2. Should the check happen before or after the delta query runs?

    • Before (pre-check): Count change buffer rows in the LSN range. Cheaper — avoids running an expensive delta query only to discard it. But the change buffer count may not correspond 1:1 to the final delta (due to offsetting inserts/deletes, PK-level compression, etc.).
    • After (post-check): Run the delta, examine the result size before applying it. More accurate but wastes compute on the delta query.
    • A hybrid is possible: pre-check for a cheap early-out, post-check for precision.
  3. Interaction with the adaptive DIFFERENTIAL→FULL fallback: The existing adaptive fallback already counts change buffer rows and compares to a ratio threshold. The fuse check could share this infrastructure — but the semantics differ: the fallback switches to FULL (still applies changes), while the fuse blocks changes. The fallback ratio threshold (default 15%) is lower than a typical fuse rating (e.g. 50% or statistical outlier). These are two distinct safety layers operating in sequence:

   Change buffer rows counted
          │
          ▼
   ┌─────────────────────┐     blow    ┌────────────────────────┐
   │  Fuse                │────────────▶│  BLOWN — refresh halted│
   │  (e.g. μ + 3σ)      │             └────────────────────────┘
   └──────────┬──────────┘
              │ pass
              ▼
   ┌─────────────────────┐   exceeds   ┌────────────────────────┐
   │  Adaptive fallback   │────────────▶│  Switch DIFF → FULL    │
   │  (e.g. 15% of rows)  │            └────────────────────────┘
   └──────────┬──────────┘
              │ pass
              ▼
       Normal DIFFERENTIAL refresh

4. Fuse State

4.1 Catalog Storage

A fuse is per-ST. The state needs to survive server restarts.

Option: Columns on pgt_stream_tables

ALTER TABLE pgtrickle.pgt_stream_tables ADD COLUMN
    fuse_mode         TEXT NOT NULL DEFAULT 'none'
                      CHECK (fuse_mode IN ('none', 'fixed', 'adaptive')),
    fuse_state        TEXT NOT NULL DEFAULT 'intact'
                      CHECK (fuse_state IN ('intact', 'blown')),
    fuse_blown_at     TIMESTAMPTZ,
    fuse_blow_reason  TEXT,
    fuse_ceiling      BIGINT,          -- hard limit (rows)
    fuse_sensitivity  FLOAT8,          -- std dev multiplier (adaptive mode)
    fuse_window_size  INT;             -- number of refreshes for rolling baseline

Option: Separate table

CREATE TABLE pgtrickle.pgt_fuses (
    pgt_id         BIGINT PRIMARY KEY REFERENCES pgt_stream_tables,
    mode           TEXT NOT NULL DEFAULT 'none',
    state          TEXT NOT NULL DEFAULT 'intact',
    blown_at       TIMESTAMPTZ,
    blow_reason    TEXT,
    ceiling        BIGINT,
    sensitivity    FLOAT8 DEFAULT 3.0,
    window_size    INT DEFAULT 20
);

A separate table keeps pgt_stream_tables from growing wider with every new feature. But it adds a JOIN in the scheduler hot path.

No decision made. The separate table is cleaner long-term; inline columns are faster at runtime. This should be decided alongside the broader catalog width discussion in REPORT_DB_SCHEMA_STABILITY.md.

4.2 Rolling Baseline Storage

The adaptive model needs historical delta sizes. Three candidates:

A. Derived from pgt_refresh_history — query the last $N$ successful refreshes and compute $\mu$ and $\sigma$ from delta_row_count. No additional storage needed.

Pros Cons
Zero new schema — data already exists Requires a SQL query per ST per tick
History is already pruned/retained per existing policy If refresh history is pruned aggressively, the window shrinks

B. Pre-computed summary in catalog — maintain fuse_mean and fuse_stddev columns, updated incrementally after each successful refresh using Welford’s online algorithm.

Pros Cons
O(1) lookup in scheduler — no extra query Loses individual data points — can’t recompute on window size change
Survives history pruning Another incremental update in the refresh path

C. Ring buffer in JSONB — store the last $N$ delta_row_count values as a JSONB array on the fuse row.

Pros Cons
Full data for recomputation JSONB array manipulation per refresh
Configurable window without data loss Grows with window size (bounded)

Option A (derive from history) is simplest and sufficient if refresh history retention is at least as large as the window size. Option B is the most efficient at runtime.


5. SQL API

5.1 Configuration

-- Enable adaptive fuse with defaults
SELECT pgtrickle.alter_stream_table('order_summary',
    fuse => 'adaptive'
);

-- Adaptive with custom sensitivity and hard ceiling
SELECT pgtrickle.alter_stream_table('order_summary',
    fuse             => 'adaptive',
    fuse_sensitivity => 4.0,       -- 4σ before blowing
    fuse_ceiling     => 100000     -- absolute hard limit
);

-- Fixed threshold only
SELECT pgtrickle.alter_stream_table('order_summary',
    fuse         => 'fixed',
    fuse_ceiling => 50000      -- blow at 50k changes
);

-- Disable
SELECT pgtrickle.alter_stream_table('order_summary',
    fuse => 'none'
);

5.2 Manual Reset

-- Reset the fuse and resume normal refreshes.
-- Pending changes in the buffer will be applied on the next tick.
SELECT pgtrickle.reset_fuse('order_summary');

Options on reset — open question:

The user may want to control what happens to buffered changes when resetting.

-- Option A: Apply buffered changes normally (default)
SELECT pgtrickle.reset_fuse('order_summary');

-- Option B: Discard buffered changes and reinitialize from scratch
SELECT pgtrickle.reset_fuse('order_summary',
    action => 'reinitialize'
);

-- Option C: Discard changes up to the current LSN without applying them,
-- then resume differential from the new baseline
SELECT pgtrickle.reset_fuse('order_summary',
    action => 'skip_changes'
);
Action Behavior
'apply' (default) Buffered changes are applied on next tick. Use when the changes were legitimate but you wanted to verify first.
'reinitialize' ST is truncated and fully repopulated from the defining query. Safest when the source data has been corrected.
'skip_changes' Frontier is advanced past the buffered changes without applying them. Use when the anomalous changes have been rolled back at the source and you want to “forget” what happened.

5.3 Introspection

-- Fuse state for all STs
SELECT * FROM pgtrickle.fuse_status();

Returns:

Column Type Description
st_name TEXT Stream table name
mode TEXT 'none', 'fixed', 'adaptive'
state TEXT 'intact', 'blown'
blown_at TIMESTAMPTZ When the fuse blew (NULL if intact)
blow_reason TEXT Human-readable explanation
baseline_mean FLOAT8 Rolling mean delta size (adaptive mode)
baseline_stddev FLOAT8 Rolling std dev (adaptive mode)
last_delta BIGINT Delta that blew the fuse
ceiling BIGINT Hard ceiling (if set)
sensitivity FLOAT8 σ multiplier (adaptive mode)

5.4 Alerts

When the fuse blows, emit a NOTIFY pg_trickle_alert event:

{
  "event": "fuse_blown",
  "st_name": "order_summary",
  "schema": "public",
  "delta_row_count": 85432,
  "baseline_mean": 1200.5,
  "baseline_stddev": 340.2,
  "ceiling": null,
  "computed_threshold": 2221.1,
  "blow_reason": "delta 85432 exceeds adaptive threshold 2221 (μ=1201, 3.0σ=1020)"
}

This integrates with the existing pg_trickle_alert NOTIFY channel and follows the same JSON event format used by stale_data, auto_suspended, and other existing alert types.


6. Scheduler Integration

6.1 Where the Check Runs

The fuse check runs in the scheduler’s per-ST refresh loop, before the delta query is planned and executed:

for each ST in topological_order:
    ... existing checks (status, schedule, advisory lock, upstream changes) ...

    // ── Fuse gate ─────────────────────────────────────────
    if st.fuse_state == 'blown':
        log!("fuse blown for {}", st.name)
        skip this ST
        continue

    if st.fuse_mode != 'none':
        change_count = count_change_buffer_rows(st, prev_frontier, new_frontier)
        if should_blow(st, change_count):
            blow_fuse(st, change_count, reason)
            emit NOTIFY fuse_blown
            skip this ST
            continue
    // ──────────────────────────────────────────────────────

    ... adaptive fallback check (existing) ...
    ... proceed with refresh ...

6.2 Relationship to Existing Adaptive Fallback

Both the fuse and the adaptive DIFFERENTIAL→FULL fallback examine change buffer row counts, but they serve different purposes and operate at different thresholds:

Mechanism Typical threshold Effect Purpose
Fuse Statistical outlier or hard ceiling Block refresh entirely Data safety
Adaptive fallback ~15% of table (auto-tuned) Switch DIFF → FULL Performance optimization

The fuse runs first. If it passes, the adaptive fallback runs as usual. Both can share the change buffer count query to avoid duplicate SPI calls.

6.3 Interaction with Consistency Groups

If a blown fuse’s ST is a member of a diamond consistency group or a watermark-gated group:

  • Diamond atomic group: The blown ST cannot refresh, so the entire group is skipped. This is consistent with the atomic guarantee — all-or-nothing. The group retries next tick (and will skip again until the fuse is reset).
  • Watermark group: The fuse adds an additional gate on top of watermark alignment. Even if all watermarks align, a blown fuse on any member blocks that member (and its atomic group, if any).
  • Cascade STs: Downstream STs that depend on the blown ST will see no upstream changes and naturally produce empty deltas. They may still refresh (applying zero changes) or be skipped if the scheduler’s “upstream changes” pre-check short-circuits first.

7. Concrete Example

Scenario: ETL Failure Blanks Out orders

-- Setup
CREATE TABLE orders (id INT PRIMARY KEY, customer TEXT, amount NUMERIC);
CREATE TABLE order_lines (order_id INT, product TEXT, qty INT);

SELECT pgtrickle.create_stream_table('order_summary',
    'SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer');

SELECT pgtrickle.create_stream_table('line_summary',
    'SELECT product, SUM(qty) AS total_qty FROM order_lines GROUP BY product');

SELECT pgtrickle.create_stream_table('order_report',
    'SELECT os.customer, os.total, ls.total_qty
     FROM order_summary os
     JOIN line_summary ls ON os.customer = ls.product');

-- Enable fuses
SELECT pgtrickle.alter_stream_table('order_summary',
    fuse => 'adaptive');
SELECT pgtrickle.alter_stream_table('line_summary',
    fuse         => 'adaptive',
    fuse_ceiling => 50000);

Normal operation over 100 refresh cycles: - order_summary typically sees 50–200 changes per cycle (μ=120, σ=45). - line_summary typically sees 100–400 changes per cycle (μ=230, σ=80).

ETL failure at T=12:05: 1. External job runs DELETE FROM orders (accidentally). 2. Change buffer records 50,000 deletion events. 3. Next scheduler tick: - Fuse on order_summary evaluates: $50{,}000 > 120 + 3 \times 45 = 255$. Blown. - order_summary is skipped. It retains its last-known-good contents. - order_report (downstream) sees no upstream changes, skips. 4. Alert fires: fuse_blown with delta_row_count=50000. 5. ETL team investigates, fixes the pipeline, reloads orders. 6. Team resets the fuse: sql SELECT pgtrickle.reset_fuse('order_summary', action => 'reinitialize'); 7. order_summary does a FULL refresh from the corrected data. 8. Normal operation resumes.


8. Open Questions

  1. Pre-check vs post-check: Should the fuse examine the change buffer count (pre-check, cheap but approximate) or the actual delta result (post-check, accurate but expensive)? Or both? The pre-check can share the count query with the adaptive fallback. A post-check would require materializing the delta to a temp table, checking its size, then either applying or discarding it.

  2. Granularity — per-ST or per-source? If a source table has an anomalous spike, should ALL downstream STs blow, or only those whose individual delta is anomalous? Per-source blowing is simpler but may over-block. Per-ST is more precise but requires per-ST statistics.

  3. Cascade behavior: When a fuse blows on an intermediate ST, should all downstream STs also be explicitly blocked? They’ll see no upstream changes (since the blown ST didn’t update), so they’ll naturally produce empty deltas. But should this be made explicit in the UI / monitoring output?

  4. GUC for global default: Should there be a pg_trickle.fuse_default GUC that applies a default mode to all STs (e.g. 'adaptive')? This avoids configuring each ST individually in large deployments. The per-ST setting would override.

  5. Interaction with manual refresh: If a user calls pgtrickle.refresh_stream_table('order_summary') while the fuse is blown, should it be honored or blocked?

    • Block: The fuse is a safety mechanism; bypassing it defeats the purpose.
    • Honor: The user may be testing after a fix; they know what they’re doing.
    • Force flag: refresh_stream_table('order_summary', force => true) to explicitly bypass.
  6. Cold-start sensitivity: During the first $N$ refreshes, the adaptive model has no baseline. Options:

    • Fuse is inactive during cold-start (simplest, but unprotected).
    • Use a configurable cold-start ceiling (safe, but adds a parameter).
    • Use a high fixed default (e.g. 10,000 changes) until enough history accumulates.
  7. Separate deletions from insertions? A mass DELETE is usually more alarming than a mass INSERT (new data rarely blanks out downstream STs). Should the fuse weight deletions differently? For example:

    • Blow on deletions at $\mu + 2\sigma$ but on insertions at $\mu + 4\sigma$.
    • Or: track insertion and deletion baselines separately.
    • Or: monitor net change (inserts − deletes) — a large negative net change is the strongest signal of data loss.
  8. Recovery workflow UX: The three-option reset ('apply', 'reinitialize', 'skip_changes') provides flexibility but may overwhelm new users. Should the default be:

    • 'apply' (least surprising — “just let it through”), or
    • 'reinitialize' (safest — “start fresh”), or
    • Require the user to explicitly choose (no default — forces deliberate decision)?
  9. Windowed vs exponential decay: The rolling baseline could use:

    • A fixed window of the last $N$ refreshes (simple, bounded memory).
    • An exponentially weighted moving average (EWMA), which gives more weight to recent refreshes and adapts faster to legitimate baseline shifts.
    • EWMA is better for STs with seasonal or trending change patterns. Fixed window is simpler to explain and debug.
  10. Half-open / audit state: Instead of binary intact/blown, should there be a “test” state where the system runs a single refresh into a staging area (e.g. a temp table or shadow copy) without applying it to the main storage? The user can inspect the staged delta before committing. This adds significant complexity but provides a safe investigation path.

  11. Notification channels: Beyond pg_trickle_alert NOTIFY, should the fuse integrate with external alerting? For example, writing to a pg_trickle_events table that external monitoring (Prometheus, PagerDuty) can poll. This may be better handled by a generic alerting plan rather than fuse-specific.


9. Relationship to Other Plans

Plan Relationship
PLAN_WATERMARK_GATING.md Complementary. Watermark gating prevents refresh when external data is incomplete. The fuse prevents refresh when data changes are anomalous. Both gate refresh but for different reasons. A ST can have both: watermark alignment must pass and fuse must be intact.
PLAN_DIAMOND_DEPENDENCY_CONSISTENCY.md A blown fuse on any member of a diamond atomic group blocks the entire group (consistent with all-or-nothing semantics).
PLAN_CROSS_SOURCE_SNAPSHOT_CONSISTENCY.md Orthogonal. Snapshot consistency concerns which data is visible; the fuse concerns whether to apply changes at all.
Adaptive DIFF→FULL fallback (refresh.rs) The fuse is a stricter safety layer above the adaptive fallback. Both examine change volume but at different thresholds with different effects (block vs. switch strategy). Can share the change buffer count query.
Auto-suspension (scheduler.rs) Auto-suspension handles repeated failures (errors). The fuse handles anomalous successes (unexpected volumes). Complementary — different triggers, same protective intent.
PLAN_HYBRID_CDC.md The fuse works regardless of CDC mode (trigger or WAL). In WAL mode, the “change buffer count” may be derived from decoded WAL records rather than trigger-written rows, but the blow logic is identical.

10. Prior Art

  1. Electrical Fuses — The direct metaphor. A fuse blows when current exceeds its rating, protecting downstream circuits from damage. Must be replaced manually. Our fuse blows when change volume exceeds its rating, protecting downstream stream tables from anomalous data.

  2. Netflix Hystrix / Resilience4j — Circuit breaker pattern for microservice calls. Trip on failure rate, reset after timeout or manual intervention. Similar state machine model, though the “failure” in our case is anomalous data volume rather than service errors. We chose the “fuse” name to emphasize that manual reset is always required.

  3. Apache Flink Backpressure — Flink operators slow down when downstream can’t keep up. Not a fuse per se, but the same principle of protecting the system from overwhelming data rates.

  4. Great Expectations / Soda — Data quality tools that validate data against expectations (row counts, distributions, null rates) before pipeline stages proceed. The fuse’s adaptive baseline is a simplified version of statistical expectation checking, applied inline in the refresh pipeline rather than as a separate validation step.

  5. dbt source freshness + alerting — dbt can test whether source data is stale, but doesn’t prevent model execution based on anomalous volume. The fuse goes further by actually blocking propagation.

  6. PostgreSQL statement_timeout / lock_timeout — Built-in safety mechanisms that abort operations exceeding time bounds. The fuse applies a similar philosophy to data volume bounds.

  7. Kafka Consumer Lag Monitoring — Kafka consumer groups monitor lag (difference between latest offset and consumer position). An anomalous spike in lag can indicate producer issues. The fuse’s change buffer monitoring is analogous: a sudden spike in unconsumed changes indicates something unusual at the source.


11. Sketch Implementation Steps

Preliminary — subject to change based on the open design decisions.

Step 1 — Blow logic as pure function

/// Determines whether the fuse should blow.
///
/// Returns `Some(reason)` if the fuse should blow, `None` otherwise.
pub fn should_blow(
    mode: FuseMode,
    change_count: i64,
    baseline_mean: Option<f64>,
    baseline_stddev: Option<f64>,
    ceiling: Option<i64>,
    sensitivity: f64,
) -> Option<String> { ... }

Testable without a database. Unit tests cover all mode combinations, cold-start (no baseline), edge cases (zero stddev, etc.).

Step 2 — Catalog: fuse configuration and state

Add columns or table for mode, state, blown_at, blow_reason, ceiling, sensitivity, window_size. Migration SQL.

Step 3 — alter_stream_table() extension

Accept fuse, fuse_ceiling, fuse_sensitivity parameters.

Step 4 — Baseline computation helper

Query pgt_refresh_history for the last $N$ successful DIFFERENTIAL refreshes of a given ST, compute mean and stddev of delta_row_count.

Step 5 — Scheduler integration

Pre-check in the per-ST refresh loop. Blow and skip if needed. Share change buffer count query with the adaptive fallback to avoid duplicate SPI calls.

Step 6 — reset_fuse() SQL function

Set state to 'intact'. Handle the action parameter ('apply', 'reinitialize', 'skip_changes').

Step 7 — fuse_status() introspection function

Return current state, baseline, last delta, ceiling, sensitivity for all STs.

Step 8 — NOTIFY alert on blow

Emit fuse_blown event on pg_trickle_alert.

Step 9 — Tests

Test Type Proves
test_fuse_blow_fixed_threshold Unit Fixed mode blows at ceiling
test_fuse_blow_adaptive Unit Adaptive blows at μ + kσ
test_fuse_no_blow_normal_volume Unit Normal delta does not blow
test_fuse_cold_start_no_baseline Unit Behavior with < N historical refreshes
test_fuse_combined_ceiling_overrides Unit Hard ceiling blows even when adaptive wouldn’t
test_fuse_blocks_refresh_when_blown E2E Blown ST skips refresh on tick
test_fuse_reset_apply E2E Reset with default action resumes refresh
test_fuse_reset_reinitialize E2E Reset triggers FULL reinitialize
test_fuse_reset_skip_changes E2E Reset advances frontier, skips buffered changes
test_fuse_alert_emitted E2E NOTIFY fires with correct JSON on blow
test_fuse_diamond_group_blocked E2E Blown member blocks entire atomic group
test_fuse_downstream_sees_no_changes E2E Downstream STs skip when upstream fuse is blown

Step 10 — Documentation

  • docs/SQL_REFERENCE.md: fuse parameter, reset_fuse(), fuse_status().
  • docs/CONFIGURATION.md: GUC if added.
  • CHANGELOG.md.