Plain-language companion: v0.11.0.md

v0.11.0 — Partitioned Stream Tables, Prometheus & Grafana Observability, Safety Hardening & Correctness

Status: Released 2026-03-26. See CHANGELOG.md §0.11.0 for the full feature list.

Highlights: 34× lower latency via event-driven scheduler wake · incremental ST-to-ST refresh chains · declaratively partitioned stream tables (100× I/O reduction) · ready-to-use Prometheus + Grafana monitoring stack · FUSE circuit breaker · VARBIT changed-column bitmask (no more 63-column cap) · per-database worker quotas · DAG scheduling performance improvements (fused chains, adaptive polling, amplification detection) · TPC-H correctness gate in CI · safer production defaults.

Partitioned Stream Tables — Storage (A-1)

In plain terms: Partition a 10M-row stream table into 100 ranges and only the 2–3 partitions that actually received changes are touched by MERGE, shrinking the rows scanned from 10M to a few hundred thousand. The partition key must be a user-visible column, and the refresh path must inject a verified range predicate.
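As a usage sketch (assuming the partition_by creation parameter described under A1-1 and the stretch-goal section below, the pgtrickle schema prefix used elsewhere in this document, and invented table, column, and query names):

```sql
-- Illustrative only: the stream table name, defining query, and column names
-- are made up; partition_by must name a user-visible output column.
SELECT pgtrickle.create_stream_table(
    'orders_by_day',
    $$ SELECT order_date,
              count(*)   AS order_count,
              sum(total) AS revenue
       FROM   orders
       GROUP  BY order_date $$,
    partition_by => 'order_date'
);
```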

Item Description Effort Ref
A1-1 DDL: CREATE STREAM TABLE … PARTITION BY declaration; catalog column for partition key 1–2 wk PLAN_NEW_STUFF.md §A-1
A1-2 Delta inspection: extract min/max of partition key from delta CTE per scheduler tick 1 wk PLAN_NEW_STUFF.md §A-1
A1-3 MERGE rewrite: inject validated partition-key range predicate or issue per-partition MERGEs via Rust loop 2–3 wk PLAN_NEW_STUFF.md §A-1
A1-4 E2E benchmarks: 10M-row partitioned ST, 0.1% change rate concentrated in 2–3 partitions 1 wk PLAN_NEW_STUFF.md §A-1

⚠️ MERGE joins on __pgt_row_id (a content hash unrelated to the partition key) — partition pruning will not activate automatically. A predicate injection step is mandatory. See PLAN_NEW_STUFF.md §A-1 risk analysis before starting.

Retraction consideration (A-1): The 5–7 week effort estimate is optimistic. The core assumption — that partition pruning can be activated via a WHERE partition_key BETWEEN ? AND ? predicate — requires the partition key to be a tracked catalog column (not currently the case) and a verified range derivation from the delta. The alternative (per-partition MERGE loop in Rust) is architecturally sound but requires significant catalog and refresh-path changes. A design spike (2–4 days) producing a written implementation plan must be completed before A1-1 is started. The milestone is at P3 / Very High risk and should not block the 1.0 release if the design spike reveals additional complexity.

Partitioned stream tables subtotal: ~5–7 weeks

Multi-Database Scheduler Isolation (C-3)

Item Description Effort Ref
C3-1 Per-database worker quotas (pg_trickle.per_database_worker_quota); priority ordering (IMMEDIATE > Hot > Warm > Cold); burst capacity up to 150% when other DBs are under budget ✅ Done in v0.11.0 Phase 11 — compute_per_db_quota() helper with burst threshold at 80% cluster utilisation; sort_ready_queue_by_priority() dispatches ImmediateClosure first; 7 unit tests. src/scheduler.rs
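As a configuration sketch, the quota is a plain GUC and can be set cluster-wide (the value below is illustrative):

```sql
-- Cap each database at 4 refresh workers (illustrative value); the scheduler
-- may burst above the quota while other databases are under budget.
ALTER SYSTEM SET pg_trickle.per_database_worker_quota = 4;
SELECT pg_reload_conf();
```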

Multi-DB isolation subtotal: ✅ Complete

Prometheus & Grafana Observability

In plain terms: Most teams already run Prometheus and Grafana to monitor their databases. This ships ready-to-use configuration files — no custom code, no extension changes — that plug into the standard postgres_exporter and light up a Grafana dashboard showing refresh latency, staleness, error rates, CDC lag, and per-stream-table detail. Also includes Prometheus alerting rules so you get paged when a stream table goes stale or starts error-looping. A Docker Compose file lets you try the full observability stack with a single docker compose up.

Zero-code monitoring integration. All config files live in a new monitoring/ directory in the main repo (or a separate pgtrickle-monitoring repo). Queries use existing views (pg_stat_stream_tables, check_cdc_health(), quick_health).
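A rough sketch of the kinds of statements the exporter config wraps (the exact output columns are defined by the extension's views and functions and are not reproduced here):

```sql
-- What monitoring/prometheus/pg_trickle_queries.yml boils down to:
SELECT * FROM pg_stat_stream_tables;   -- per-stream-table refresh statistics
SELECT * FROM check_cdc_health();      -- CDC buffer and lag health
SELECT * FROM quick_health;            -- cluster-level health summary
```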

Item Description Effort Ref
OBS-1 Prometheus metrics out of the box. ✅ Done in v0.11.0 Phase 3 — monitoring/prometheus/pg_trickle_queries.yml exports 14 metrics (per-table refresh stats, health summary, CDC buffer sizes, status counts, recent error rate) via postgres_exporter. monitoring/prometheus/pg_trickle_queries.yml
OBS-2 Get paged when things go wrong. ✅ Done in v0.11.0 Phase 3 — monitoring/prometheus/alerts.yml has 8 alerting rules: staleness > 5 min, ≥3 consecutive failures, table SUSPENDED, CDC buffer > 1 GB, scheduler down, high refresh duration, cluster WARNING/CRITICAL. monitoring/prometheus/alerts.yml
OBS-3 See everything at a glance. ✅ Done in v0.11.0 Phase 3 — monitoring/grafana/dashboards/pg_trickle_overview.json has 6 sections: cluster overview stat panels, refresh performance time-series, staleness heatmap, CDC health graphs, per-table drill-down table with schema/table variable filters. monitoring/grafana/dashboards/pg_trickle_overview.json
OBS-4 Try it all in one command. ✅ Done in v0.11.0 Phase 3 — monitoring/docker-compose.yml spins up PostgreSQL + pg_trickle + postgres_exporter + Prometheus + Grafana with pre-wired config and demo seed data (monitoring/init/01_demo.sql). docker compose up → Grafana at :3000. monitoring/docker-compose.yml

Observability subtotal: ~12 hours

Default Tuning & Safety Defaults (from REPORT_OVERALL_STATUS.md)

Four of these items (DEF-1, DEF-2, DEF-4, DEF-5) flip conservative defaults to the behavior that is safe and correct in production; DEF-3 verifies an optimization that was already in place. All underlying features are implemented and tested; only the default values change. Where the default lives in a GUC, the original setting remains so operators can revert if needed — see the sketch after the table.

Item Description Effort Ref
DEF-1 Flip parallel_refresh_mode default to 'on'. ✅ Done in v0.11.0 Phase 1 — default flipped; normalize_parallel_refresh_mode maps None/unknown → On; unit test renamed to defaults_to_on. REPORT_OVERALL_STATUS.md §R1
DEF-2 Flip auto_backoff default to true. ✅ Done in v0.10.0 — default flipped to true; trigger threshold raised to 95%, cap reduced to 8×, log level raised to WARNING. CONFIGURATION.md updated. 1–2h REPORT_OVERALL_STATUS.md §R10
DEF-3 SemiJoin delta-key pre-filter (O-1). ✅ Verified already implemented in v0.11.0 Phase 2 — left_snapshot_filtered pre-filter with WHERE left_key IN (SELECT DISTINCT right_key FROM delta) was already present in semi_join.rs. src/dvm/operators/semi_join.rs
DEF-4 Increase invalidation ring capacity from 32 to 128 slots. ✅ Done in v0.11.0 Phase 1 — INVALIDATION_RING_CAPACITY raised to 128 in shmem.rs. REPORT_OVERALL_STATUS.md §R9
DEF-5 Flip block_source_ddl default to true. ✅ Done in v0.11.0 Phase 1 — default flipped to true; both error messages in hooks.rs include step-by-step escape-hatch procedure. REPORT_OVERALL_STATUS.md §R12
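As noted above, the flipped GUC defaults can be reverted; a sketch assuming the pg_trickle. prefix used by the extension's other settings (the values shown are the pre-0.11.0 behavior and are illustrative — check CONFIGURATION.md for the accepted spellings):

```sql
-- Restore the pre-0.11.0 conservative defaults (illustrative; verify values
-- against CONFIGURATION.md before applying).
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'off';
ALTER SYSTEM SET pg_trickle.auto_backoff = off;
ALTER SYSTEM SET pg_trickle.block_source_ddl = off;
SELECT pg_reload_conf();
```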

Default tuning subtotal: ~14–21 hours

Safety & Resilience Hardening (Must-Ship)

In plain terms: The background worker should never silently hang or leave a stream table in an undefined state when an internal operation fails. These items replace panic!/unwrap() in code paths reachable from the background worker with structured errors and graceful recovery.

Item Description Effort Ref
SAF-1 Replace worker-path panics with structured errors. ✅ Done in v0.11.0 Phase 1 — full audit of scheduler.rs, refresh.rs, hooks.rs: no panic!/unwrap() outside #[cfg(test)]. check_skip_needed now logs WARNING on SPI error with table name and error details. Audit finding documented in comment. src/scheduler.rs
SAF-2 Failure-injection E2E test. ✅ Done in v0.11.0 Phase 2 — two E2E tests in tests/e2e_safety_tests.rs: (1) column drop triggers UpstreamSchemaChanged, verifies scheduler stays alive and other STs continue; (2) source table drop, same verification. tests/e2e_safety_tests.rs

Safety hardening subtotal: ~7–12 hours

Correctness & Code Quality Quick Wins (from REPORT_OVERALL_STATUS.md §12–§15)

In plain terms: Six groups of self-contained improvements identified in the deep gap analysis. Each item takes roughly a day or less and substantially reduces silent failure modes, operator confusion, and diagnostic friction.

Quick Fixes (< 1 hour each)

Item Description Effort Ref
QF-1 Fix unguarded debug println!. ✅ Done in v0.11.0 Phase 1 — println! replaced with pgrx::log!() guarded by new pg_trickle.log_merge_sql GUC (default off). src/refresh.rs
QF-2 Upgrade AUTO mode downgrade log level. ✅ Done in v0.11.0 Phase 1 — four AUTO→FULL downgrade paths in api.rs raised from pgrx::info!() to pgrx::warning!(). plans/performance/REPORT_OVERALL_STATUS.md §12
QF-3 Warn when append_only auto-reverts. ✅ Verified already implemented — pgrx::warning!() + emit_alert(AppendOnlyReverted) already present in refresh.rs. plans/performance/REPORT_OVERALL_STATUS.md §15
QF-4 Document parser unwrap() invariants. ✅ Done in v0.11.0 Phase 1 — // INVARIANT: comments added at four unwrap() sites in dvm/parser.rs (after is_empty() guard, len()==1 guards, and non-empty Err return). src/dvm/parser.rs

Quick-fix subtotal: ~3–4 hours

Effective Refresh Mode Tracking (G12-ERM)

In plain terms: When a stream table is configured as AUTO, operators currently have no way to discover which mode is actually being used at runtime without reading warning logs. Storing the resolved mode in the catalog and exposing a diagnostic function closes this observability gap.

Item Description Effort Ref
G12-ERM-1 Add effective_refresh_mode column to pgt_stream_tables. ✅ Done in v0.11.0 Phase 2 — column added; scheduler writes actual mode (FULL/DIFFERENTIAL/APPEND_ONLY/TOP_K/NO_DATA) via thread-local tracking; upgrade SQL pg_trickle--0.10.0--0.11.0.sql created. src/catalog.rs
G12-ERM-2 Add explain_refresh_mode(name TEXT) SQL function. ✅ Done in v0.11.0 Phase 2 — pgtrickle.explain_refresh_mode() returns configured mode, effective mode, and downgrade reason. src/api.rs
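A usage sketch for the new diagnostic (the stream table name is illustrative, and the call is written as set-returning here; the exact return shape is defined in api.rs):

```sql
-- Shows configured mode, effective mode, and the downgrade reason (if any).
SELECT * FROM pgtrickle.explain_refresh_mode('orders_by_day');
```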

Effective refresh mode subtotal: ~4–7 hours

Correctness Guards (G12-2, G12-AGG)

Item Description Effort Ref
G12-2 TopK runtime validation. ✅ Done in v0.11.0 Phase 4 — validate_topk_metadata() re-parses the reconstructed full query on each TopK refresh; validate_topk_metadata_fields() validates stored fields (pure logic, unit-testable). Falls back to FULL + WARNING on mismatch. 7 unit tests. src/refresh.rs
G12-AGG Group-rescan aggregate warning. ✅ Done in v0.11.0 Phase 4 — classify_agg_strategy() classifies each aggregate as ALGEBRAIC_INVERTIBLE / ALGEBRAIC_VIA_AUX / SEMI_ALGEBRAIC / GROUP_RESCAN. Warning emitted at create_stream_table time for DIFFERENTIAL + group-rescan aggs. Strategy exposed in explain_st() as aggregate_strategies JSON. 18 unit tests. src/dvm/parser.rs

Correctness guards subtotal: ✅ Complete

Parameter & Error Hardening (G15-PV, G13-EH)

Item Description Effort Ref
G15-PV Validate incompatible parameter combinations. ✅ Done in v0.11.0 Phase 2 — (a) cdc_mode='wal' + refresh_mode='IMMEDIATE' rejection was already present; (b) diamond_schedule_policy='slowest' + diamond_consistency='none' now rejected in create_stream_table_impl and alter_stream_table_impl with structured error. src/api.rs
G13-EH Structured error HINT/DETAIL fields. ✅ Done in v0.11.0 Phase 2 — raise_error_with_context() helper in api.rs uses ErrorReport::new().set_detail().set_hint() for UnsupportedOperator, CycleDetected, UpstreamSchemaChanged, and QueryParseError; all 8 API-boundary error sites updated. src/api.rs

Parameter & error hardening subtotal: ~6–12 hours

Testing: EC-01 Boundary Regression (G17-EC01B-NEG)

Item Description Effort Ref
G17-EC01B-NEG Add a negative regression test asserting that ≥3-scan join right subtrees currently fall back to FULL refresh. ✅ Done in v0.11.0 Phase 4 — 4 unit tests in join_common.rs covering 3-way join, 4-way join, right-subtree ≥3 scans, and 2-scan boundary; a // TODO comment marks the tests for removal once EC01B-1/EC01B-2 are fixed in v0.12.0. src/dvm/operators/join_common.rs

EC-01 boundary regression subtotal: ✅ Complete

Documentation Quick Wins (G16-GS, G16-SM, G16-MQR, G15-GUC)

Item Description Effort Ref
G16-GS Restructure GETTING_STARTED.md with progressive complexity. Five chapters: (1) Hello World — single-table ST with no join; (2) Multi-table join; (3) Scheduling & backpressure; (4) Monitoring — 5 key functions; (5) Advanced — FUSE, wide bitmask, partitions. Remove the current flat wall-of-SQL structure. ✅ Done in v0.11.0 Phase 11 — 5-chapter structure implemented; Chapter 1 Hello World example added; Chapter 5 Advanced Topics adds inline FUSE, partitioning, IMMEDIATE, and multi-tenant quota examples. docs/GETTING_STARTED.md
G16-SM SQL/mode operator support matrix. ✅ Done — 60+ row operator support matrix added to docs/DVM_OPERATORS.md covering all operators × FULL/DIFFERENTIAL/IMMEDIATE modes with caveat footnotes. docs/DVM_OPERATORS.md
G16-MQR Monitoring quick reference. ✅ Done — Monitoring Quick Reference section added to docs/GETTING_STARTED.md with pgt_status(), health_check(), change_buffer_sizes(), dependency_tree(), fuse_status(), Prometheus/Grafana stack, key metrics table, and alert summary. docs/GETTING_STARTED.md
G15-GUC GUC interaction matrix. ✅ Done — GUC Interaction Matrix (14 interaction pairs) and three named Tuning Profiles (Low-Latency, High-Throughput, Resource-Constrained) added to docs/CONFIGURATION.md. docs/CONFIGURATION.md

Documentation subtotal: ~2–3 days

Correctness quick-wins & documentation subtotal: ~1–2 days code + ~2–3 days docs

Should-Ship Additions

Wider Changed-Column Bitmask (>63 columns)

In plain terms: Stream tables built on source tables with more than 63 columns fall back silently to tracking every column on every UPDATE, losing all CDC selectivity. Extending the changed_cols field from a BIGINT to a BYTEA vector removes this cliff without breaking existing deployments.

Item Description Effort Ref
WB-1 Extend the CDC trigger changed_cols column from BIGINT to BYTEA; update bitmask encoding/decoding in cdc.rs; add schema migration for existing change buffer tables (tables with <64 columns are unaffected at the data level). 1–2 wk REPORT_OVERALL_STATUS.md §R13
WB-2 E2E test: wide (>63 column) source table; verify only referenced columns trigger delta propagation; benchmark UPDATE selectivity before/after. 2–4h tests/e2e_cdc_tests.rs

Wider bitmask subtotal: ~1–2 weeks + ~4h testing

Fuse — Anomalous Change Detection

In plain terms: A circuit breaker that stops a stream table from processing an unexpectedly large batch of changes (runaway script, mass delete, data migration) without operator review. A blown fuse halts refresh and emits a pgtrickle_alert NOTIFY; reset_fuse() resumes with a chosen recovery action (apply, reinitialize, or skip_changes).
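A usage sketch of the operator workflow, using the functions and parameters itemized in the table below (the pgtrickle schema prefix, parameter value syntax, table name, and ceiling are illustrative):

```sql
-- Arm the fuse on one stream table (parameter names from FUSE-2).
SELECT pgtrickle.alter_stream_table('orders_by_day',
                                    fuse         => 'on',
                                    fuse_ceiling => 1000000);

-- Inspect fuse state across all stream tables (FUSE-4).
SELECT * FROM pgtrickle.fuse_status();

-- After reviewing a blown fuse, resume and apply the buffered changes (FUSE-3).
SELECT pgtrickle.reset_fuse('orders_by_day', action => 'apply');
```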

Item Description Effort Ref
FUSE-1 Catalog: fuse state columns on pgt_stream_tables (fuse_mode, fuse_state, fuse_ceiling, fuse_sensitivity, blown_at, blow_reason) 1–2h PLAN_FUSE.md
FUSE-2 alter_stream_table() new params: fuse, fuse_ceiling, fuse_sensitivity 1h PLAN_FUSE.md
FUSE-3 reset_fuse(name, action => 'apply'|'reinitialize'|'skip_changes') SQL function 1h PLAN_FUSE.md
FUSE-4 fuse_status() introspection function 1h PLAN_FUSE.md
FUSE-5 Scheduler pre-check: count change buffer rows; evaluate threshold; blow fuse + NOTIFY if exceeded 2–3h PLAN_FUSE.md
FUSE-6 E2E tests: normal baseline, spike → blow, reset (apply/reinitialize/skip_changes), diamond/DAG interaction 4–6h PLAN_FUSE.md

Fuse subtotal: ~10–14 hours — ✅ Complete

External Correctness Gate (TS1 or TS2)

In plain terms: Run an independent public query corpus through pg_trickle’s DIFFERENTIAL mode and assert the results match a vanilla PostgreSQL execution. This catches blind spots that the extension’s own test suite cannot, and provides an objective correctness baseline before v1.0.

Item Description Effort Ref
TS1 sqllogictest suite. Run the PostgreSQL sqllogictest suite through pg_trickle DIFFERENTIAL mode; gate CI on zero correctness mismatches. Preferred choice because it offers the broadest query coverage. 2–3d PLAN_TESTING_GAPS.md §J
TS2 JOB (Join Order Benchmark). Correctness baseline and refresh latency profiling on realistic multi-join analytical queries. Alternative if sqllogictest setup is too costly. 1–2d PLAN_TESTING_GAPS.md §J

Deliver one of TS1 or TS2; whichever is completed first meets the exit criterion.

External correctness gate subtotal: ~1–3 days

Differential ST-to-ST Refresh (✅ Done)

In plain terms: When stream table B’s defining query reads from stream table A, pg_trickle currently forces a FULL refresh of B every time A updates — re-executing B’s entire query even when only a handful of rows changed. This feature gives ST-to-ST dependencies the same CDC change buffer that base tables already have, so B refreshes differentially (applying only the delta). Crucially, even when A itself does a FULL refresh, a pre/post snapshot diff is captured so B still receives a small I/D delta rather than cascading FULL through the chain.
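For concreteness, a sketch of the kind of chain this targets (assuming an illustrative create_stream_table(name, query) call shape and invented table names):

```sql
-- Stream table daily_orders reads a base table; weekly_orders reads daily_orders.
-- Before this feature, every refresh of daily_orders forced a FULL refresh of
-- weekly_orders; now weekly_orders consumes only the captured insert/delete delta.
SELECT pgtrickle.create_stream_table('daily_orders',
    $$ SELECT order_date, count(*) AS n FROM orders GROUP BY order_date $$);

SELECT pgtrickle.create_stream_table('weekly_orders',
    $$ SELECT date_trunc('week', order_date) AS week, sum(n) AS weekly_n
       FROM daily_orders GROUP BY 1 $$);
```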

Item Description Status Ref
ST-ST-1 Change buffer infrastructure. create_st_change_buffer_table() / drop_st_change_buffer_table() in cdc.rs; lifecycle hooks in api.rs; idempotent ensure_st_change_buffer() ✅ Done PLAN_ST_TO_ST.md §Phase 1
ST-ST-2 Delta capture — DIFFERENTIAL path. Force explicit DML when ST has downstream consumers; capture delta from __pgt_delta_{id} to changes_pgt_{id} ✅ Done PLAN_ST_TO_ST.md §Phase 2
ST-ST-3 Delta capture — FULL path. Pre/post snapshot diff writes I/D pairs to changes_pgt_{id}; eliminates cascading FULL ✅ Done PLAN_ST_TO_ST.md §7
ST-ST-4 DVM scan operator for ST sources. Read from changes_pgt_{id}; pgt_-prefixed LSN tokens; extended frontier and placeholder resolver ✅ Done PLAN_ST_TO_ST.md §Phase 3
ST-ST-5 Scheduler integration. Buffer-based change detection in has_stream_table_source_changes(); removed FULL override; frontier augmented with ST source positions ✅ Done PLAN_ST_TO_ST.md §Phase 4
ST-ST-6 Cleanup & lifecycle. cleanup_st_change_buffers_by_frontier() for ST buffers; removed prewarm skip for ST sources; ST buffer cleanup in both differential and full refresh paths ✅ Done PLAN_ST_TO_ST.md §Phase 5–6

ST-to-ST differential subtotal: ~4.5–6.5 weeks

Adaptive/Event-Driven Scheduler Wake (Must-Ship)

In plain terms: The scheduler currently wakes on a fixed 1-second timer even when nothing has changed. This adds event-driven wake: CDC triggers notify the scheduler immediately when changes arrive. Median end-to-end latency drops from ~515 ms to ~15 ms for low-volume workloads — a 34× improvement. This is a must-ship item because low latency is a primary project goal.

Item Description Effort Ref
WAKE-1 Event-driven scheduler wake. ✅ Done in v0.11.0 Phase 7 — CDC triggers emit pg_notify('pgtrickle_wake', '') after each change buffer INSERT; scheduler issues LISTEN pgtrickle_wake at startup; 10 ms debounce coalesces rapid notifications; poll fallback preserved. New GUCs: event_driven_wake (default true), wake_debounce_ms (default 10). E2E tests in tests/e2e_wake_tests.rs. REPORT_OVERALL_STATUS.md §R16
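A tuning sketch for the two new GUCs (assuming the pg_trickle. prefix used by the extension's other settings; the values shown are the documented defaults):

```sql
-- Event-driven wake is on by default; set it to off to fall back to pure
-- polling, or raise the debounce to coalesce very bursty writers.
ALTER SYSTEM SET pg_trickle.event_driven_wake = on;
ALTER SYSTEM SET pg_trickle.wake_debounce_ms = 10;
SELECT pg_reload_conf();
```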

Event-driven wake subtotal: ✅ Complete

Stretch Goals (if capacity allows after Must-Ship)

Item Description Effort Ref
STRETCH-1 Partitioned stream tables — design spike only. ✅ Done in v0.11.0 Partitioning Spike — RFC written (PLAN_PARTITIONING_SPIKE.md), go/no-go decision: Go. A1-1 implemented (catalog column, API parameter, validation). 2–4d PLAN_PARTITIONING_SPIKE.md
A1-1 DDL: CREATE STREAM TABLE … PARTITION BY; st_partition_key catalog column. ✅ Done — partition_by parameter added to all three create_stream_table* functions; st_partition_key TEXT column in catalog; validate_partition_key() validates column exists in output; build_create_table_sql emits PARTITION BY RANGE (key); setup_storage_table creates default catch-all partition and non-unique __pgt_row_id index. 1–2 wk PLAN_PARTITIONING_SPIKE.md
A1-2 Delta min/max inspection. ✅ Done — extract_partition_range() in refresh.rs runs SELECT MIN/MAX(key)::text on the resolved delta SQL; returns None on empty delta (MERGE skipped). 1 wk PLAN_PARTITIONING_SPIKE.md §8
A1-3 MERGE rewrite. ✅ Done — inject_partition_predicate() replaces __PGT_PART_PRED__ placeholder in MERGE ON clause with AND st."key" BETWEEN 'min' AND 'max'; CachedMergeTemplate stores delta_sql_template; D-2 prepared statements disabled for partitioned STs. 2–3 wk PLAN_PARTITIONING_SPIKE.md §8
A1-4 E2E benchmarks: 10M-row partitioned ST, 0.1%/0.2%/100% change rate scenarios; EXPLAIN (ANALYZE, BUFFERS) partition-scan verification. ✅ Done — 7 E2E tests added to tests/e2e_partition_tests.rs covering: initial populate, differential inserts, updates/deletes, empty-delta fast path, EXPLAIN plan verification, invalid partition key rejection; added to light-E2E allowlist. 1 wk PLAN_PARTITIONING_SPIKE.md §9
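For A1-3, the rewritten MERGE has roughly the following shape — a simplified, hand-written sketch in which the table, columns, delta relation id, date literals, and update/insert lists are illustrative; only the __pgt_row_id join and the injected BETWEEN predicate reflect the mechanism described above:

```sql
-- __PGT_PART_PRED__ in the cached template is replaced at refresh time with
-- the min/max range extracted from the delta (A1-2), yielding:
MERGE INTO orders_by_day AS st
USING __pgt_delta_42 AS d                       -- delta relation; id illustrative
   ON st.__pgt_row_id = d.__pgt_row_id
  AND st."order_date" BETWEEN DATE '2026-03-01' AND DATE '2026-03-03'  -- injected
WHEN MATCHED THEN
    UPDATE SET order_count = d.order_count, revenue = d.revenue
WHEN NOT MATCHED THEN
    INSERT (order_date, order_count, revenue, __pgt_row_id)
    VALUES (d.order_date, d.order_count, d.revenue, d.__pgt_row_id);
```

Delete handling and the remaining MERGE branches are omitted; the point is only that the partition-key range restricts which partitions the target scan touches.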

Stretch subtotal: STRETCH-1 + A1-1 + A1-2 + A1-3 + A1-4 ✅ All complete

DAG Refresh Performance Improvements (from PLAN_DAG_PERFORMANCE.md §8)

In plain terms: Now that ST-to-ST differential refresh eliminates the “every hop is FULL” bottleneck, the next performance frontier is reducing per-hop overhead and exploiting DAG structure more aggressively. These items target the scheduling and dispatch layer — not the DVM engine — and collectively can reduce end-to-end propagation latency by 30–50% for heterogeneous DAGs.

Item Description Effort Ref
DAG-1 Intra-tick pipelining. Within a single scheduler tick, begin processing a downstream ST as soon as all its specific upstream dependencies have completed — not when the entire topological level finishes. Requires per-ST completion tracking in the parallel dispatch loop and immediate enqueuing of newly-ready STs. Expected 30–50% latency reduction for DAGs with mixed-cost levels. ✅ Done — Already achieved by Phase 4’s parallel dispatch architecture: per-dependency remaining_upstreams tracking with immediate downstream readiness propagation. No level barrier exists. 3 validation tests. 2–3 wk PLAN_DAG_PERFORMANCE.md §8.1
DAG-2 Adaptive poll interval. Replace the fixed 200 ms parallel dispatch poll with exponential backoff (20 ms → 200 ms), resetting on worker completion. Makes parallel mode competitive with CALCULATED for cheap refreshes (refresh time T_r ≈ 10 ms). Alternative: WaitLatch with shared-memory completion flags. ✅ Done — compute_adaptive_poll_ms() pure-logic helper with exponential backoff (20ms → 200ms); ParallelDispatchState tracks adaptive_poll_ms + completions_this_tick; resets to 20ms on worker completion; 8 unit tests. 1–2 wk PLAN_DAG_PERFORMANCE.md §8.2
DAG-3 Delta amplification detection. Track input→output delta ratio per hop via pgt_refresh_history. When a join ST amplifies delta beyond a configurable threshold (e.g., output > 100× input), emit a performance WARNING and optionally fall back to FULL for that hop. Expose amplification metrics in explain_st(). ✅ Done — pg_trickle.delta_amplification_threshold GUC (default 100×); compute_amplification_ratio + should_warn_amplification pure-logic helpers; WARNING emitted after MERGE with ratio, counts, and tuning hint; explain_st() exposes amplification_stats JSON from last 20 DIFFERENTIAL refreshes; 15 unit tests. 3–5d PLAN_DAG_PERFORMANCE.md §8.4
DAG-4 ST buffer bypass for single-consumer CALCULATED chains. For ST dependencies with exactly one downstream consumer refreshing in the same tick, pass the delta in-memory instead of writing/reading from the changes_pgt_ buffer table. Eliminates 2× SPI DML per hop (~20 ms savings per hop for 10K-row deltas). ✅ Done — FusedChain execution unit kind; find_fusable_chains() pure-logic detection; capture_delta_to_bypass_table() writes to temp table; DiffContext.st_bypass_tables threads bypass through DVM scan; delta SQL cache bypassed when active; 11+4 unit tests. 3–4 wk PLAN_DAG_PERFORMANCE.md §8.3
DAG-5 ST buffer batch coalescing. Apply net-effect computation to ST change buffers before downstream reads — cancel INSERT/DELETE pairs for the same __pgt_row_id that accumulate between reads during rapid-fire upstream refreshes. Adapts existing compute_net_effect() logic to the ST buffer schema. ✅ Done — compact_st_change_buffer() with build_st_compact_sql() pure-logic helper; advisory lock namespace 0x5047_5500; integrated in execute_differential_refresh() after C-4 base-table compaction; 9 unit tests. 1–2 wk PLAN_DAG_PERFORMANCE.md §8.5
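DAG-3's warning sensitivity is controlled by an ordinary GUC, so it can be tuned per deployment (the value below is illustrative; the documented default is 100):

```sql
-- Warn when a DIFFERENTIAL hop emits more than 50× the delta rows it received.
ALTER SYSTEM SET pg_trickle.delta_amplification_threshold = 50;
SELECT pg_reload_conf();
```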

DAG refresh performance subtotal: ~8–12 weeks

v0.11.0 total: ~7–10 weeks (partitioning + isolation) + ~12h observability + ~14–21h default tuning + ~7–12h safety hardening + ~2–4 weeks should-ship (bitmask + fuse + external corpus) + ~4.5–6.5 weeks ST-to-ST differential + ~2–3 weeks event-driven wake + ~1–2 days correctness quick-wins + ~2–3 days documentation + ~8–12 weeks DAG performance

Exit criteria: ✅ All met. Released 2026-03-26.

- [x] Declaratively partitioned stream tables accepted; partition key tracked in catalog — ✅ Done in v0.11.0 Partitioning Spike (STRETCH-1 RFC + A1-1)
- [x] Partitioned storage table created with PARTITION BY RANGE + default catch-all partition — ✅ Done (A1-1 physical DDL)
- [x] Partition-key range predicate injected into MERGE ON clause; empty-delta fast-path skips MERGE — ✅ Done (A1-2 + A1-3)
- [x] Partition-scoped MERGE benchmark: 10M-row ST, 0.1% change rate (expect ~100× I/O reduction) — ✅ Done (A1-4 E2E tests)
- [x] Per-database worker quotas enforced; burst reclaimed within 1 scheduler cycle — ✅ Done in v0.11.0 Phase 11 (pg_trickle.per_database_worker_quota GUC; burst to 150% at < 80% cluster load)
- [x] Prometheus queries + alerting rules + Grafana dashboard shipped — ✅ Done in v0.11.0 Phase 3 (monitoring/ directory)
- [x] DEF-1: parallel_refresh_mode default is 'on'; unit test updated — ✅ Done in v0.11.0 Phase 1
- [x] DEF-2: auto_backoff default is true; CONFIGURATION.md updated — ✅ Done in v0.10.0
- [x] DEF-3: SemiJoin delta-key pre-filter verified already implemented — ✅ Done in v0.11.0 Phase 2 (pre-existing in semi_join.rs)
- [x] DEF-4: Invalidation ring capacity is 128 slots — ✅ Done in v0.11.0 Phase 1
- [x] DEF-5: block_source_ddl default is true; error message includes escape-hatch instructions — ✅ Done in v0.11.0 Phase 1
- [x] SAF-1: No panic!/unwrap() in background worker hot paths; check_skip_needed logs SPI errors — ✅ Done in v0.11.0 Phase 1
- [x] SAF-2: Failure-injection E2E tests in tests/e2e_safety_tests.rs — ✅ Done in v0.11.0 Phase 2
- [x] WB-1+2: Changed-column bitmask supports >63 columns (VARBIT); wide-table CDC selectivity E2E passes; schema migration tested — ✅ Done in v0.11.0 Phase 5
- [x] FUSE-1–6: Fuse blows on configurable change-count threshold; reset_fuse() recovers in all three action modes; diamond/DAG interaction tested — ✅ Done in v0.11.0 Phase 6
- [x] TS2: TPC-H-derived 5-query DIFFERENTIAL correctness gate passes with zero mismatches; gated in CI — ✅ Done in v0.11.0 Phase 9
- [x] QF-1–4: println! replaced with guarded pgrx::log!(); AUTO downgrades emit WARNING; append_only reversion verified already warns; parser invariant sites annotated — ✅ Done in v0.11.0 Phase 1
- [x] G12-ERM: effective_refresh_mode column present in pgt_stream_tables; explain_refresh_mode() returns configured mode, effective mode, downgrade reason — ✅ Done in v0.11.0 Phase 2
- [x] G12-2: TopK path validates assumptions at refresh time; triggers FULL fallback with WARNING on violation — ✅ Done in v0.11.0 Phase 4
- [x] G12-AGG: Group-rescan aggregate warning fires at create_stream_table for DIFFERENTIAL mode; strategy visible in explain_st() — ✅ Done in v0.11.0 Phase 4
- [x] G15-PV: Incompatible cdc_mode/refresh_mode and diamond_schedule_policy combinations rejected at creation time with structured HINT — ✅ Done in v0.11.0 Phase 2
- [x] G13-EH: UnsupportedOperator, CycleDetected, UpstreamSchemaChanged, QueryParseError include DETAIL and HINT fields — ✅ Done in v0.11.0 Phase 2
- [x] G17-EC01B-NEG: Negative regression test documents ≥3-scan fall-back behavior; linked to v0.12.0 EC01B fix — ✅ Done in v0.11.0 Phase 4
- [x] G16-GS/SM/MQR/GUC: GETTING_STARTED restructured (5 chapters + Hello World + Advanced Topics); DVM_OPERATORS support matrix; monitoring quick reference; CONFIGURATION.md GUC matrix — ✅ Done in v0.11.0 Phase 11
- [x] ST-ST-1–6: All ST-to-ST dependencies refresh differentially when upstream has a change buffer; FULL refreshes on upstream produce pre/post I/D diff; no cascading FULL — ✅ Done in v0.11.0 Phase 8
- [x] WAKE-1: Event-driven scheduler wake; median latency ~15 ms (34× improvement); 10 ms debounce; poll fallback — ✅ Done in v0.11.0 Phase 7
- [x] DAG-1: Intra-tick pipelining confirmed in Phase 4 architecture — ✅ Done
- [x] DAG-2: Adaptive poll interval (20 ms → 200 ms exponential backoff) — ✅ Done in v0.11.0 Phase 10
- [x] DAG-3: Delta amplification detection with pg_trickle.delta_amplification_threshold GUC — ✅ Done in v0.11.0 Phase 10
- [x] DAG-4: ST buffer bypass (FusedChain) for single-consumer CALCULATED chains — ✅ Done in v0.11.0 Phase 10
- [x] DAG-5: ST buffer batch coalescing cancels redundant I/D pairs — ✅ Done in v0.11.0 Phase 10
- [x] Extension upgrade path tested (0.10.0 → 0.11.0) — ✅ upgrade SQL in sql/pg_trickle--0.10.0--0.11.0.sql