Contents
- v0.11.0 — Partitioned Stream Tables, Prometheus & Grafana Observability, Safety Hardening & Correctness
- Partitioned Stream Tables — Storage (A-1)
- Multi-Database Scheduler Isolation (C-3)
- Prometheus & Grafana Observability
- Default Tuning & Safety Defaults (from REPORT_OVERALL_STATUS.md)
- Safety & Resilience Hardening (Must-Ship)
- Correctness & Code Quality Quick Wins (from REPORT_OVERALL_STATUS.md §12–§15)
- Should-Ship Additions
- Adaptive/Event-Driven Scheduler Wake (Must-Ship)
- Stretch Goals (if capacity allows after Must-Ship)
- DAG Refresh Performance Improvements (from PLAN_DAG_PERFORMANCE.md §8)
v0.11.0 — Partitioned Stream Tables, Prometheus & Grafana Observability, Safety Hardening & Correctness
Plain-language companion: v0.11.0.md
Status: Released 2026-03-26. See CHANGELOG.md §0.11.0 for the full feature list.
Highlights: 34× lower latency via event-driven scheduler wake · incremental ST-to-ST refresh chains · declaratively partitioned stream tables (100× I/O reduction) · ready-to-use Prometheus + Grafana monitoring stack · FUSE circuit breaker · VARBIT changed-column bitmask (no more 63-column cap) · per-database worker quotas · DAG scheduling performance improvements (fused chains, adaptive polling, amplification detection) · TPC-H correctness gate in CI · safer production defaults.
Partitioned Stream Tables — Storage (A-1)
In plain terms: A 10M-row stream table partitioned into 100 ranges means only the 2–3 partitions that actually received changes are touched by MERGE — reducing the MERGE scan from 10M rows to ~100K. The partition key must be a user-visible column and the refresh path must inject a verified range predicate.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A1-1 | DDL: CREATE STREAM TABLE … PARTITION BY declaration; catalog column for partition key | 1–2 wk | PLAN_NEW_STUFF.md §A-1 |
| A1-2 | Delta inspection: extract min/max of partition key from delta CTE per scheduler tick | 1 wk | PLAN_NEW_STUFF.md §A-1 |
| A1-3 | MERGE rewrite: inject validated partition-key range predicate or issue per-partition MERGEs via Rust loop | 2–3 wk | PLAN_NEW_STUFF.md §A-1 |
| A1-4 | E2E benchmarks: 10M-row partitioned ST, 0.1% change rate concentrated in 2–3 partitions | 1 wk | PLAN_NEW_STUFF.md §A-1 |
⚠️ MERGE joins on __pgt_row_id (a content hash unrelated to the partition key), so partition pruning will not activate automatically — a predicate injection step is mandatory. See PLAN_NEW_STUFF.md §A-1 risk analysis before starting.

Retraction consideration (A-1): The 5–7 week effort estimate is optimistic. The core assumption — that partition pruning can be activated via a WHERE partition_key BETWEEN ? AND ? predicate — requires the partition key to be a tracked catalog column (not currently the case) and a verified range derivation from the delta. The alternative (a per-partition MERGE loop in Rust) is architecturally sound but requires significant catalog and refresh-path changes. A design spike (2–4 days) producing a written implementation plan must be completed before A1-1 is started. The milestone is at P3 / Very High risk and should not block the 1.0 release if the design spike reveals additional complexity.

Partitioned stream tables subtotal: ~5–7 weeks
Multi-Database Scheduler Isolation (C-3)
| Item | Description | Effort | Ref |
|---|---|---|---|
| C-3 | Per-database worker quota GUC (pg_trickle.per_database_worker_quota); priority ordering (IMMEDIATE > Hot > Warm > Cold); burst capacity up to 150% when other DBs are under budget. Implemented: compute_per_db_quota() helper with burst threshold at 80% cluster utilisation; sort_ready_queue_by_priority() dispatches ImmediateClosure first; 7 unit tests. | — | src/scheduler.rs |
Multi-DB isolation subtotal: ✅ Complete
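The quota-with-burst behaviour above can be sketched as pure logic. This is an illustrative reconstruction, not the extension's actual compute_per_db_quota() signature: the 80% burst threshold and 150% burst cap come from the table row.

```rust
/// Illustrative sketch of the per-database quota rule described above.
/// Each database gets `base_quota` workers; while overall cluster
/// utilisation is below 80%, a database may burst up to 150% of its
/// quota, borrowing capacity other databases are not using.
fn compute_per_db_quota(base_quota: u32, cluster_utilisation: f64) -> u32 {
    const BURST_THRESHOLD: f64 = 0.80; // burst only while cluster < 80% busy
    const BURST_FACTOR: f64 = 1.5;     // up to 150% of the base quota
    if cluster_utilisation < BURST_THRESHOLD {
        (base_quota as f64 * BURST_FACTOR).floor() as u32
    } else {
        base_quota
    }
}

fn main() {
    // Under light cluster load a DB with quota 4 may run 6 workers.
    assert_eq!(compute_per_db_quota(4, 0.50), 6);
    // At or above 80% utilisation the quota is strictly enforced.
    assert_eq!(compute_per_db_quota(4, 0.85), 4);
}
```

Burst capacity is reclaimed simply by the utilisation crossing the threshold on the next scheduler cycle, which matches the "burst reclaimed within 1 scheduler cycle" exit criterion.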
Prometheus & Grafana Observability
In plain terms: Most teams already run Prometheus and Grafana to monitor their databases. This ships ready-to-use configuration files — no custom code, no extension changes — that plug into the standard postgres_exporter and light up a Grafana dashboard showing refresh latency, staleness, error rates, CDC lag, and per-stream-table detail. Also includes Prometheus alerting rules so you get paged when a stream table goes stale or starts error-looping. A Docker Compose file lets you try the full observability stack with a single docker compose up.
Zero-code monitoring integration. All config files live in a new monitoring/ directory in the main repo (or a separate pgtrickle-monitoring repo). Queries use existing views (pg_stat_stream_tables, check_cdc_health(), quick_health).
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | monitoring/prometheus/pg_trickle_queries.yml exports 14 metrics (per-table refresh stats, health summary, CDC buffer sizes, status counts, recent error rate) via postgres_exporter. | — | monitoring/prometheus/pg_trickle_queries.yml |
| — | monitoring/prometheus/alerts.yml has 8 alerting rules: staleness > 5 min, ≥3 consecutive failures, table SUSPENDED, CDC buffer > 1 GB, scheduler down, high refresh duration, cluster WARNING/CRITICAL. | — | monitoring/prometheus/alerts.yml |
| — | monitoring/grafana/dashboards/pg_trickle_overview.json has 6 sections: cluster overview stat panels, refresh performance time-series, staleness heatmap, CDC health graphs, per-table drill-down table with schema/table variable filters. | — | monitoring/grafana/dashboards/pg_trickle_overview.json |
| — | monitoring/docker-compose.yml spins up PostgreSQL + pg_trickle + postgres_exporter + Prometheus + Grafana with pre-wired config and demo seed data (monitoring/init/01_demo.sql). docker compose up → Grafana at :3000. | — | monitoring/docker-compose.yml |
Observability subtotal: ~12 hours ✅
Default Tuning & Safety Defaults (from REPORT_OVERALL_STATUS.md)
These four changes flip conservative defaults to the behavior that is safe and correct in production. All underlying features are implemented and tested; only the default values change. Each keeps the original GUC so operators can revert if needed.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DEF-1 | parallel_refresh_mode default to 'on'. normalize_parallel_refresh_mode maps None/unknown → On; unit test renamed to defaults_to_on. | — | REPORT_OVERALL_STATUS.md §R1 |
| DEF-2 | auto_backoff default to true; trigger threshold raised to 95%, cap reduced to 8×, log level raised to WARNING. CONFIGURATION.md updated. | 1–2h | REPORT_OVERALL_STATUS.md §R10 |
| DEF-3 | SemiJoin delta-key pre-filter: the left_snapshot_filtered pre-filter with WHERE left_key IN (SELECT DISTINCT right_key FROM delta) was already present in semi_join.rs. | — | src/dvm/operators/semi_join.rs |
| DEF-4 | INVALIDATION_RING_CAPACITY raised to 128 in shmem.rs. | — | REPORT_OVERALL_STATUS.md §R9 |
| DEF-5 | block_source_ddl default to true; both error messages in hooks.rs include a step-by-step escape-hatch procedure. | — | REPORT_OVERALL_STATUS.md §R12 |
Default tuning subtotal: ~14–21 hours
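The DEF-1 normalisation rule ("None/unknown → On") is small enough to sketch directly. The enum and signature here are illustrative stand-ins for the extension's internals:

```rust
// Sketch of the DEF-1 default flip: an unset or unrecognised
// parallel_refresh_mode GUC value now resolves to On.
#[derive(Debug, PartialEq)]
enum ParallelRefreshMode { On, Off }

fn normalize_parallel_refresh_mode(raw: Option<&str>) -> ParallelRefreshMode {
    match raw {
        Some("off") => ParallelRefreshMode::Off,
        // None, "on", and any unknown value all map to the new default.
        _ => ParallelRefreshMode::On,
    }
}

fn main() {
    assert_eq!(normalize_parallel_refresh_mode(None), ParallelRefreshMode::On);
    assert_eq!(normalize_parallel_refresh_mode(Some("bogus")), ParallelRefreshMode::On);
    assert_eq!(normalize_parallel_refresh_mode(Some("off")), ParallelRefreshMode::Off);
}
```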
Safety & Resilience Hardening (Must-Ship)
In plain terms: The background worker should never silently hang or leave a stream table in an undefined state when an internal operation fails. These items replace panic!/unwrap() in code paths reachable from the background worker with structured errors and graceful recovery.
| Item | Description | Effort | Ref |
|---|---|---|---|
| SAF-1 | Panic audit of scheduler.rs, refresh.rs, hooks.rs: no panic!/unwrap() outside #[cfg(test)]. check_skip_needed now logs WARNING on SPI error with table name and error details. Audit finding documented in comment. | — | src/scheduler.rs |
| SAF-2 | Failure-injection E2E tests in tests/e2e_safety_tests.rs: (1) column drop triggers UpstreamSchemaChanged, verifies scheduler stays alive and other STs continue; (2) source table drop, same verification. | — | tests/e2e_safety_tests.rs |
Safety hardening subtotal: ~7–12 hours
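The SAF-1 pattern — log a WARNING and continue instead of panicking — can be illustrated in isolation. SpiError, spi_select_one, and log_warning below are stand-ins for the extension's real types; only the control-flow pattern is the point:

```rust
// Illustrative sketch of the SAF-1 hardening pattern: a background-worker
// code path recovers from an SPI error with a logged WARNING instead of
// an unwrap() that would kill the worker.
#[derive(Debug)]
struct SpiError(String);

fn spi_select_one(query: &str) -> Result<i64, SpiError> {
    if query.contains("missing_table") {
        Err(SpiError("relation does not exist".into()))
    } else {
        Ok(1)
    }
}

fn log_warning(msg: &str) { eprintln!("WARNING: {msg}"); }

/// Before: `spi_select_one(q).unwrap()` — a panic here kills the worker.
/// After: log the table name and error, then keep scheduling other STs.
fn check_skip_needed(table: &str, query: &str) -> bool {
    match spi_select_one(query) {
        Ok(n) => n == 0,
        Err(e) => {
            log_warning(&format!("check_skip_needed failed for {table}: {e:?}"));
            false // fail open: don't skip; let the refresh path surface errors
        }
    }
}

fn main() {
    assert!(!check_skip_needed("orders_st", "SELECT 1 FROM missing_table"));
    assert!(!check_skip_needed("orders_st", "SELECT 1"));
}
```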
Correctness & Code Quality Quick Wins (from REPORT_OVERALL_STATUS.md §12–§15)
In plain terms: Six self-contained improvements identified in the deep gap analysis. Each takes under a day and substantially reduces silent failure modes, operator confusion, and diagnostic friction.
Quick Fixes (< 1 hour each)
| Item | Description | Effort | Ref |
|---|---|---|---|
| QF-1 | Raw println! in refresh.rs replaced with pgrx::log!() guarded by new pg_trickle.log_merge_sql GUC (default off). | — | src/refresh.rs |
| QF-2 | AUTO refresh-mode downgrade message in api.rs raised from pgrx::info!() to pgrx::warning!(). | — | plans/performance/REPORT_OVERALL_STATUS.md §12 |
| QF-3 | append_only auto-reversion warning: pgrx::warning!() + emit_alert(AppendOnlyReverted) already present in refresh.rs. | — | plans/performance/REPORT_OVERALL_STATUS.md §15 |
| QF-4 | unwrap() invariants: // INVARIANT: comments added at four unwrap() sites in dvm/parser.rs (after is_empty() guard, len()==1 guards, and non-empty Err return). | — | src/dvm/parser.rs |
Quick-fix subtotal: ~3–4 hours
Effective Refresh Mode Tracking (G12-ERM)
In plain terms: When a stream table is configured as AUTO, operators currently have no way to discover which mode is actually being used at runtime without reading warning logs. Storing the resolved mode in the catalog and exposing a diagnostic function closes this observability gap.
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | effective_refresh_mode column added to pgt_stream_tables; upgrade script sql/pg_trickle--0.10.0--0.11.0.sql created. | — | src/catalog.rs |
| — | explain_refresh_mode(name TEXT) SQL function: pgtrickle.explain_refresh_mode() returns configured mode, effective mode, and downgrade reason. | — | src/api.rs |
Effective refresh mode subtotal: ~4–7 hours
Correctness Guards (G12-2, G12-AGG)
| Item | Description | Effort | Ref |
|---|---|---|---|
| G12-2 | TopK refresh-time validation: validate_topk_metadata() re-parses the reconstructed full query on each TopK refresh; validate_topk_metadata_fields() validates stored fields (pure logic, unit-testable). Falls back to FULL + WARNING on mismatch. 7 unit tests. | — | src/refresh.rs |
| G12-AGG | classify_agg_strategy() classifies each aggregate as ALGEBRAIC_INVERTIBLE / ALGEBRAIC_VIA_AUX / SEMI_ALGEBRAIC / GROUP_RESCAN. Warning emitted at create_stream_table time for DIFFERENTIAL + group-rescan aggs. Strategy exposed in explain_st() as aggregate_strategies JSON. 18 unit tests. | — | src/dvm/parser.rs |
Correctness guards subtotal: ✅ Complete
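The G12-AGG classification can be sketched as a pure function. The four strategy names come from the table above; which aggregate lands in which bucket below is an illustrative assumption, not the extension's exact table:

```rust
// Sketch of the classify_agg_strategy() idea. The bucket assignments are
// assumptions for illustration: SUM/COUNT deltas can be inverted directly,
// AVG needs auxiliary SUM+COUNT columns, and MIN/MAX must re-scan a group
// after a delete because the evicted extremum cannot be reconstructed.
#[derive(Debug, PartialEq)]
enum AggStrategy {
    AlgebraicInvertible, // delta applied directly (e.g. SUM, COUNT)
    AlgebraicViaAux,     // needs an auxiliary column (e.g. AVG)
    SemiAlgebraic,       // cheap for inserts, expensive for deletes
    GroupRescan,         // deletes force re-scanning the group (e.g. MIN/MAX)
}

fn classify_agg_strategy(func: &str) -> AggStrategy {
    match func.to_ascii_lowercase().as_str() {
        "count" | "sum" => AggStrategy::AlgebraicInvertible,
        "avg" => AggStrategy::AlgebraicViaAux,
        "min" | "max" => AggStrategy::GroupRescan,
        _ => AggStrategy::GroupRescan, // conservative default for unknowns
    }
}

fn main() {
    assert_eq!(classify_agg_strategy("SUM"), AggStrategy::AlgebraicInvertible);
    assert_eq!(classify_agg_strategy("avg"), AggStrategy::AlgebraicViaAux);
    assert_eq!(classify_agg_strategy("max"), AggStrategy::GroupRescan);
    let _ = AggStrategy::SemiAlgebraic; // listed for completeness
}
```

A warning at create_stream_table time then reduces to checking whether any aggregate classifies as GroupRescan while the requested mode is DIFFERENTIAL.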
Parameter & Error Hardening (G15-PV, G13-EH)
| Item | Description | Effort | Ref |
|---|---|---|---|
| G15-PV | Parameter validation: (a) cdc_mode='wal' + refresh_mode='IMMEDIATE' rejection was already present; (b) diamond_schedule_policy='slowest' + diamond_consistency='none' now rejected in create_stream_table_impl and alter_stream_table_impl with structured error. | — | src/api.rs |
| G13-EH | raise_error_with_context() helper in api.rs uses ErrorReport::new().set_detail().set_hint() for UnsupportedOperator, CycleDetected, UpstreamSchemaChanged, and QueryParseError; all 8 API-boundary error sites updated. | — | src/api.rs |
Parameter & error hardening subtotal: ~6–12 hours
Testing: EC-01 Boundary Regression (G17-EC01B-NEG)
| Item | Description | Effort | Ref |
|---|---|---|---|
| G17-EC01B-NEG | Negative regression tests in join_common.rs covering 3-way join, 4-way join, right-subtree ≥3 scans, and the 2-scan boundary. // TODO: Remove when EC01B-1/EC01B-2 fixed in v0.12.0 | — | src/dvm/operators/join_common.rs |
EC-01 boundary regression subtotal: ✅ Complete
Documentation Quick Wins (G16-GS, G16-SM, G16-MQR, G15-GUC)
| Item | Description | Effort | Ref |
|---|---|---|---|
| G16-GS | Restructure GETTING_STARTED.md with progressive complexity. Five chapters: (1) Hello World — single-table ST with no join; (2) Multi-table join; (3) Scheduling & backpressure; (4) Monitoring — 5 key functions; (5) Advanced — FUSE, wide bitmask, partitions. Remove the current flat wall-of-SQL structure. ✅ Done in v0.11.0 Phase 11 — 5-chapter structure implemented; Chapter 1 Hello World example added; Chapter 5 Advanced Topics adds inline FUSE, partitioning, IMMEDIATE, and multi-tenant quota examples. | — | docs/GETTING_STARTED.md |
| G16-SM | Operator support matrix in docs/DVM_OPERATORS.md covering all operators × FULL/DIFFERENTIAL/IMMEDIATE modes with caveat footnotes. | — | docs/DVM_OPERATORS.md |
| G16-MQR | Monitoring quick reference in docs/GETTING_STARTED.md with pgt_status(), health_check(), change_buffer_sizes(), dependency_tree(), fuse_status(), Prometheus/Grafana stack, key metrics table, and alert summary. | — | docs/GETTING_STARTED.md |
| G15-GUC | GUC matrix in docs/CONFIGURATION.md. | — | docs/CONFIGURATION.md |
Documentation subtotal: ~2–3 days
Correctness quick-wins & documentation subtotal: ~1–2 days code + ~2–3 days docs
Should-Ship Additions
Wider Changed-Column Bitmask (>63 columns)
In plain terms: Stream tables built on source tables with more than 63 columns fall back silently to tracking every column on every UPDATE, losing all CDC selectivity. Extending the changed_cols field from a BIGINT to a BYTEA vector removes this cliff without breaking existing deployments.
| Item | Description | Effort | Ref |
|---|---|---|---|
| WB-1 | Extend the CDC trigger changed_cols column from BIGINT to BYTEA; update bitmask encoding/decoding in cdc.rs; add schema migration for existing change buffer tables (tables with <64 columns are unaffected at the data level). | 1–2 wk | REPORT_OVERALL_STATUS.md §R13 |
| WB-2 | E2E test: wide (>63 column) source table; verify only referenced columns trigger delta propagation; benchmark UPDATE selectivity before/after. | 2–4h | tests/e2e_cdc_tests.rs |
Wider bitmask subtotal: ~1–2 weeks + ~4h testing
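The 63-column cliff exists because a BIGINT bitmask has a fixed number of usable bits, while a variable-length byte vector has none. This standalone sketch shows the encode/test arithmetic; the real encoding in cdc.rs may differ:

```rust
// Sketch of a variable-length changed-column bitmask (BYTEA on the SQL
// side). Unlike a BIGINT, the mask grows on demand, so there is no
// fixed cap on the number of trackable columns.
fn set_changed(mask: &mut Vec<u8>, col: usize) {
    let byte = col / 8;
    if byte >= mask.len() {
        mask.resize(byte + 1, 0); // grow on demand: no 63-column cliff
    }
    mask[byte] |= 1 << (col % 8);
}

fn is_changed(mask: &[u8], col: usize) -> bool {
    mask.get(col / 8).map_or(false, |b| b & (1 << (col % 8)) != 0)
}

fn main() {
    let mut mask = Vec::new();
    set_changed(&mut mask, 5);   // fits in the first byte
    set_changed(&mut mask, 200); // far past the old 63-column limit
    assert!(is_changed(&mask, 5));
    assert!(is_changed(&mask, 200));
    assert!(!is_changed(&mask, 64));
    assert_eq!(mask.len(), 26); // column 200 lives in byte 25
}
```

Tables with fewer than 64 columns keep a mask of at most 8 bytes, which is why narrow tables are unaffected at the data level.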
Fuse — Anomalous Change Detection
In plain terms: A circuit breaker that stops a stream table from processing an unexpectedly large batch of changes (runaway script, mass delete, data migration) without operator review. A blown fuse halts refresh and emits a pgtrickle_alert NOTIFY; reset_fuse() resumes with a chosen recovery action (apply, reinitialize, or skip_changes).
| Item | Description | Effort | Ref |
|---|---|---|---|
| FUSE-1 | Catalog columns on pgt_stream_tables (fuse_mode, fuse_state, fuse_ceiling, fuse_sensitivity, blown_at, blow_reason) | 1–2h | PLAN_FUSE.md |
| FUSE-2 | alter_stream_table() new params: fuse, fuse_ceiling, fuse_sensitivity | 1h | PLAN_FUSE.md |
| FUSE-3 | reset_fuse(name, action => 'apply'|'reinitialize'|'skip_changes') SQL function | 1h | PLAN_FUSE.md |
| FUSE-4 | fuse_status() introspection function | 1h | PLAN_FUSE.md |
| FUSE-5 | — | 2–3h | PLAN_FUSE.md |
| FUSE-6 | E2E tests: all three recovery actions (apply/reinitialize/skip_changes), diamond/DAG interaction | 4–6h | PLAN_FUSE.md |
Fuse subtotal: ~10–14 hours — ✅ Complete
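The core fuse decision is a threshold check against the configured ceiling. The field names below mirror the catalog columns in the table (fuse_ceiling, fuse_state); the decision logic and signature are an illustrative sketch, not the extension's exact implementation:

```rust
// Minimal sketch of the fuse circuit breaker: refuse to apply a change
// batch larger than the configured ceiling. A blown fuse halts refresh
// until the operator calls reset_fuse() with a recovery action.
#[derive(Debug, PartialEq)]
enum FuseState { Intact, Blown }

fn evaluate_fuse(pending_changes: u64, fuse_ceiling: u64) -> FuseState {
    if pending_changes > fuse_ceiling {
        FuseState::Blown // halt refresh; emit pgtrickle_alert NOTIFY
    } else {
        FuseState::Intact
    }
}

fn main() {
    // A runaway script dumping 5M changes trips a 1M-change fuse.
    assert_eq!(evaluate_fuse(5_000_000, 1_000_000), FuseState::Blown);
    // Normal traffic passes through untouched.
    assert_eq!(evaluate_fuse(120, 1_000_000), FuseState::Intact);
}
```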
External Correctness Gate (TS1 or TS2)
In plain terms: Run an independent public query corpus through pg_trickle’s DIFFERENTIAL mode and assert the results match a vanilla PostgreSQL execution. This catches blind spots that the extension’s own test suite cannot, and provides an objective correctness baseline before v1.0.
| Item | Description | Effort | Ref |
|---|---|---|---|
| TS1 | sqllogictest suite. Run the PostgreSQL sqllogic suite through pg_trickle DIFFERENTIAL mode; gate CI on zero correctness mismatches. Preferred choice: broadest query coverage. | 2–3d | PLAN_TESTING_GAPS.md §J |
| TS2 | JOB (Join Order Benchmark). Correctness baseline and refresh latency profiling on realistic multi-join analytical queries. Alternative if sqllogictest setup is too costly. | 1–2d | PLAN_TESTING_GAPS.md §J |
Deliver one of TS1 or TS2; whichever is completed first meets the exit criterion.
External correctness gate subtotal: ~1–3 days
Differential ST-to-ST Refresh (✅ Done)
In plain terms: When stream table B’s defining query reads from stream table A, pg_trickle currently forces a FULL refresh of B every time A updates — re-executing B’s entire query even when only a handful of rows changed. This feature gives ST-to-ST dependencies the same CDC change buffer that base tables already have, so B refreshes differentially (applying only the delta). Crucially, even when A itself does a FULL refresh, a pre/post snapshot diff is captured so B still receives a small I/D delta rather than cascading FULL through the chain.
| Item | Description | Status | Ref |
|---|---|---|---|
| ST-ST-1 | Change buffer infrastructure. create_st_change_buffer_table() / drop_st_change_buffer_table() in cdc.rs; lifecycle hooks in api.rs; idempotent ensure_st_change_buffer() | ✅ Done | PLAN_ST_TO_ST.md §Phase 1 |
| ST-ST-2 | Delta capture — DIFFERENTIAL path. Force explicit DML when ST has downstream consumers; capture delta from __pgt_delta_{id} to changes_pgt_{id} | ✅ Done | PLAN_ST_TO_ST.md §Phase 2 |
| ST-ST-3 | Delta capture — FULL path. Pre/post snapshot diff writes I/D pairs to changes_pgt_{id}; eliminates cascading FULL | ✅ Done | PLAN_ST_TO_ST.md §7 |
| ST-ST-4 | DVM scan operator for ST sources. Read from changes_pgt_{id}; pgt_-prefixed LSN tokens; extended frontier and placeholder resolver | ✅ Done | PLAN_ST_TO_ST.md §Phase 3 |
| ST-ST-5 | Scheduler integration. Buffer-based change detection in has_stream_table_source_changes(); removed FULL override; frontier augmented with ST source positions | ✅ Done | PLAN_ST_TO_ST.md §Phase 4 |
| ST-ST-6 | Cleanup & lifecycle. cleanup_st_change_buffers_by_frontier() for ST buffers; removed prewarm skip for ST sources; ST buffer cleanup in both differential and full refresh paths | ✅ Done | PLAN_ST_TO_ST.md §Phase 5–6 |
ST-to-ST differential subtotal: ~4.5–6.5 weeks
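The ST-ST-3 pre/post snapshot diff can be sketched in miniature: compare two snapshots keyed by row id and emit delete (D) and insert (I) pairs, so downstream STs receive a small delta even when the upstream ran a FULL refresh. The types here are illustrative; the real diff runs in SQL against changes_pgt_{id}:

```rust
use std::collections::HashMap;

// Illustrative snapshot diff: an update appears as a D for the old row
// plus an I for the new row, matching the I/D pair encoding above.
fn snapshot_diff(
    pre: &HashMap<u64, String>,
    post: &HashMap<u64, String>,
) -> Vec<(char, u64)> {
    let mut delta = Vec::new();
    for (id, val) in pre {
        match post.get(id) {
            Some(v) if v == val => {}    // unchanged: no delta row emitted
            _ => delta.push(('D', *id)), // removed or updated: delete old
        }
    }
    for (id, val) in post {
        if pre.get(id) != Some(val) {
            delta.push(('I', *id));      // new or updated: insert new
        }
    }
    delta.sort(); // HashMap iteration order is arbitrary
    delta
}

fn main() {
    let pre: HashMap<u64, String> =
        [(1, "a".into()), (2, "b".into())].into_iter().collect();
    let post: HashMap<u64, String> =
        [(2, "b2".into()), (3, "c".into())].into_iter().collect();
    // Row 1 deleted, row 2 updated (D+I pair), row 3 inserted.
    assert_eq!(snapshot_diff(&pre, &post),
               vec![('D', 1), ('D', 2), ('I', 2), ('I', 3)]);
}
```

This is why the chain never cascades FULL: only the rows the FULL refresh actually changed reach the downstream buffer.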
Adaptive/Event-Driven Scheduler Wake (Must-Ship)
In plain terms: The scheduler currently wakes on a fixed 1-second timer even when nothing has changed. This adds event-driven wake: CDC triggers notify the scheduler immediately when changes arrive. Median end-to-end latency drops from ~515 ms to ~15 ms for low-volume workloads — a 34× improvement. This is a must-ship item because low latency is a primary project goal.
| Item | Description | Effort | Ref |
|---|---|---|---|
| WAKE-1 | CDC triggers call pg_notify('pgtrickle_wake', '') after each change buffer INSERT; scheduler issues LISTEN pgtrickle_wake at startup; 10 ms debounce coalesces rapid notifications; poll fallback preserved. New GUCs: event_driven_wake (default true), wake_debounce_ms (default 10). E2E tests in tests/e2e_wake_tests.rs. | — | REPORT_OVERALL_STATUS.md §R16 |
Event-driven wake subtotal: ✅ Complete
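The 10 ms debounce deserves a sketch: a burst of pg_notify wake-ups inside the window collapses into a single scheduler pass. The struct below is illustrative; the real logic lives in the scheduler's LISTEN loop:

```rust
use std::time::{Duration, Instant};

// Sketch of the wake debounce (wake_debounce_ms, default 10): only the
// first notification in each window triggers a scheduler pass.
struct Debouncer {
    window: Duration,
    last_wake: Option<Instant>,
}

impl Debouncer {
    fn new(window_ms: u64) -> Self {
        Debouncer { window: Duration::from_millis(window_ms), last_wake: None }
    }

    /// Returns true if this notification should wake the scheduler.
    fn should_wake(&mut self, now: Instant) -> bool {
        match self.last_wake {
            Some(prev) if now.duration_since(prev) < self.window => false,
            _ => {
                self.last_wake = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut d = Debouncer::new(10);
    let t0 = Instant::now();
    assert!(d.should_wake(t0));                             // first wake fires
    assert!(!d.should_wake(t0 + Duration::from_millis(3))); // coalesced
    assert!(d.should_wake(t0 + Duration::from_millis(15))); // past the window
}
```

The trade-off is a bounded added latency of at most the window (10 ms), tiny next to the ~500 ms saved by not waiting for the next fixed poll tick.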
Stretch Goals (if capacity allows after Must-Ship)
| Item | Description | Effort | Ref |
|---|---|---|---|
| STRETCH-1 | Partitioning design spike producing a written implementation plan (RFC). | 2–4d | PLAN_PARTITIONING_SPIKE.md |
| A1-1 | DDL: CREATE STREAM TABLE … PARTITION BY; st_partition_key catalog column. partition_by parameter added to all three create_stream_table* functions; st_partition_key TEXT column in catalog; validate_partition_key() validates column exists in output; build_create_table_sql emits PARTITION BY RANGE (key); setup_storage_table creates default catch-all partition and non-unique __pgt_row_id index. | 1–2 wk | PLAN_PARTITIONING_SPIKE.md |
| A1-2 | Delta range extraction: extract_partition_range() in refresh.rs runs SELECT MIN/MAX(key)::text on the resolved delta SQL; returns None on empty delta (MERGE skipped). | 1 wk | PLAN_PARTITIONING_SPIKE.md §8 |
| A1-3 | Predicate injection: inject_partition_predicate() replaces __PGT_PART_PRED__ placeholder in MERGE ON clause with AND st."key" BETWEEN 'min' AND 'max'; CachedMergeTemplate stores delta_sql_template; D-2 prepared statements disabled for partitioned STs. | 2–3 wk | PLAN_PARTITIONING_SPIKE.md §8 |
| A1-4 | EXPLAIN (ANALYZE, BUFFERS) partition-scan verification. tests/e2e_partition_tests.rs covering: initial populate, differential inserts, updates/deletes, empty-delta fast path, EXPLAIN plan verification, invalid partition key rejection; added to light-E2E allowlist. | 1 wk | PLAN_PARTITIONING_SPIKE.md §9 |
Stretch subtotal: STRETCH-1 + A1-1 + A1-2 + A1-3 + A1-4 ✅ All complete
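The A1-3 mechanism reduces to a template substitution: the cached MERGE SQL carries a __PGT_PART_PRED__ placeholder that each refresh replaces with the delta's min/max range (or removes when no range could be derived). The template text below is simplified for illustration; the real SQL construction lives in refresh.rs:

```rust
// Sketch of inject_partition_predicate(): substitute the per-refresh
// partition-key range into the MERGE ON clause so PostgreSQL can prune
// untouched partitions. (A real implementation must also guard against
// SQL injection in the substituted values; elided here for brevity.)
fn inject_partition_predicate(
    merge_template: &str,
    key: &str,
    range: Option<(&str, &str)>,
) -> String {
    let pred = match range {
        Some((min, max)) => {
            format!("AND st.\"{key}\" BETWEEN '{min}' AND '{max}'")
        }
        None => String::new(), // no range derived: fall back to full scan
    };
    merge_template.replace("__PGT_PART_PRED__", &pred)
}

fn main() {
    let tpl = "MERGE INTO st USING d ON st.__pgt_row_id = d.__pgt_row_id __PGT_PART_PRED__";
    let sql = inject_partition_predicate(tpl, "created_at", Some(("2026-01-01", "2026-01-03")));
    assert!(sql.contains("BETWEEN '2026-01-01' AND '2026-01-03'"));
    // With no derivable range, the placeholder is simply removed.
    assert!(!inject_partition_predicate(tpl, "created_at", None).contains("__PGT_PART_PRED__"));
}
```

Because the join key (__pgt_row_id) is unrelated to the partition key, this explicit predicate is the only way the planner learns which partitions the delta can touch.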
DAG Refresh Performance Improvements (from PLAN_DAG_PERFORMANCE.md §8)
In plain terms: Now that ST-to-ST differential refresh eliminates the “every hop is FULL” bottleneck, the next performance frontier is reducing per-hop overhead and exploiting DAG structure more aggressively. These items target the scheduling and dispatch layer — not the DVM engine — and collectively can reduce end-to-end propagation latency by 30–50% for heterogeneous DAGs.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DAG-1 | Intra-tick pipelining: remaining_upstreams tracking with immediate downstream readiness propagation. No level barrier exists. 3 validation tests. | 2–3 wk | PLAN_DAG_PERFORMANCE.md §8.1 |
| DAG-2 | Adaptive poll interval (planned as WaitLatch with shared-memory completion flags): compute_adaptive_poll_ms() pure-logic helper with exponential backoff (20ms → 200ms); ParallelDispatchState tracks adaptive_poll_ms + completions_this_tick; resets to 20ms on worker completion; 8 unit tests. | 1–2 wk | PLAN_DAG_PERFORMANCE.md §8.2 |
| DAG-3 | Delta amplification detection. When a join ST amplifies delta beyond a configurable threshold (e.g., output > 100× input), emit a performance WARNING and optionally fall back to FULL for that hop. Implemented: pg_trickle.delta_amplification_threshold GUC (default 100×); compute_amplification_ratio + should_warn_amplification pure-logic helpers; WARNING emitted after MERGE with ratio, counts, and tuning hint; explain_st() exposes amplification_stats JSON from last 20 DIFFERENTIAL refreshes; 15 unit tests. | 3–5d | PLAN_DAG_PERFORMANCE.md §8.4 |
| DAG-4 | ST buffer bypass: single-consumer chains skip the intermediate changes_pgt_ buffer table, eliminating 2× SPI DML per hop (~20 ms savings per hop for 10K-row deltas). FusedChain execution unit kind; find_fusable_chains() pure-logic detection; capture_delta_to_bypass_table() writes to temp table; DiffContext.st_bypass_tables threads bypass through DVM scan; delta SQL cache bypassed when active; 11+4 unit tests. | 3–4 wk | PLAN_DAG_PERFORMANCE.md §8.3 |
| DAG-5 | ST buffer batch coalescing: cancels redundant I/D pairs for the same __pgt_row_id that accumulate between reads during rapid-fire upstream refreshes. Adapts existing compute_net_effect() logic to the ST buffer schema. compact_st_change_buffer() with build_st_compact_sql() pure-logic helper; advisory lock namespace 0x5047_5500; integrated in execute_differential_refresh() after C-4 base-table compaction; 9 unit tests. | 1–2 wk | PLAN_DAG_PERFORMANCE.md §8.5 |
DAG refresh performance subtotal: ~8–12 weeks
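The DAG-2 backoff is simple enough to sketch end to end: while parallel workers are still running, the dispatcher doubles its poll sleep from 20 ms up to a 200 ms cap, and resets to 20 ms whenever a worker completes. The signature below is illustrative of the compute_adaptive_poll_ms() helper named above, not its exact API:

```rust
// Sketch of the adaptive poll backoff (20 ms → 200 ms): poll eagerly
// right after a completion, back off exponentially while nothing lands.
fn compute_adaptive_poll_ms(current_ms: u64, completion_seen: bool) -> u64 {
    const MIN_POLL_MS: u64 = 20;
    const MAX_POLL_MS: u64 = 200;
    if completion_seen {
        MIN_POLL_MS                       // work finished: poll eagerly again
    } else {
        (current_ms * 2).min(MAX_POLL_MS) // idle: exponential backoff, capped
    }
}

fn main() {
    assert_eq!(compute_adaptive_poll_ms(20, false), 40);
    assert_eq!(compute_adaptive_poll_ms(160, false), 200); // capped at 200 ms
    assert_eq!(compute_adaptive_poll_ms(200, true), 20);   // reset on completion
}
```

Keeping this as a pure function (no latches, no shared memory) is what makes the 8 unit tests cited above possible without a running cluster.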
v0.11.0 total: ~7–10 weeks (partitioning + isolation) + ~12h observability + ~14–21h default tuning + ~7–12h safety hardening + ~2–4 weeks should-ship (bitmask + fuse + external corpus) + ~4.5–6.5 weeks ST-to-ST differential + ~2–3 weeks event-driven wake + ~1–2 days correctness quick-wins + ~2–3 days documentation + ~8–12 weeks DAG performance
Exit criteria: ✅ All met. Released 2026-03-26.
- [x] Declaratively partitioned stream tables accepted; partition key tracked in catalog — ✅ Done in v0.11.0 Partitioning Spike (STRETCH-1 RFC + A1-1)
- [x] Partitioned storage table created with PARTITION BY RANGE + default catch-all partition — ✅ Done (A1-1 physical DDL)
- [x] Partition-key range predicate injected into MERGE ON clause; empty-delta fast-path skips MERGE — ✅ Done (A1-2 + A1-3)
- [x] Partition-scoped MERGE benchmark: 10M-row ST, 0.1% change rate (expect ~100× I/O reduction) — ✅ Done (A1-4 E2E tests)
- [x] Per-database worker quotas enforced; burst reclaimed within 1 scheduler cycle — ✅ Done in v0.11.0 Phase 11 (pg_trickle.per_database_worker_quota GUC; burst to 150% at < 80% cluster load)
- [x] Prometheus queries + alerting rules + Grafana dashboard shipped — ✅ Done in v0.11.0 Phase 3 (monitoring/ directory)
- [x] DEF-1: parallel_refresh_mode default is 'on'; unit test updated — ✅ Done in v0.11.0 Phase 1
- [x] DEF-2: auto_backoff default is true; CONFIGURATION.md updated — ✅ Done in v0.10.0
- [x] DEF-3: SemiJoin delta-key pre-filter verified already implemented — ✅ Done in v0.11.0 Phase 2 (pre-existing in semi_join.rs)
- [x] DEF-4: Invalidation ring capacity is 128 slots — ✅ Done in v0.11.0 Phase 1
- [x] DEF-5: block_source_ddl default is true; error message includes escape-hatch instructions — ✅ Done in v0.11.0 Phase 1
- [x] SAF-1: No panic!/unwrap() in background worker hot paths; check_skip_needed logs SPI errors — ✅ Done in v0.11.0 Phase 1
- [x] SAF-2: Failure-injection E2E tests in tests/e2e_safety_tests.rs — ✅ Done in v0.11.0 Phase 2
- [x] WB-1+2: Changed-column bitmask supports >63 columns (VARBIT); wide-table CDC selectivity E2E passes; schema migration tested — ✅ Done in v0.11.0 Phase 5
- [x] FUSE-1–6: Fuse blows on configurable change-count threshold; reset_fuse() recovers in all three action modes; diamond/DAG interaction tested — ✅ Done in v0.11.0 Phase 6
- [x] TS2: TPC-H-derived 5-query DIFFERENTIAL correctness gate passes with zero mismatches; gated in CI — ✅ Done in v0.11.0 Phase 9
- [x] QF-1–4: println! replaced with guarded pgrx::log!(); AUTO downgrades emit WARNING; append_only reversion verified already warns; parser invariant sites annotated — ✅ Done in v0.11.0 Phase 1
- [x] G12-ERM: effective_refresh_mode column present in pgt_stream_tables; explain_refresh_mode() returns configured mode, effective mode, downgrade reason — ✅ Done in v0.11.0 Phase 2
- [x] G12-2: TopK path validates assumptions at refresh time; triggers FULL fallback with WARNING on violation — ✅ Done in v0.11.0 Phase 4
- [x] G12-AGG: Group-rescan aggregate warning fires at create_stream_table for DIFFERENTIAL mode; strategy visible in explain_st() — ✅ Done in v0.11.0 Phase 4
- [x] G15-PV: Incompatible cdc_mode/refresh_mode and diamond_schedule_policy combinations rejected at creation time with structured HINT — ✅ Done in v0.11.0 Phase 2
- [x] G13-EH: UnsupportedOperator, CycleDetected, UpstreamSchemaChanged, QueryParseError include DETAIL and HINT fields — ✅ Done in v0.11.0 Phase 2
- [x] G17-EC01B-NEG: Negative regression test documents ≥3-scan fall-back behavior; linked to v0.12.0 EC01B fix — ✅ Done in v0.11.0 Phase 4
- [x] G16-GS/SM/MQR/GUC: GETTING_STARTED restructured (5 chapters + Hello World + Advanced Topics); DVM_OPERATORS support matrix; monitoring quick reference; CONFIGURATION.md GUC matrix — ✅ Done in v0.11.0 Phase 11
- [x] ST-ST-1–6: All ST-to-ST dependencies refresh differentially when upstream has a change buffer; FULL refreshes on upstream produce pre/post I/D diff; no cascading FULL — ✅ Done in v0.11.0 Phase 8
- [x] WAKE-1: Event-driven scheduler wake; median latency ~15 ms (34× improvement); 10 ms debounce; poll fallback — ✅ Done in v0.11.0 Phase 7
- [x] DAG-1: Intra-tick pipelining confirmed in Phase 4 architecture — ✅ Done
- [x] DAG-2: Adaptive poll interval (20 ms → 200 ms exponential backoff) — ✅ Done in v0.11.0 Phase 10
- [x] DAG-3: Delta amplification detection with pg_trickle.delta_amplification_threshold GUC — ✅ Done in v0.11.0 Phase 10
- [x] DAG-4: ST buffer bypass (FusedChain) for single-consumer CALCULATED chains — ✅ Done in v0.11.0 Phase 10
- [x] DAG-5: ST buffer batch coalescing cancels redundant I/D pairs — ✅ Done in v0.11.0 Phase 10
- [x] Extension upgrade path tested (0.10.0 → 0.11.0) — ✅ upgrade SQL in sql/pg_trickle--0.10.0--0.11.0.sql