Plain-language companion: v0.11.0.md

v0.11.0 — Partitioned Stream Tables, Prometheus & Grafana Observability, Safety Hardening & Correctness

Status: Released 2026-03-26. See CHANGELOG.md §0.11.0 for the full feature list.

Highlights: 34× lower latency via event-driven scheduler wake · incremental ST-to-ST refresh chains · declaratively partitioned stream tables (100× I/O reduction) · ready-to-use Prometheus + Grafana monitoring stack · FUSE circuit breaker · VARBIT changed-column bitmask (no more 63-column cap) · per-database worker quotas · DAG scheduling performance improvements (fused chains, adaptive polling, amplification detection) · TPC-H correctness gate in CI · safer production defaults.

Partitioned Stream Tables — Storage (A-1)

In plain terms: Partition a 10M-row stream table into 100 ranges and only the 2–3 partitions that actually received changes are touched by MERGE, shrinking the rows scanned from 10M to a few hundred thousand. The partition key must be a user-visible column, and the refresh path must inject a verified range predicate.
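As a usage sketch (assuming the partition_by creation parameter described under A1-1 and the stretch-goal section below, the pgtrickle schema prefix used elsewhere in this document, and invented table, column, and query names):

```sql
-- Illustrative only: the stream table name, defining query, and column names
-- are made up; partition_by must name a user-visible output column.
SELECT pgtrickle.create_stream_table(
    'orders_by_day',
    $$ SELECT order_date,
              count(*)   AS order_count,
              sum(total) AS revenue
       FROM   orders
       GROUP  BY order_date $$,
    partition_by => 'order_date'
);
```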

Item Description Effort Ref
A1-1 DDL: CREATE STREAM TABLE … PARTITION BY declaration; catalog column for partition key 1–2 wk PLAN_NEW_STUFF.md §A-1
A1-2 Delta inspection: extract min/max of partition key from delta CTE per scheduler tick 1 wk PLAN_NEW_STUFF.md §A-1
A1-3 MERGE rewrite: inject validated partition-key range predicate or issue per-partition MERGEs via Rust loop 2–3 wk PLAN_NEW_STUFF.md §A-1
A1-4 E2E benchmarks: 10M-row partitioned ST, 0.1% change rate concentrated in 2–3 partitions 1 wk PLAN_NEW_STUFF.md §A-1

⚠️ MERGE joins on __pgt_row_id (a content hash unrelated to the partition key) — partition pruning will not activate automatically. A predicate injection step is mandatory. See PLAN_NEW_STUFF.md §A-1 risk analysis before starting.

Retraction consideration (A-1): The 5–7 week effort estimate is optimistic. The core assumption — that partition pruning can be activated via a WHERE partition_key BETWEEN ? AND ? predicate — requires the partition key to be a tracked catalog column (not currently the case) and a verified range derivation from the delta. The alternative (per-partition MERGE loop in Rust) is architecturally sound but requires significant catalog and refresh-path changes. A design spike (2–4 days) producing a written implementation plan must be completed before A1-1 is started. The milestone is at P3 / Very High risk and should not block the 1.0 release if the design spike reveals additional complexity.

Partitioned stream tables subtotal: ~5–7 weeks

Multi-Database Scheduler Isolation (C-3)

Item Description Effort Ref
C3-1 Per-database worker quotas (pg_trickle.per_database_worker_quota); priority ordering (IMMEDIATE > Hot > Warm > Cold); burst capacity up to 150% when other DBs are under budget ✅ Done in v0.11.0 Phase 11 — compute_per_db_quota() helper with burst threshold at 80% cluster utilisation; sort_ready_queue_by_priority() dispatches ImmediateClosure first; 7 unit tests. src/scheduler.rs
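As a configuration sketch, the quota is a plain GUC and can be set cluster-wide (the value below is illustrative):

```sql
-- Cap each database at 4 refresh workers (illustrative value); the scheduler
-- may burst above the quota while other databases are under budget.
ALTER SYSTEM SET pg_trickle.per_database_worker_quota = 4;
SELECT pg_reload_conf();
```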

Multi-DB isolation subtotal: ✅ Complete

Prometheus & Grafana Observability

In plain terms: Most teams already run Prometheus and Grafana to monitor their databases. This ships ready-to-use configuration files — no custom code, no extension changes — that plug into the standard postgres_exporter and light up a Grafana dashboard showing refresh latency, staleness, error rates, CDC lag, and per-stream-table detail. Also includes Prometheus alerting rules so you get paged when a stream table goes stale or starts error-looping. A Docker Compose file lets you try the full observability stack with a single docker compose up.

Zero-code monitoring integration. All config files live in a new monitoring/ directory in the main repo (or a separate pgtrickle-monitoring repo). Queries use existing views (pg_stat_stream_tables, check_cdc_health(), quick_health).
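A rough sketch of the kinds of statements the exporter config wraps (the exact output columns are defined by the extension's views and functions and are not reproduced here):

```sql
-- What monitoring/prometheus/pg_trickle_queries.yml boils down to:
SELECT * FROM pg_stat_stream_tables;   -- per-stream-table refresh statistics
SELECT * FROM check_cdc_health();      -- CDC buffer and lag health
SELECT * FROM quick_health;            -- cluster-level health summary
```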

Item Description Effort Ref
OBS-1 Prometheus metrics out of the box. ✅ Done in v0.11.0 Phase 3 — monitoring/prometheus/pg_trickle_queries.yml exports 14 metrics (per-table refresh stats, health summary, CDC buffer sizes, status counts, recent error rate) via postgres_exporter. monitoring/prometheus/pg_trickle_queries.yml
OBS-2 Get paged when things go wrong. ✅ Done in v0.11.0 Phase 3 — monitoring/prometheus/alerts.yml has 8 alerting rules: staleness > 5 min, ≥3 consecutive failures, table SUSPENDED, CDC buffer > 1 GB, scheduler down, high refresh duration, cluster WARNING/CRITICAL. monitoring/prometheus/alerts.yml
OBS-3 See everything at a glance. ✅ Done in v0.11.0 Phase 3 — monitoring/grafana/dashboards/pg_trickle_overview.json has 6 sections: cluster overview stat panels, refresh performance time-series, staleness heatmap, CDC health graphs, per-table drill-down table with schema/table variable filters. monitoring/grafana/dashboards/pg_trickle_overview.json
OBS-4 Try it all in one command. ✅ Done in v0.11.0 Phase 3 — monitoring/docker-compose.yml spins up PostgreSQL + pg_trickle + postgres_exporter + Prometheus + Grafana with pre-wired config and demo seed data (monitoring/init/01_demo.sql). docker compose up → Grafana at :3000. monitoring/docker-compose.yml

Observability subtotal: ~12 hours

Default Tuning & Safety Defaults (from REPORT_OVERALL_STATUS.md)

Four of these items (DEF-1, DEF-2, DEF-4, DEF-5) flip conservative defaults to the behavior that is safe and correct in production; DEF-3 verifies an optimization that was already in place. All underlying features are implemented and tested; only the default values change. Where the default lives in a GUC, the original setting remains so operators can revert if needed — see the sketch after the table.

Item Description Effort Ref
DEF-1 Flip parallel_refresh_mode default to 'on'. ✅ Done in v0.11.0 Phase 1 — default flipped; normalize_parallel_refresh_mode maps None/unknown → On; unit test renamed to defaults_to_on. REPORT_OVERALL_STATUS.md §R1
DEF-2 Flip auto_backoff default to true. ✅ Done in v0.10.0 — default flipped to true; trigger threshold raised to 95%, cap reduced to 8×, log level raised to WARNING. CONFIGURATION.md updated. 1–2h REPORT_OVERALL_STATUS.md §R10
DEF-3 SemiJoin delta-key pre-filter (O-1). ✅ Verified already implemented in v0.11.0 Phase 2 — left_snapshot_filtered pre-filter with WHERE left_key IN (SELECT DISTINCT right_key FROM delta) was already present in semi_join.rs. src/dvm/operators/semi_join.rs
DEF-4 Increase invalidation ring capacity from 32 to 128 slots. ✅ Done in v0.11.0 Phase 1 — INVALIDATION_RING_CAPACITY raised to 128 in shmem.rs. REPORT_OVERALL_STATUS.md §R9
DEF-5 Flip block_source_ddl default to true. ✅ Done in v0.11.0 Phase 1 — default flipped to true; both error messages in hooks.rs include step-by-step escape-hatch procedure. REPORT_OVERALL_STATUS.md §R12
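As noted above, the flipped GUC defaults can be reverted; a sketch assuming the pg_trickle. prefix used by the extension's other settings (the values shown are the pre-0.11.0 behavior and are illustrative — check CONFIGURATION.md for the accepted spellings):

```sql
-- Restore the pre-0.11.0 conservative defaults (illustrative; verify values
-- against CONFIGURATION.md before applying).
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'off';
ALTER SYSTEM SET pg_trickle.auto_backoff = off;
ALTER SYSTEM SET pg_trickle.block_source_ddl = off;
SELECT pg_reload_conf();
```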

Default tuning subtotal: ~14–21 hours

Safety & Resilience Hardening (Must-Ship)

In plain terms: The background worker should never silently hang or leave a stream table in an undefined state when an internal operation fails. These items replace panic!/unwrap() in code paths reachable from the background worker with structured errors and graceful recovery.

Item Description Effort Ref
SAF-1 Replace worker-path panics with structured errors. ✅ Done in v0.11.0 Phase 1 — full audit of scheduler.rs, refresh.rs, hooks.rs: no panic!/unwrap() outside #[cfg(test)]. check_skip_needed now logs WARNING on SPI error with table name and error details. Audit finding documented in comment. src/scheduler.rs
SAF-2 Failure-injection E2E test. ✅ Done in v0.11.0 Phase 2 — two E2E tests in tests/e2e_safety_tests.rs: (1) column drop triggers UpstreamSchemaChanged, verifies scheduler stays alive and other STs continue; (2) source table drop, same verification. tests/e2e_safety_tests.rs

Safety hardening subtotal: ~7–12 hours

Correctness & Code Quality Quick Wins (from REPORT_OVERALL_STATUS.md §12–§15)

In plain terms: Six groups of self-contained improvements identified in the deep gap analysis. Each item takes roughly a day or less and substantially reduces silent failure modes, operator confusion, and diagnostic friction.

Quick Fixes (< 1 hour each)

Item Description Effort Ref
QF-1 Fix unguarded debug println!. ✅ Done in v0.11.0 Phase 1 — println! replaced with pgrx::log!() guarded by new pg_trickle.log_merge_sql GUC (default off). src/refresh.rs
QF-2 Upgrade AUTO mode downgrade log level. ✅ Done in v0.11.0 Phase 1 — four AUTO→FULL downgrade paths in api.rs raised from pgrx::info!() to pgrx::warning!(). plans/performance/REPORT_OVERALL_STATUS.md §12
QF-3 Warn when append_only auto-reverts. ✅ Verified already implemented — pgrx::warning!() + emit_alert(AppendOnlyReverted) already present in refresh.rs. plans/performance/REPORT_OVERALL_STATUS.md §15
QF-4 Document parser unwrap() invariants. ✅ Done in v0.11.0 Phase 1 — // INVARIANT: comments added at four unwrap() sites in dvm/parser.rs (after is_empty() guard, len()==1 guards, and non-empty Err return). src/dvm/parser.rs

Quick-fix subtotal: ~3–4 hours

Effective Refresh Mode Tracking (G12-ERM)

In plain terms: When a stream table is configured as AUTO, operators currently have no way to discover which mode is actually being used at runtime without reading warning logs. Storing the resolved mode in the catalog and exposing a diagnostic function closes this observability gap.

Item Description Effort Ref
G12-ERM-1 Add effective_refresh_mode column to pgt_stream_tables. ✅ Done in v0.11.0 Phase 2 — column added; scheduler writes actual mode (FULL/DIFFERENTIAL/APPEND_ONLY/TOP_K/NO_DATA) via thread-local tracking; upgrade SQL pg_trickle--0.10.0--0.11.0.sql created. src/catalog.rs
G12-ERM-2 Add explain_refresh_mode(name TEXT) SQL function. ✅ Done in v0.11.0 Phase 2 — pgtrickle.explain_refresh_mode() returns configured mode, effective mode, and downgrade reason. src/api.rs
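A usage sketch for the new diagnostic (the stream table name is illustrative, and the call is written as set-returning here; the exact return shape is defined in api.rs):

```sql
-- Shows configured mode, effective mode, and the downgrade reason (if any).
SELECT * FROM pgtrickle.explain_refresh_mode('orders_by_day');
```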

Effective refresh mode subtotal: ~4–7 hours

Correctness Guards (G12-2, G12-AGG)

Item Description Effort Ref
G12-2 TopK runtime validation. ✅ Done in v0.11.0 Phase 4 — validate_topk_metadata() re-parses the reconstructed full query on each TopK refresh; validate_topk_metadata_fields() validates stored fields (pure logic, unit-testable). Falls back to FULL + WARNING on mismatch. 7 unit tests. src/refresh.rs
G12-AGG Group-rescan aggregate warning. ✅ Done in v0.11.0 Phase 4 — classify_agg_strategy() classifies each aggregate as ALGEBRAIC_INVERTIBLE / ALGEBRAIC_VIA_AUX / SEMI_ALGEBRAIC / GROUP_RESCAN. Warning emitted at create_stream_table time for DIFFERENTIAL + group-rescan aggs. Strategy exposed in explain_st() as aggregate_strategies JSON. 18 unit tests. src/dvm/parser.rs

Correctness guards subtotal: ✅ Complete

Parameter & Error Hardening (G15-PV, G13-EH)

Item Description Effort Ref
G15-PV Validate incompatible parameter combinations. ✅ Done in v0.11.0 Phase 2 — (a) cdc_mode='wal' + refresh_mode='IMMEDIATE' rejection was already present; (b) diamond_schedule_policy='slowest' + diamond_consistency='none' now rejected in create_stream_table_impl and alter_stream_table_impl with structured error. src/api.rs
G13-EH Structured error HINT/DETAIL fields. ✅ Done in v0.11.0 Phase 2 — raise_error_with_context() helper in api.rs uses ErrorReport::new().set_detail().set_hint() for UnsupportedOperator, CycleDetected, UpstreamSchemaChanged, and QueryParseError; all 8 API-boundary error sites updated. src/api.rs

Parameter & error hardening subtotal: ~6–12 hours

Testing: EC-01 Boundary Regression (G17-EC01B-NEG)

Item Description Effort Ref
G17-EC01B-NEG Add a negative regression test asserting that ≥3-scan join right subtrees currently fall back to FULL refresh. ✅ Done in v0.11.0 Phase 4 — 4 unit tests in join_common.rs covering 3-way join, 4-way join, right-subtree ≥3 scans, and 2-scan boundary; a // TODO comment marks the tests for removal once EC01B-1/EC01B-2 are fixed in v0.12.0. src/dvm/operators/join_common.rs

EC-01 boundary regression subtotal: ✅ Complete

Documentation Quick Wins (G16-GS, G16-SM, G16-MQR, G15-GUC)

Item Description Effort Ref
G16-GS Restructure GETTING_STARTED.md with progressive complexity. Five chapters: (1) Hello World — single-table ST with no join; (2) Multi-table join; (3) Scheduling & backpressure; (4) Monitoring — 5 key functions; (5) Advanced — FUSE, wide bitmask, partitions. Remove the current flat wall-of-SQL structure. ✅ Done in v0.11.0 Phase 11 — 5-chapter structure implemented; Chapter 1 Hello World example added; Chapter 5 Advanced Topics adds inline FUSE, partitioning, IMMEDIATE, and multi-tenant quota examples. docs/GETTING_STARTED.md
G16-SM SQL/mode operator support matrix. ✅ Done — 60+ row operator support matrix added to docs/DVM_OPERATORS.md covering all operators × FULL/DIFFERENTIAL/IMMEDIATE modes with caveat footnotes. docs/DVM_OPERATORS.md
G16-MQR Monitoring quick reference. ✅ Done — Monitoring Quick Reference section added to docs/GETTING_STARTED.md with pgt_status(), health_check(), change_buffer_sizes(), dependency_tree(), fuse_status(), Prometheus/Grafana stack, key metrics table, and alert summary. docs/GETTING_STARTED.md
G15-GUC GUC interaction matrix. ✅ Done — GUC Interaction Matrix (14 interaction pairs) and three named Tuning Profiles (Low-Latency, High-Throughput, Resource-Constrained) added to docs/CONFIGURATION.md. docs/CONFIGURATION.md

Documentation subtotal: ~2–3 days

Correctness quick-wins & documentation subtotal: ~1–2 days code + ~2–3 days docs

Should-Ship Additions

Wider Changed-Column Bitmask (>63 columns)

In plain terms: Stream tables built on source tables with more than 63 columns fall back silently to tracking every column on every UPDATE, losing all CDC selectivity. Extending the changed_cols field from a BIGINT to a BYTEA vector removes this cliff without breaking existing deployments.

Item Description Effort Ref
WB-1 Extend the CDC trigger changed_cols column from BIGINT to BYTEA; update bitmask encoding/decoding in cdc.rs; add schema migration for existing change buffer tables (tables with <64 columns are unaffected at the data level). 1–2 wk REPORT_OVERALL_STATUS.md §R13
WB-2 E2E test: wide (>63 column) source table; verify only referenced columns trigger delta propagation; benchmark UPDATE selectivity before/after. 2–4h tests/e2e_cdc_tests.rs

Wider bitmask subtotal: ~1–2 weeks + ~4h testing

Fuse — Anomalous Change Detection

In plain terms: A circuit breaker that stops a stream table from processing an unexpectedly large batch of changes (runaway script, mass delete, data migration) without operator review. A blown fuse halts refresh and emits a pgtrickle_alert NOTIFY; reset_fuse() resumes with a chosen recovery action (apply, reinitialize, or skip_changes).
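A usage sketch of the operator workflow, using the functions and parameters itemized in the table below (the pgtrickle schema prefix, parameter value syntax, table name, and ceiling are illustrative):

```sql
-- Arm the fuse on one stream table (parameter names from FUSE-2).
SELECT pgtrickle.alter_stream_table('orders_by_day',
                                    fuse         => 'on',
                                    fuse_ceiling => 1000000);

-- Inspect fuse state across all stream tables (FUSE-4).
SELECT * FROM pgtrickle.fuse_status();

-- After reviewing a blown fuse, resume and apply the buffered changes (FUSE-3).
SELECT pgtrickle.reset_fuse('orders_by_day', action => 'apply');
```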

Item Description Effort Ref
FUSE-1 Catalog: fuse state columns on pgt_stream_tables (fuse_mode, fuse_state, fuse_ceiling, fuse_sensitivity, blown_at, blow_reason) 1–2h PLAN_FUSE.md
FUSE-2 alter_stream_table() new params: fuse, fuse_ceiling, fuse_sensitivity 1h PLAN_FUSE.md
FUSE-3 reset_fuse(name, action => 'apply'|'reinitialize'|'skip_changes') SQL function 1h PLAN_FUSE.md
FUSE-4 fuse_status() introspection function 1h PLAN_FUSE.md
FUSE-5 Scheduler pre-check: count change buffer rows; evaluate threshold; blow fuse + NOTIFY if exceeded 2–3h PLAN_FUSE.md
FUSE-6 E2E tests: normal baseline, spike → blow, reset (apply/reinitialize/skip_changes), diamond/DAG interaction 4–6h PLAN_FUSE.md

Fuse subtotal: ~10–14 hours — ✅ Complete

External Correctness Gate (TS1 or TS2)

In plain terms: Run an independent public query corpus through pg_trickle’s DIFFERENTIAL mode and assert the results match a vanilla PostgreSQL execution. This catches blind spots that the extension’s own test suite cannot, and provides an objective correctness baseline before v1.0.

Item Description Effort Ref
TS1 sqllogictest suite. Run the PostgreSQL sqllogictest suite through pg_trickle DIFFERENTIAL mode; gate CI on zero correctness mismatches. Preferred choice because it offers the broadest query coverage. 2–3d PLAN_TESTING_GAPS.md §J
TS2 JOB (Join Order Benchmark). Correctness baseline and refresh latency profiling on realistic multi-join analytical queries. Alternative if sqllogictest setup is too costly. 1–2d PLAN_TESTING_GAPS.md §J

Deliver one of TS1 or TS2; whichever is completed first meets the exit criterion.

External correctness gate subtotal: ~1–3 days

Differential ST-to-ST Refresh (✅ Done)

In plain terms: When stream table B’s defining query reads from stream table A, pg_trickle currently forces a FULL refresh of B every time A updates — re-executing B’s entire query even when only a handful of rows changed. This feature gives ST-to-ST dependencies the same CDC change buffer that base tables already have, so B refreshes differentially (applying only the delta). Crucially, even when A itself does a FULL refresh, a pre/post snapshot diff is captured so B still receives a small I/D delta rather than cascading FULL through the chain.
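For concreteness, a sketch of the kind of chain this targets (assuming an illustrative create_stream_table(name, query) call shape and invented table names):

```sql
-- Stream table daily_orders reads a base table; weekly_orders reads daily_orders.
-- Before this feature, every refresh of daily_orders forced a FULL refresh of
-- weekly_orders; now weekly_orders consumes only the captured insert/delete delta.
SELECT pgtrickle.create_stream_table('daily_orders',
    $$ SELECT order_date, count(*) AS n FROM orders GROUP BY order_date $$);

SELECT pgtrickle.create_stream_table('weekly_orders',
    $$ SELECT date_trunc('week', order_date) AS week, sum(n) AS weekly_n
       FROM daily_orders GROUP BY 1 $$);
```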

Item Description Status Ref
ST-ST-1 Change buffer infrastructure. create_st_change_buffer_table() / drop_st_change_buffer_table() in cdc.rs; lifecycle hooks in api.rs; idempotent ensure_st_change_buffer() ✅ Done PLAN_ST_TO_ST.md §Phase 1
ST-ST-2 Delta capture — DIFFERENTIAL path. Force explicit DML when ST has downstream consumers; capture delta from __pgt_delta_{id} to changes_pgt_{id} ✅ Done PLAN_ST_TO_ST.md §Phase 2
ST-ST-3 Delta capture — FULL path. Pre/post snapshot diff writes I/D pairs to changes_pgt_{id}; eliminates cascading FULL ✅ Done PLAN_ST_TO_ST.md §7
ST-ST-4 DVM scan operator for ST sources. Read from changes_pgt_{id}; pgt_-prefixed LSN tokens; extended frontier and placeholder resolver ✅ Done PLAN_ST_TO_ST.md §Phase 3
ST-ST-5 Scheduler integration. Buffer-based change detection in has_stream_table_source_changes(); removed FULL override; frontier augmented with ST source positions ✅ Done PLAN_ST_TO_ST.md §Phase 4
ST-ST-6 Cleanup & lifecycle. cleanup_st_change_buffers_by_frontier() for ST buffers; removed prewarm skip for ST sources; ST buffer cleanup in both differential and full refresh paths ✅ Done PLAN_ST_TO_ST.md §Phase 5–6

ST-to-ST differential subtotal: ~4.5–6.5 weeks

Adaptive/Event-Driven Scheduler Wake (Must-Ship)

In plain terms: The scheduler currently wakes on a fixed 1-second timer even when nothing has changed. This adds event-driven wake: CDC triggers notify the scheduler immediately when changes arrive. Median end-to-end latency drops from ~515 ms to ~15 ms for low-volume workloads — a 34× improvement. This is a must-ship item because low latency is a primary project goal.

Item Description Effort Ref
WAKE-1 Event-driven scheduler wake. ✅ Done in v0.11.0 Phase 7 — CDC triggers emit pg_notify('pgtrickle_wake', '') after each change buffer INSERT; scheduler issues LISTEN pgtrickle_wake at startup; 10 ms debounce coalesces rapid notifications; poll fallback preserved. New GUCs: event_driven_wake (default true), wake_debounce_ms (default 10). E2E tests in tests/e2e_wake_tests.rs. REPORT_OVERALL_STATUS.md §R16
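A tuning sketch for the two new GUCs (assuming the pg_trickle. prefix used by the extension's other settings; the values shown are the documented defaults):

```sql
-- Event-driven wake is on by default; set it to off to fall back to pure
-- polling, or raise the debounce to coalesce very bursty writers.
ALTER SYSTEM SET pg_trickle.event_driven_wake = on;
ALTER SYSTEM SET pg_trickle.wake_debounce_ms = 10;
SELECT pg_reload_conf();
```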

Event-driven wake subtotal: ✅ Complete

Stretch Goals (if capacity allows after Must-Ship)

Item Description Effort Ref
STRETCH-1 Partitioned stream tables — design spike only. ✅ Done in v0.11.0 Partitioning Spike — RFC written (PLAN_PARTITIONING_SPIKE.md), go/no-go decision: Go. A1-1 implemented (catalog column, API parameter, validation). 2–4d PLAN_PARTITIONING_SPIKE.md
A1-1 DDL: CREATE STREAM TABLE … PARTITION BY; st_partition_key catalog column. ✅ Done — partition_by parameter added to all three create_stream_table* functions; st_partition_key TEXT column in catalog; validate_partition_key() validates column exists in output; build_create_table_sql emits PARTITION BY RANGE (key); setup_storage_table creates default catch-all partition and non-unique __pgt_row_id index. 1–2 wk PLAN_PARTITIONING_SPIKE.md
A1-2 Delta min/max inspection. ✅ Done — extract_partition_range() in refresh.rs runs SELECT MIN/MAX(key)::text on the resolved delta SQL; returns None on empty delta (MERGE skipped). 1 wk PLAN_PARTITIONING_SPIKE.md §8
A1-3 MERGE rewrite. ✅ Done — inject_partition_predicate() replaces __PGT_PART_PRED__ placeholder in MERGE ON clause with AND st."key" BETWEEN 'min' AND 'max'; CachedMergeTemplate stores delta_sql_template; D-2 prepared statements disabled for partitioned STs. 2–3 wk PLAN_PARTITIONING_SPIKE.md §8
A1-4 E2E benchmarks: 10M-row partitioned ST, 0.1%/0.2%/100% change rate scenarios; EXPLAIN (ANALYZE, BUFFERS) partition-scan verification. ✅ Done — 7 E2E tests added to tests/e2e_partition_tests.rs covering: initial populate, differential inserts, updates/deletes, empty-delta fast path, EXPLAIN plan verification, invalid partition key rejection; added to light-E2E allowlist. 1 wk PLAN_PARTITIONING_SPIKE.md §9
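For A1-3, the rewritten MERGE has roughly the following shape — a simplified, hand-written sketch in which the table, columns, delta relation id, date literals, and update/insert lists are illustrative; only the __pgt_row_id join and the injected BETWEEN predicate reflect the mechanism described above:

```sql
-- __PGT_PART_PRED__ in the cached template is replaced at refresh time with
-- the min/max range extracted from the delta (A1-2), yielding:
MERGE INTO orders_by_day AS st
USING __pgt_delta_42 AS d                       -- delta relation; id illustrative
   ON st.__pgt_row_id = d.__pgt_row_id
  AND st."order_date" BETWEEN DATE '2026-03-01' AND DATE '2026-03-03'  -- injected
WHEN MATCHED THEN
    UPDATE SET order_count = d.order_count, revenue = d.revenue
WHEN NOT MATCHED THEN
    INSERT (order_date, order_count, revenue, __pgt_row_id)
    VALUES (d.order_date, d.order_count, d.revenue, d.__pgt_row_id);
```

Delete handling and the remaining MERGE branches are omitted; the point is only that the partition-key range restricts which partitions the target scan touches.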

Stretch subtotal: STRETCH-1 + A1-1 + A1-2 + A1-3 + A1-4 ✅ All complete

DAG Refresh Performance Improvements (from PLAN_DAG_PERFORMANCE.md §8)

In plain terms: Now that ST-to-ST differential refresh eliminates the “every hop is FULL” bottleneck, the next performance frontier is reducing per-hop overhead and exploiting DAG structure more aggressively. These items target the scheduling and dispatch layer — not the DVM engine — and collectively can reduce end-to-end propagation latency by 30–50% for heterogeneous DAGs.

Item Description Effort Ref
DAG-1 Intra-tick pipelining. Within a single scheduler tick, begin processing a downstream ST as soon as all its specific upstream dependencies have completed — not when the entire topological level finishes. Requires per-ST completion tracking in the parallel dispatch loop and immediate enqueuing of newly-ready STs. Expected 30–50% latency reduction for DAGs with mixed-cost levels. ✅ Done — Already achieved by Phase 4’s parallel dispatch architecture: per-dependency remaining_upstreams tracking with immediate downstream readiness propagation. No level barrier exists. 3 validation tests. 2–3 wk PLAN_DAG_PERFORMANCE.md §8.1
DAG-2 Adaptive poll interval. Replace the fixed 200 ms parallel dispatch poll with exponential backoff (20 ms → 200 ms), resetting on worker completion. Makes parallel mode competitive with CALCULATED for cheap refreshes (refresh time T_r ≈ 10 ms). Alternative: WaitLatch with shared-memory completion flags. ✅ Done — compute_adaptive_poll_ms() pure-logic helper with exponential backoff (20ms → 200ms); ParallelDispatchState tracks adaptive_poll_ms + completions_this_tick; resets to 20ms on worker completion; 8 unit tests. 1–2 wk PLAN_DAG_PERFORMANCE.md §8.2
DAG-3 Delta amplification detection. Track input→output delta ratio per hop via pgt_refresh_history. When a join ST amplifies delta beyond a configurable threshold (e.g., output > 100× input), emit a performance WARNING and optionally fall back to FULL for that hop. Expose amplification metrics in explain_st(). ✅ Done — pg_trickle.delta_amplification_threshold GUC (default 100×); compute_amplification_ratio + should_warn_amplification pure-logic helpers; WARNING emitted after MERGE with ratio, counts, and tuning hint; explain_st() exposes amplification_stats JSON from last 20 DIFFERENTIAL refreshes; 15 unit tests. 3–5d PLAN_DAG_PERFORMANCE.md §8.4
DAG-4 ST buffer bypass for single-consumer CALCULATED chains. For ST dependencies with exactly one downstream consumer refreshing in the same tick, pass the delta in-memory instead of writing/reading from the changes_pgt_ buffer table. Eliminates 2× SPI DML per hop (~20 ms savings per hop for 10K-row deltas). ✅ Done — FusedChain execution unit kind; find_fusable_chains() pure-logic detection; capture_delta_to_bypass_table() writes to temp table; DiffContext.st_bypass_tables threads bypass through DVM scan; delta SQL cache bypassed when active; 11+4 unit tests. 3–4 wk PLAN_DAG_PERFORMANCE.md §8.3
DAG-5 ST buffer batch coalescing. Apply net-effect computation to ST change buffers before downstream reads — cancel INSERT/DELETE pairs for the same __pgt_row_id that accumulate between reads during rapid-fire upstream refreshes. Adapts existing compute_net_effect() logic to the ST buffer schema. ✅ Done — compact_st_change_buffer() with build_st_compact_sql() pure-logic helper; advisory lock namespace 0x5047_5500; integrated in execute_differential_refresh() after C-4 base-table compaction; 9 unit tests. 1–2 wk PLAN_DAG_PERFORMANCE.md §8.5
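DAG-3's warning sensitivity is controlled by an ordinary GUC, so it can be tuned per deployment (the value below is illustrative; the documented default is 100):

```sql
-- Warn when a DIFFERENTIAL hop emits more than 50× the delta rows it received.
ALTER SYSTEM SET pg_trickle.delta_amplification_threshold = 50;
SELECT pg_reload_conf();
```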

DAG refresh performance subtotal: ~8–12 weeks

v0.11.0 total: ~7–10 weeks (partitioning + isolation) + ~12h observability + ~14–21h default tuning + ~7–12h safety hardening + ~2–4 weeks should-ship (bitmask + fuse + external corpus) + ~4.5–6.5 weeks ST-to-ST differential + ~2–3 weeks event-driven wake + ~1–2 days correctness quick-wins + ~2–3 days documentation + ~8–12 weeks DAG performance

Exit criteria: ✅ All met. Released 2026-03-26.

- [x] Declaratively partitioned stream tables accepted; partition key tracked in catalog — ✅ Done in v0.11.0 Partitioning Spike (STRETCH-1 RFC + A1-1)
- [x] Partitioned storage table created with PARTITION BY RANGE + default catch-all partition — ✅ Done (A1-1 physical DDL)
- [x] Partition-key range predicate injected into MERGE ON clause; empty-delta fast-path skips MERGE — ✅ Done (A1-2 + A1-3)
- [x] Partition-scoped MERGE benchmark: 10M-row ST, 0.1% change rate (expect ~100× I/O reduction) — ✅ Done (A1-4 E2E tests)
- [x] Per-database worker quotas enforced; burst reclaimed within 1 scheduler cycle — ✅ Done in v0.11.0 Phase 11 (pg_trickle.per_database_worker_quota GUC; burst to 150% at < 80% cluster load)
- [x] Prometheus queries + alerting rules + Grafana dashboard shipped — ✅ Done in v0.11.0 Phase 3 (monitoring/ directory)
- [x] DEF-1: parallel_refresh_mode default is 'on'; unit test updated — ✅ Done in v0.11.0 Phase 1
- [x] DEF-2: auto_backoff default is true; CONFIGURATION.md updated — ✅ Done in v0.10.0
- [x] DEF-3: SemiJoin delta-key pre-filter verified already implemented — ✅ Done in v0.11.0 Phase 2 (pre-existing in semi_join.rs)
- [x] DEF-4: Invalidation ring capacity is 128 slots — ✅ Done in v0.11.0 Phase 1
- [x] DEF-5: block_source_ddl default is true; error message includes escape-hatch instructions — ✅ Done in v0.11.0 Phase 1
- [x] SAF-1: No panic!/unwrap() in background worker hot paths; check_skip_needed logs SPI errors — ✅ Done in v0.11.0 Phase 1
- [x] SAF-2: Failure-injection E2E tests in tests/e2e_safety_tests.rs — ✅ Done in v0.11.0 Phase 2
- [x] WB-1+2: Changed-column bitmask supports >63 columns (VARBIT); wide-table CDC selectivity E2E passes; schema migration tested — ✅ Done in v0.11.0 Phase 5
- [x] FUSE-1–6: Fuse blows on configurable change-count threshold; reset_fuse() recovers in all three action modes; diamond/DAG interaction tested — ✅ Done in v0.11.0 Phase 6
- [x] TS2: TPC-H-derived 5-query DIFFERENTIAL correctness gate passes with zero mismatches; gated in CI — ✅ Done in v0.11.0 Phase 9
- [x] QF-1–4: println! replaced with guarded pgrx::log!(); AUTO downgrades emit WARNING; append_only reversion verified already warns; parser invariant sites annotated — ✅ Done in v0.11.0 Phase 1
- [x] G12-ERM: effective_refresh_mode column present in pgt_stream_tables; explain_refresh_mode() returns configured mode, effective mode, downgrade reason — ✅ Done in v0.11.0 Phase 2
- [x] G12-2: TopK path validates assumptions at refresh time; triggers FULL fallback with WARNING on violation — ✅ Done in v0.11.0 Phase 4
- [x] G12-AGG: Group-rescan aggregate warning fires at create_stream_table for DIFFERENTIAL mode; strategy visible in explain_st() — ✅ Done in v0.11.0 Phase 4
- [x] G15-PV: Incompatible cdc_mode/refresh_mode and diamond_schedule_policy combinations rejected at creation time with structured HINT — ✅ Done in v0.11.0 Phase 2
- [x] G13-EH: UnsupportedOperator, CycleDetected, UpstreamSchemaChanged, QueryParseError include DETAIL and HINT fields — ✅ Done in v0.11.0 Phase 2
- [x] G17-EC01B-NEG: Negative regression test documents ≥3-scan fall-back behavior; linked to v0.12.0 EC01B fix — ✅ Done in v0.11.0 Phase 4
- [x] G16-GS/SM/MQR/GUC: GETTING_STARTED restructured (5 chapters + Hello World + Advanced Topics); DVM_OPERATORS support matrix; monitoring quick reference; CONFIGURATION.md GUC matrix — ✅ Done in v0.11.0 Phase 11
- [x] ST-ST-1–6: All ST-to-ST dependencies refresh differentially when upstream has a change buffer; FULL refreshes on upstream produce pre/post I/D diff; no cascading FULL — ✅ Done in v0.11.0 Phase 8
- [x] WAKE-1: Event-driven scheduler wake; median latency ~15 ms (34× improvement); 10 ms debounce; poll fallback — ✅ Done in v0.11.0 Phase 7
- [x] DAG-1: Intra-tick pipelining confirmed in Phase 4 architecture — ✅ Done
- [x] DAG-2: Adaptive poll interval (20 ms → 200 ms exponential backoff) — ✅ Done in v0.11.0 Phase 10
- [x] DAG-3: Delta amplification detection with pg_trickle.delta_amplification_threshold GUC — ✅ Done in v0.11.0 Phase 10
- [x] DAG-4: ST buffer bypass (FusedChain) for single-consumer CALCULATED chains — ✅ Done in v0.11.0 Phase 10
- [x] DAG-5: ST buffer batch coalescing cancels redundant I/D pairs — ✅ Done in v0.11.0 Phase 10
- [x] Extension upgrade path tested (0.10.0 → 0.11.0) — ✅ upgrade SQL in sql/pg_trickle--0.10.0--0.11.0.sql