Contents
- v0.16.0 — Performance & Refresh Optimization
- MERGE Alternatives & Planner Control (Phase D)
- Algebraic Aggregate UPDATE Fast-Path (B-1)
- Append-Only Stream Tables — MERGE Bypass (A-3-AO)
- Delta Predicate Pushdown (B-2)
- Shared-Memory Template Caching (G14-SHC)
- PostgreSQL 19 Forward-Compatibility (A3) — Moved to v1.0.0
- Change Buffer Compaction (C-4)
- Test Coverage Hardening (TG2)
- Performance Regression CI (BENCH-CI)
- Auto-Indexing on Stream Table Creation (AUTO-IDX)
- Quick Wins
Plain-language companion: v0.16.0.md
v0.16.0 — Performance & Refresh Optimization
Status: Released (2026-04-06).
Faster refreshes across the board: sub-1% deltas use DELETE+INSERT instead of MERGE, insert-only stream tables auto-detect and skip the MERGE join, algebraic aggregates apply pinpoint updates, and a cross-backend template cache eliminates cold-start latency. Automated benchmark regression gating prevents future performance degradation.
Goal: Attack the MERGE bottleneck from multiple angles — alternative merge strategies, algebraic aggregate shortcuts, append-only bypass, delta filtering, change buffer compaction, shared-memory template caching — and close critical test coverage gaps to validate these new paths.
MERGE Alternatives & Planner Control (Phase D)
In plain terms: MERGE dominates 70–97% of refresh time. This explores whether replacing MERGE with DELETE+INSERT (or INSERT ON CONFLICT + DELETE) is faster for specific patterns — particularly for small deltas against large stream tables where the MERGE join is the bottleneck.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PH-D1 | DELETE+INSERT strategy. DELETE WHERE __pgt_row_id IN (delta_deletes) + INSERT ... SELECT FROM delta_inserts. Benchmark against MERGE for 1K/10K/100K deltas against 1M/10M targets. Gate behind pg_trickle.merge_strategy = 'auto' \| 'merge' \| 'delete_insert' GUC. | ~1–2 wk | |
MERGE alternatives subtotal: ~1–2 weeks
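A minimal sketch of the strategy switch described above. This is illustrative only: the names `build_apply_sql` and the 1% auto-threshold are assumptions, not the pg_trickle API, and the real implementation generates these statements in Rust.

```python
def build_apply_sql(target: str, strategy: str, delta_rows: int, target_rows: int) -> list[str]:
    """Return the statements a refresh would run under the chosen merge strategy."""
    if strategy == "auto":
        # Heuristic (assumed here): small deltas against large targets favor
        # DELETE+INSERT, since the MERGE join against the full target dominates.
        strategy = "delete_insert" if delta_rows < 0.01 * target_rows else "merge"
    if strategy == "delete_insert":
        return [
            f"DELETE FROM {target} WHERE __pgt_row_id IN (SELECT __pgt_row_id FROM delta_deletes)",
            f"INSERT INTO {target} SELECT * FROM delta_inserts",
        ]
    return [f"MERGE INTO {target} USING delta ON ..."]  # existing MERGE path (elided)
```

Under this sketch, a 1K-row delta against a 1M-row target selects `delete_insert`, while a 500K-row delta falls back to `merge`.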
Algebraic Aggregate UPDATE Fast-Path (B-1)
In plain terms: The current aggregate delta rule recomputes entire groups where the GROUP BY key appears in the delta. For a group with 100K rows where 1 row changed, the aggregate re-scans all 100K rows in that group. For decomposable aggregates (SUM/COUNT/AVG), a direct UPDATE target SET col = col + Δ replaces the full MERGE join — dropping aggregate refresh from O(group_size) to O(1) per group.
| Item | Description | Effort | Ref |
|---|---|---|---|
| B-1 | Algebraic aggregate UPDATE fast-path. For GROUP BY queries where all aggregates are algebraically invertible (SUM/COUNT/AVG), replace the MERGE with a direct UPDATE target SET col = col + Δ WHERE group_key = ? for existing groups, plus INSERT for newly-appearing groups and DELETE for groups whose count reaches zero. Eliminates the MERGE join overhead — the dominant cost for aggregate refresh when group cardinality is high. Requires adding __pgt_aux_count / __pgt_aux_sum auxiliary columns to the stream table. Fallback to existing MERGE path for non-algebraic aggregates (MIN, MAX, STRING_AGG, etc.). Gate behind pg_trickle.aggregate_fast_path GUC (default true). Expected impact: 5–20× apply-time reduction for high-cardinality GROUP BY (10K+ distinct groups); aggregate scenarios at 100K/1% projected to drop from ~50ms to sub-1ms apply time. | 4–6 wk | plans/performance/PLAN_NEW_STUFF.md §B-1 · plans/sql/PLAN_TRANSACTIONAL_IVM.md §Phase 4 |
B-1 subtotal: ~4–6 weeks
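The O(1)-per-group claim can be made concrete with a small sketch. The state tuple below stands in for the `__pgt_aux_sum` / `__pgt_aux_count` auxiliary columns; the function name is hypothetical.

```python
def apply_group_delta(state, delta_sum, delta_count):
    """state: (aux_sum, aux_count) for one group.

    Returns the new (sum, count, avg) tuple, or None when the group's
    count reaches zero (its stream-table row should be DELETEd).
    """
    new_sum = state[0] + delta_sum
    new_count = state[1] + delta_count
    if new_count == 0:
        return None  # group disappeared -> DELETE its row
    # AVG is derived from sum/count, so it needs no auxiliary column of its own.
    return (new_sum, new_count, new_sum / new_count)

# One changed row in a 100K-row group touches only this O(1) state,
# instead of re-aggregating all 100K rows in the group.
```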
Append-Only Stream Tables — MERGE Bypass (A-3-AO)
In plain terms: When a stream table’s sources are insert-only (e.g. event logs, append-only tables where CDC never sees DELETE/UPDATE), the MERGE is pure overhead — every delta row is an INSERT, never a match. Bypassing MERGE entirely with a plain INSERT INTO st SELECT ... FROM delta removes the join against the target table, takes only RowExclusiveLock, and is the single highest-payoff optimization for event-sourced architectures.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A-3-AO | Append-only MERGE bypass. CREATE STREAM TABLE … APPEND ONLY declaration. When set, refresh uses INSERT INTO st SELECT ... FROM delta instead of MERGE — no target-table join, RowExclusiveLock only. CDC-observed heuristic fallback: if no DELETE/UPDATE has been seen, use the fast path; fall back to MERGE on first non-insert. Benchmark against MERGE for 1K/10K/100K append deltas. | ~1–2 wk | |
A-3-AO subtotal: ~1–2 weeks
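The CDC-observed fallback can be sketched as a tiny state machine. All names here are hypothetical, and the rule that an explicit APPEND ONLY declaration always keeps the fast path is an assumption, not confirmed behavior.

```python
class AppendOnlyTracker:
    """Track whether a stream table's sources have ever produced a non-insert."""

    def __init__(self, declared_append_only: bool = False):
        self.declared = declared_append_only
        self.saw_non_insert = False

    def observe(self, change_kind: str) -> None:
        # Called per CDC change; a single UPDATE/DELETE disables the heuristic.
        if change_kind in ("UPDATE", "DELETE"):
            self.saw_non_insert = True

    def refresh_path(self) -> str:
        # Assumed: an explicit APPEND ONLY declaration always uses the fast
        # path; otherwise the observed-history heuristic decides.
        if self.declared or not self.saw_non_insert:
            return "insert"   # INSERT INTO st SELECT ... FROM delta
        return "merge"        # fall back to the existing MERGE path
```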
Delta Predicate Pushdown (B-2)
In plain terms: For a query like SELECT ... FROM orders WHERE status = 'shipped', if a CDC change row has status = 'pending', the delta processes it through scan → filter → discard. All the scan and join work is wasted. Pushing the WHERE predicate down into the change buffer scan eliminates irrelevant rows before any join processing begins — a 5–10× reduction in delta row volume for selective queries.
| Item | Description | Effort | Ref |
|---|---|---|---|
| B-2 | Delta predicate pushdown. Identify Filter nodes whose predicates reference only columns from a single source table. Inject these predicates into the delta_scan CTE as additional WHERE clauses (including OR old_col = 'value' for DELETE correctness). Expected impact: 5–10× delta row reduction for queries with < 10% selectivity. | ~2–3 wk | |
B-2 subtotal: ~2–3 weeks
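A sketch of the clause-widening rule for DELETE correctness, using hypothetical names. The key point is that the pushed-down predicate must match either image of the row, so DELETEs (and UPDATEs moving out of the filter) whose pre-image satisfied the filter still reach the delta.

```python
def pushdown_clause(predicate: str, old_image_predicate: str) -> str:
    """Widen a single-source filter predicate for injection into the delta scan.

    A change row is relevant if its new image matches the filter
    (INSERT/UPDATE) OR its old image did (DELETE, or UPDATE leaving the
    filtered set). Dropping the OR would silently lose deletions.
    """
    return f"(({predicate}) OR ({old_image_predicate}))"

clause = pushdown_clause("status = 'shipped'", "old_status = 'shipped'")
# Injected into the delta_scan CTE as an extra WHERE condition, so rows
# with status = 'pending' in both images are discarded before any join work.
```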
Shared-Memory Template Caching (G14-SHC)
In plain terms: Every new database connection that triggers a refresh pays a 15–50ms cold-start cost to regenerate the MERGE SQL template. With PgBouncer in transaction mode, this happens on every single refresh cycle. Shared-memory caching stores compiled templates in PostgreSQL DSM so they survive across connections — eliminating the cold-start entirely for steady-state workloads.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G14-SHC | Shared-memory template caching (implementation). Full implementation of DSM + lwlock-based MERGE SQL template caching, building on the G14-SHC-SPIKE RFC from v0.15.0. | ~2–3 wk | plans/performance/REPORT_OVERALL_STATUS.md §14 |
G14-SHC subtotal: ~2–3 weeks
PostgreSQL 19 Forward-Compatibility (A3) — Moved to v1.0.0
PG 19 beta not available in time. Items A3-1 through A3-4 deferred to v1.0.0 milestone.
Change Buffer Compaction (C-4)
In plain terms: A high-churn source table can accumulate thousands of changes to the same row between refresh cycles — an INSERT followed by 10 UPDATEs followed by a DELETE is really just “nothing happened.” Compaction merges multiple changes to the same row ID into a single net change before the delta query runs, reducing change buffer size by 50–90% for high-churn tables. This directly reduces work for every downstream path (MERGE, DELETE+INSERT, append-only INSERT, predicate pushdown).
| Item | Description | Effort | Ref |
|---|---|---|---|
| C-4 | Change buffer compaction. Merge multiple changes to the same __pgt_row_id into a single net change: INSERT+DELETE cancel out; consecutive UPDATEs collapse to one. Trigger on buffer exceeding pg_trickle.compact_threshold rows (default: 100K). Expected impact: 50–90% reduction in change buffer size for high-churn tables. | ~2–3 wk | |
C-4 subtotal: ~2–3 weeks
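The net-change fold can be sketched as follows, with changes represented as `(kind, row_id)` pairs; the representation and function name are illustrative, not the actual buffer format.

```python
def compact(changes):
    """Fold a change sequence down to one net change per row id."""
    net = {}  # row_id -> net kind, insertion-ordered
    for kind, row_id in changes:
        prev = net.get(row_id)
        if prev is None:
            net[row_id] = kind
        elif prev == "INSERT" and kind == "DELETE":
            del net[row_id]            # INSERT then DELETE: nothing happened
        elif prev == "INSERT" and kind == "UPDATE":
            net[row_id] = "INSERT"     # INSERT + UPDATE = INSERT with final values
        elif prev == "DELETE" and kind == "INSERT":
            net[row_id] = "UPDATE"     # delete + reinsert = net update
        elif kind == "DELETE":
            net[row_id] = "DELETE"     # UPDATE(s) then DELETE = just DELETE
        # UPDATE after UPDATE collapses to one UPDATE (latest values win)
    return list(net.items())
```

An INSERT followed by any number of UPDATEs and a DELETE folds to nothing, which is exactly the "really just nothing happened" case described above.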
Test Coverage Hardening (TG2)
In plain terms: The performance optimizations in this release change core refresh paths (MERGE alternatives, aggregate fast-path, append-only bypass, predicate pushdown). Before and alongside these changes, critical test coverage gaps need closing — particularly around operators and scenarios where bugs could hide silently. These gaps were identified in the TESTING_GAPS_2 audit.
High-Priority Gaps
| Item | Description | Effort | Ref |
|---|---|---|---|
| TG2-WIN | Window function DVM execution tests. Cover ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD across INSERT/UPDATE/DELETE cycles. | | TESTING_GAPS_2.md |
| TG2-JOIN | Join multi-cycle tests. Cover INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation; guard against silent data loss. | | TESTING_GAPS_2.md |
| TG2-EQUIV | Differential ≡ Full equivalence tests. Validate that differential refresh results match a full recomputation for joins, aggregates, and window functions. | | TESTING_GAPS_2.md |
Medium-Priority Gaps
| Item | Description | Effort | Ref |
|---|---|---|---|
| TG2-MERGE | refresh.rs MERGE template unit tests. Only helpers/enums tested; the core MERGE SQL template generation is untested at the unit level. | 2–3d | TESTING_GAPS_2.md |
| TG2-CANCEL | Timeout/cancellation during refresh. Zero tests for statement_timeout, pg_cancel_backend() during active refresh. Risk: silent failures or resource leaks under production load. | 1–2d | TESTING_GAPS_2.md |
| TG2-SCHEMA | Source table schema evolution. Partial DDL tests exist; type changes and column renames are thin. Risk: silent data corruption on schema change. | 2–3d | TESTING_GAPS_2.md |
TG2 subtotal: ~2–4 weeks (high-priority) + ~1–2 weeks (medium-priority)
Performance Regression CI (BENCH-CI)
In plain terms: v0.16.0 changes core refresh paths (MERGE alternatives, aggregate fast-path, append-only bypass, predicate pushdown, buffer compaction). Without automated benchmarks in CI, performance regressions will slip through silently. This adds a benchmark suite that runs on every PR and compares against a committed baseline — any statistically significant regression blocks the merge.
| Item | Description | Effort | Ref |
|---|---|---|---|
| BENCH-CI-1 | Benchmark harness in CI. Run just bench (Criterion-based) on a fixed hardware profile (GitHub Actions large runner or self-hosted). Capture results as JSON artifacts. Compare against committed baseline using Criterion’s --save-baseline / --baseline. | 2–3d | plans/performance/PLAN_PERFORMANCE_PART_9.md §I |
| BENCH-CI-2 | Regression gate. Parse Criterion JSON output; fail CI if any benchmark regresses by more than 10% (configurable threshold). Report regressions as PR comment with before/after numbers. | 1–2d | plans/performance/PLAN_PERFORMANCE_PART_9.md §I |
| BENCH-CI-3 | Scenario coverage. Ensure benchmark suite covers: scan, filter, aggregate (algebraic + non-algebraic), join (2-table, 3-table), window function, CTE, TopK, append-only, and mixed workloads. At minimum 1K/10K/100K row scales. | 2–3d | plans/performance/PLAN_PERFORMANCE_PART_9.md §I |
BENCH-CI subtotal: ~1–2 weeks
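The BENCH-CI-2 gate reduces to a simple comparison against the committed baseline. The JSON shape below (a flat name-to-mean-nanoseconds map) is an assumption for illustration; Criterion's actual output is richer.

```python
def regressions(baseline: dict, current: dict, threshold: float = 0.10):
    """Return [(name, fractional_slowdown)] for benchmarks exceeding the threshold.

    A benchmark regresses when its current mean is more than
    baseline * (1 + threshold); CI fails if this list is non-empty.
    """
    out = []
    for name, base_ns in baseline.items():
        cur_ns = current.get(name)
        if cur_ns is not None and cur_ns > base_ns * (1 + threshold):
            out.append((name, cur_ns / base_ns - 1))
    return out
```

With the default 10% threshold, a 15% slowdown blocks the merge while a 5% one passes; the returned fractions feed the before/after PR comment.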
Auto-Indexing on Stream Table Creation (AUTO-IDX)
In plain terms: pg_ivm automatically creates indexes on GROUP BY columns and primary key columns when creating an incrementally maintained view. pg_trickle currently requires manual index creation, which is a friction point for new users. Auto-indexing creates appropriate indexes at stream table creation time — GROUP BY keys, DISTINCT columns, and the __pgt_row_id covering index for MERGE performance.
| Item | Description | Effort | Ref |
|---|---|---|---|
| AUTO-IDX-1 | Auto-create indexes on GROUP BY / DISTINCT columns at create_stream_table() time. Gated behind pg_trickle.auto_index GUC. | — | src/api.rs |
| AUTO-IDX-2 | Covering index on __pgt_row_id for MERGE performance (tables with ≤ 8 columns). Gated behind pg_trickle.auto_index GUC (default true). | — | src/api.rs |
AUTO-IDX: ✅ Done
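A sketch of the DDL derivation described above, with hypothetical names; the ≤ 8-column cutoff for the __pgt_row_id index comes from the exit criteria, while the exact statement shapes are illustrative.

```python
def auto_index_ddl(table: str, group_by_cols, total_cols: int, auto_index: bool = True):
    """Derive CREATE INDEX statements for a newly created stream table."""
    if not auto_index:               # pg_trickle.auto_index GUC off
        return []
    ddl = []
    if group_by_cols:
        cols = ", ".join(group_by_cols)
        ddl.append(f"CREATE INDEX ON {table} ({cols})")
    if total_cols <= 8:              # row-id index only for narrow tables
        ddl.append(f"CREATE UNIQUE INDEX ON {table} (__pgt_row_id)")
    return ddl
```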
Quick Wins
| Item | Description | Effort | Ref |
|---|---|---|---|
| C2-BUG | resume_stream_table() verified operational (present since v0.2.0). | — | |
| ERR-REF | Error reference published: docs/ERRORS.md with all 20 variants documented. Cross-linked from FAQ. | — | docs/ERRORS.md |
| GUC-DEFAULTS | planner_aggressive and cleanup_use_truncate defaults reviewed; kept at true (correct for most workloads). Added detailed tuning guidance for memory-constrained and PgBouncer environments in CONFIGURATION.md. | — | docs/CONFIGURATION.md |
| BUF-LIMIT | pg_trickle.max_buffer_rows GUC added (default: 1M). Forces FULL refresh + truncation when exceeded. | — | src/config.rs · src/refresh.rs |
Quick wins: ✅ Done
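The BUF-LIMIT decision is a one-line policy; this sketch (hypothetical names, exclusive threshold assumed) shows the shape of the check.

```python
def choose_refresh(buffer_rows: int, max_buffer_rows: int = 1_000_000) -> str:
    """Pick the refresh mode for a cycle given the change buffer size.

    Past max_buffer_rows (pg_trickle.max_buffer_rows, default 1M), a FULL
    refresh plus buffer truncation is cheaper than replaying the buffer
    and bounds memory use. Exactly at the limit still runs incrementally.
    """
    return "full_and_truncate" if buffer_rows > max_buffer_rows else "incremental"
```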
v0.16.0 total: ~1–2 weeks (MERGE alts) + ~4–6 weeks (aggregate fast-path) + ~1–2 weeks (append-only) + ~2–3 weeks (predicate pushdown) + ~2–3 weeks (template cache) + ~2–3 weeks (buffer compaction) + ~3–6 weeks (test coverage) + ~1–2 weeks (bench CI) + ~2–3 days (auto-indexing) + ~2–4 hours (quick wins). Note: PG 19 compatibility (A3, ~18–36h) moved to v1.0.0.
Exit criteria:
- [x] PH-D1: DELETE+INSERT strategy implemented and gated behind merge_strategy GUC; correctness verified for INSERT/UPDATE/DELETE deltas
- [x] B-1: Algebraic aggregate fast-path replaces MERGE for SUM/COUNT/AVG GROUP BY queries; aggregate_fast_path GUC respected; explicit DML path (DELETE+UPDATE+INSERT) used instead of MERGE for all-algebraic aggregates; explain_st() exposes aggregate_path; existing tests pass — ✅ Done in v0.16.0 Phase 8
- [x] A-3-AO: CREATE STREAM TABLE … APPEND ONLY accepted; refresh uses INSERT path; heuristic auto-promotion on insert-only buffers; falls back to MERGE on first non-insert CDC event
- [x] B-2: Delta predicate pushdown implemented for single-source Filter nodes (P2-7); DELETE correctness verified (OR old_col predicate); selective-query benchmarks show delta row reduction
- [x] G14-SHC: Cross-backend template cache eliminates cold-start; catalog-backed L2 cache with template_cache GUC; invalidation on DDL; explain_st() exposes stats
- A3: PG 19 builds and passes full E2E suite — moved to v1.0.0
- [x] C-4: Change buffer compaction reduces buffer size by ≥50% for high-churn workloads; compact_threshold GUC respected; no correctness regressions
- [x] TG2-WIN: Window function DVM execution tests cover ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD across INSERT/UPDATE/DELETE
- [x] TG2-JOIN: Join multi-cycle tests cover INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation; no silent data loss
- [x] TG2-EQUIV: Differential ≡ Full equivalence validated for joins, aggregates, and window functions
- [x] TG2-MERGE: refresh.rs MERGE template generation has unit test coverage (completed in v0.17.0)
- [x] TG2-CANCEL: Timeout and cancellation during refresh tested; no resource leaks (completed in v0.17.0)
- [x] TG2-SCHEMA: Source table type changes and column renames tested end-to-end
- [x] BENCH-CI: Performance regression CI runs on every PR; 10% regression threshold blocks merge; scenario coverage includes scan/filter/aggregate/join/window/CTE/TopK/SemiJoin/AntiJoin
- [x] AUTO-IDX: Stream tables auto-create indexes on GROUP BY / DISTINCT columns; __pgt_row_id covering index for ≤ 8-column tables; auto_index GUC respected
- [x] C2-BUG: resume_stream_table() verified operational (present since v0.2.0)
- [x] ERR-REF: Error reference doc published with all 20 PgTrickleError variants, common causes, and suggested fixes
- [x] GUC-DEFAULTS: planner_aggressive and cleanup_use_truncate defaults reviewed; trade-offs documented in CONFIGURATION.md
- [x] BUF-LIMIT: max_buffer_rows GUC prevents unbounded change buffer growth; triggers FULL + truncation when exceeded
- [x] Extension upgrade path tested (0.15.0 → 0.16.0)
- [x] just check-version-sync passes