Plain-language companion: v0.16.0.md

v0.16.0 — Performance & Refresh Optimization

Status: Released (2026-04-06).

Faster refreshes across the board: sub-1% deltas use DELETE+INSERT instead of MERGE, insert-only stream tables auto-detect and skip the MERGE join, algebraic aggregates apply pinpoint updates, and a cross-backend template cache eliminates cold-start latency. Automated benchmark regression gating prevents future performance degradation.

Goal: Attack the MERGE bottleneck from multiple angles (alternative merge strategies, algebraic aggregate shortcuts, append-only bypass, delta filtering, change buffer compaction, shared-memory template caching) and close critical test coverage gaps to validate these new paths.

MERGE Alternatives & Planner Control (Phase D)

In plain terms: MERGE accounts for 70–97% of refresh time. This phase explores whether replacing MERGE with DELETE+INSERT (or INSERT ON CONFLICT + DELETE) is faster for specific patterns, particularly small deltas against large stream tables, where the MERGE join is the bottleneck.

Item Description Effort Ref
PH-D1 DELETE+INSERT strategy. For stream tables where delta is <1% of target, replace MERGE with DELETE WHERE __pgt_row_id IN (delta_deletes) + INSERT ... SELECT FROM delta_inserts. Benchmark against MERGE for 1K/10K/100K deltas against 1M/10M targets. Gate behind pg_trickle.merge_strategy = 'auto'|'merge'|'delete_insert' GUC. 1–2 wk PLAN_PERFORMANCE_PART_9.md §Phase D

MERGE alternatives subtotal: ~1–2 weeks
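The PH-D1 strategy above can be sketched as two targeted statements plus the opt-in GUC. This is a minimal illustration, not the generated SQL: the names `target`, `delta_deletes`, and `delta_inserts` are placeholders.

```sql
-- Hypothetical refresh cycle for a sub-1% delta (all names illustrative).
-- Instead of one MERGE joining the full target, run two small statements:

-- 1. Remove rows whose __pgt_row_id appears in the delta's delete set.
DELETE FROM target
WHERE __pgt_row_id IN (SELECT __pgt_row_id FROM delta_deletes);

-- 2. Append the delta's insert set; no join against the target is needed.
INSERT INTO target
SELECT * FROM delta_inserts;

-- Opt in via the planned GUC (values: 'auto' | 'merge' | 'delete_insert'):
SET pg_trickle.merge_strategy = 'delete_insert';
```

The win comes from the DELETE probing an index on `__pgt_row_id` with a small IN-list instead of MERGE scanning or hash-joining the full target.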

Algebraic Aggregate UPDATE Fast-Path (B-1)

In plain terms: The current aggregate delta rule recomputes entire groups where the GROUP BY key appears in the delta. For a group with 100K rows where 1 row changed, the aggregate re-scans all 100K rows in that group. For decomposable aggregates (SUM/COUNT/AVG), a direct UPDATE target SET col = col + Δ replaces the full MERGE join — dropping aggregate refresh from O(group_size) to O(1) per group.

Item Description Effort Ref
B-1 Algebraic aggregate UPDATE fast-path. For GROUP BY queries where all aggregates are algebraically invertible (SUM/COUNT/AVG), replace the MERGE with a direct UPDATE target SET col = col + Δ WHERE group_key = ? for existing groups, plus INSERT for newly-appearing groups and DELETE for groups whose count reaches zero. Eliminates the MERGE join overhead — the dominant cost for aggregate refresh when group cardinality is high. Requires adding __pgt_aux_count / __pgt_aux_sum auxiliary columns to the stream table. Fallback to existing MERGE path for non-algebraic aggregates (MIN, MAX, STRING_AGG, etc.). Gate behind pg_trickle.aggregate_fast_path GUC (default true). Expected impact: 5–20× apply-time reduction for high-cardinality GROUP BY (10K+ distinct groups); aggregate scenarios at 100K/1% projected to drop from ~50ms to sub-1ms apply time. 4–6 wk plans/performance/PLAN_NEW_STUFF.md §B-1 · plans/sql/PLAN_TRANSACTIONAL_IVM.md §Phase 4

B-1 subtotal: ~4–6 weeks
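The B-1 fast-path can be sketched as the three DML statements the plan describes (UPDATE existing groups, INSERT new ones, DELETE empty ones). Column names `sum_amount`, `d_sum`, and `d_count` and the tables `st`/`delta` are illustrative assumptions; `__pgt_aux_count` is the auxiliary column named above.

```sql
-- Hypothetical fast-path for SELECT group_key, SUM(amount) ... GROUP BY group_key,
-- where 'delta' holds per-group net changes (d_sum, d_count).

-- Existing groups: apply the increment directly — O(1) per group, no MERGE join.
UPDATE st
SET    sum_amount      = sum_amount      + delta.d_sum,
       __pgt_aux_count = __pgt_aux_count + delta.d_count
FROM   delta
WHERE  st.group_key = delta.group_key;

-- Newly-appearing groups: plain INSERT.
INSERT INTO st (group_key, sum_amount, __pgt_aux_count)
SELECT group_key, d_sum, d_count
FROM   delta
WHERE  group_key NOT IN (SELECT group_key FROM st);

-- Groups whose underlying row count reaches zero: DELETE.
DELETE FROM st WHERE __pgt_aux_count = 0;
```

AVG is handled by storing SUM and COUNT separately (the auxiliary columns) and dividing at read time; MIN/MAX are not invertible this way and fall back to the MERGE path.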

Append-Only Stream Tables — MERGE Bypass (A-3-AO)

In plain terms: When a stream table’s sources are insert-only (e.g. event logs, append-only tables where CDC never sees DELETE/UPDATE), the MERGE is pure overhead — every delta row is an INSERT, never a match. Bypassing MERGE entirely with a plain INSERT INTO st SELECT ... FROM delta removes the join against the target table, takes only RowExclusiveLock, and is the single highest-payoff optimization for event-sourced architectures.

Item Description Effort Ref
A-3-AO Append-only stream table fast path. Expose an explicit CREATE STREAM TABLE … APPEND ONLY declaration. When set, refresh uses INSERT INTO st SELECT ... FROM delta instead of MERGE — no target-table join, RowExclusiveLock only. CDC-observed heuristic fallback: if no DELETE/UPDATE has been seen, use the fast path; fall back to MERGE on first non-insert. Benchmark against MERGE for 1K/10K/100K append deltas. 1–2 wk plans/performance/PLAN_NEW_STUFF.md §A-3

A-3-AO subtotal: ~1–2 weeks
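The A-3-AO declaration and resulting refresh shape might look like the following sketch. The exact DDL surface and the internal table/CTE names (`shipped_events`, `delta`) are assumptions for illustration.

```sql
-- Hypothetical append-only declaration for an event-log projection:
CREATE STREAM TABLE shipped_events APPEND ONLY AS
  SELECT id, payload
  FROM   events
  WHERE  kind = 'shipped';

-- With APPEND ONLY set, each refresh reduces to a plain INSERT —
-- no target-table join, RowExclusiveLock only:
INSERT INTO shipped_events
SELECT id, payload FROM delta;
```

Per the heuristic fallback described above, the same fast path can be auto-selected for undeclared tables whose CDC stream has only ever contained INSERTs, reverting to MERGE on the first observed UPDATE or DELETE.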

Delta Predicate Pushdown (B-2)

In plain terms: For a query like SELECT ... FROM orders WHERE status = 'shipped', if a CDC change row has status = 'pending', the delta processes it through scan → filter → discard. All the scan and join work is wasted. Pushing the WHERE predicate down into the change buffer scan eliminates irrelevant rows before any join processing begins — a 5–10× reduction in delta row volume for selective queries.

Item Description Effort Ref
B-2 Delta predicate pushdown. During OpTree construction, identify Filter nodes whose predicates reference only columns from a single source table. Inject these predicates into the delta_scan CTE as additional WHERE clauses (including OR old_col = 'value' for DELETE correctness). Expected impact: 5–10× delta row reduction for queries with < 10% selectivity. 2–3 wk plans/performance/PLAN_NEW_STUFF.md §B-2

B-2 subtotal: ~2–3 weeks
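The pushed-down predicate described above can be illustrated on the `status = 'shipped'` example. The change-buffer table and its `old_*`/`new_*` column naming are illustrative, not the actual generated SQL.

```sql
-- Hypothetical delta_scan CTE for
--   SELECT ... FROM orders WHERE status = 'shipped'
WITH delta_scan AS (
  SELECT *
  FROM   change_buffer
  -- Pushed-down filter: keep a change row if its new image matches,
  -- OR its old image did — the OR branch is required so that DELETEs
  -- (and UPDATEs moving a row out of the filter) still propagate.
  WHERE  new_status = 'shipped'
     OR  old_status = 'shipped'
)
SELECT __pgt_row_id, new_status
FROM   delta_scan;
```

Rows matching neither image (e.g. `pending` → `pending`) are discarded before any join processing, which is where the projected 5–10× delta-volume reduction comes from.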

Shared-Memory Template Caching (G14-SHC)

In plain terms: Every new database connection that triggers a refresh pays a 15–50ms cold-start cost to regenerate the MERGE SQL template. With PgBouncer in transaction mode, this happens on every single refresh cycle. Shared-memory caching stores compiled templates in PostgreSQL DSM so they survive across connections — eliminating the cold-start entirely for steady-state workloads.

Item Description Effort Ref
G14-SHC Shared-memory template caching (implementation). Full implementation of DSM + lwlock-based MERGE SQL template caching, building on the G14-SHC-SPIKE RFC from v0.15.0. ~2–3 wk plans/performance/REPORT_OVERALL_STATUS.md §14

G14-SHC subtotal: ~2–3 weeks
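From the user's side, the feature surfaces only as the GUC and the stats exposed via explain_st() (both named in this release's exit criteria). A hypothetical session — schema qualification and output shape are assumptions:

```sql
-- Ensure the cross-backend template cache is enabled (GUC from the exit criteria):
SET pg_trickle.template_cache = on;

-- Inspect cache behavior for a stream table; explain_st() is stated above
-- to expose cache stats, but the exact call shape shown here is assumed.
SELECT pg_trickle.explain_st('my_stream_table');
```

Because templates live in DSM rather than backend-local memory, a fresh PgBouncer-pooled connection hits the cache instead of paying the 15–50ms regeneration cost.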

PostgreSQL 19 Forward-Compatibility (A3) — Moved to v1.0.0

The PG 19 beta was not available in time. Items A3-1 through A3-4 are deferred to the v1.0.0 milestone.

Change Buffer Compaction (C-4)

In plain terms: A high-churn source table can accumulate thousands of changes to the same row between refresh cycles — an INSERT followed by 10 UPDATEs followed by a DELETE is really just “nothing happened.” Compaction merges multiple changes to the same row ID into a single net change before the delta query runs, reducing change buffer size by 50–90% for high-churn tables. This directly reduces work for every downstream path (MERGE, DELETE+INSERT, append-only INSERT, predicate pushdown).

Item Description Effort Ref
C-4 Change buffer compaction. Before delta-query execution, merge multiple changes to the same __pgt_row_id into a single net change: INSERT+DELETE cancel out; consecutive UPDATEs collapse to one. Trigger on buffer exceeding pg_trickle.compact_threshold rows (default: 100K). Expected impact: 50–90% reduction in change buffer size for high-churn tables. 2–3 wk plans/performance/PLAN_NEW_STUFF.md §C-4

C-4 subtotal: ~2–3 weeks
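The net-change rules above (INSERT+DELETE cancel; UPDATE chains collapse) can be sketched as a two-step compaction pass. Table name `change_buffer` and columns `op`/`change_seq` are illustrative; real compaction would also rewrite the surviving row's op (e.g. INSERT + UPDATEs nets to an INSERT with the last image), which this sketch omits.

```sql
-- Step 1: rows whose first change is an INSERT and last change is a DELETE
-- net out to "nothing happened"; drop every buffered entry for them.
DELETE FROM change_buffer
WHERE __pgt_row_id IN (
  SELECT __pgt_row_id
  FROM   change_buffer
  GROUP  BY __pgt_row_id
  HAVING min(change_seq) FILTER (WHERE op = 'INSERT') = min(change_seq)
     AND max(change_seq) FILTER (WHERE op = 'DELETE') = max(change_seq)
);

-- Step 2: for the remaining rows, keep only the newest entry per row ID
-- (a run of consecutive UPDATEs collapses to the final image).
DELETE FROM change_buffer older
USING (SELECT __pgt_row_id, max(change_seq) AS last_seq
       FROM change_buffer GROUP BY __pgt_row_id) latest
WHERE older.__pgt_row_id = latest.__pgt_row_id
  AND older.change_seq   < latest.last_seq;
```

Running this only when the buffer exceeds pg_trickle.compact_threshold (default 100K) keeps the compaction cost amortized against the 50–90% reduction it buys downstream.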

Test Coverage Hardening (TG2)

In plain terms: The performance optimizations in this release change core refresh paths (MERGE alternatives, aggregate fast-path, append-only bypass, predicate pushdown). Before and alongside these changes, critical test coverage gaps need closing — particularly around operators and scenarios where bugs could hide silently. These gaps were identified in the TESTING_GAPS_2 audit.

High-Priority Gaps

Item Description Effort Ref
TG2-WIN Window function DVM execution tests. ~5 unit tests exist but 0 DVM execution tests. Add execution-level tests for ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD delta behavior across INSERT/UPDATE/DELETE cycles. 3–5d TESTING_GAPS_2.md
TG2-JOIN Join multi-cycle UPDATE/DELETE correctness. E2E join tests are INSERT-only; no UPDATE/DELETE differential cycles. Add systematic multi-cycle coverage for INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation. Risk: silent data corruption in production workloads. 3–5d TESTING_GAPS_2.md
TG2-EQUIV Differential ≡ Full equivalence validation. Only CTEs validated; joins and aggregates lack equivalence proof. Add a test harness that runs every defining query in both DIFFERENTIAL and FULL mode and asserts identical results. Critical for trusting the new optimization paths. 3–5d TESTING_GAPS_2.md

Medium-Priority Gaps

Item Description Effort Ref
TG2-MERGE refresh.rs MERGE template unit tests. Only helpers/enums tested; the core MERGE SQL template generation is untested at the unit level. 2–3d TESTING_GAPS_2.md
TG2-CANCEL Timeout/cancellation during refresh. Zero tests for statement_timeout, pg_cancel_backend() during active refresh. Risk: silent failures or resource leaks under production load. 1–2d TESTING_GAPS_2.md
TG2-SCHEMA Source table schema evolution. Partial DDL tests exist; type changes and column renames are thin. Risk: silent data corruption on schema change. 2–3d TESTING_GAPS_2.md

TG2 subtotal: ~2–4 weeks (high-priority) + ~1–2 weeks (medium-priority)

Performance Regression CI (BENCH-CI)

In plain terms: v0.16.0 changes core refresh paths (MERGE alternatives, aggregate fast-path, append-only bypass, predicate pushdown, buffer compaction). Without automated benchmarks in CI, performance regressions will slip through silently. This adds a benchmark suite that runs on every PR and compares against a committed baseline — any statistically significant regression blocks the merge.

Item Description Effort Ref
BENCH-CI-1 Benchmark harness in CI. Run just bench (Criterion-based) on a fixed hardware profile (GitHub Actions large runner or self-hosted). Capture results as JSON artifacts. Compare against committed baseline using Criterion’s --save-baseline / --baseline. 2–3d plans/performance/PLAN_PERFORMANCE_PART_9.md §I
BENCH-CI-2 Regression gate. Parse Criterion JSON output; fail CI if any benchmark regresses by more than 10% (configurable threshold). Report regressions as PR comment with before/after numbers. 1–2d plans/performance/PLAN_PERFORMANCE_PART_9.md §I
BENCH-CI-3 Scenario coverage. Ensure benchmark suite covers: scan, filter, aggregate (algebraic + non-algebraic), join (2-table, 3-table), window function, CTE, TopK, append-only, and mixed workloads. At minimum 1K/10K/100K row scales. 2–3d plans/performance/PLAN_PERFORMANCE_PART_9.md §I

BENCH-CI subtotal: ~1–2 weeks

Auto-Indexing on Stream Table Creation (AUTO-IDX)

In plain terms: pg_ivm automatically creates indexes on GROUP BY columns and primary key columns when creating an incrementally maintained view. pg_trickle currently requires manual index creation, which is a friction point for new users. Auto-indexing creates appropriate indexes at stream table creation time — GROUP BY keys, DISTINCT columns, and the __pgt_row_id covering index for MERGE performance.

Item Description Effort Ref
AUTO-IDX-1 Auto-create indexes on GROUP BY / DISTINCT columns. ✅ GROUP BY composite index (existing) and DISTINCT composite index (new) auto-created at create_stream_table() time. Gated behind pg_trickle.auto_index GUC. src/api.rs
AUTO-IDX-2 Covering index on __pgt_row_id. ✅ Already implemented (A-4). Now gated behind pg_trickle.auto_index GUC (default true). src/api.rs

AUTO-IDX: ✅ Done
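The indexes auto-created by AUTO-IDX follow the pattern below. The stream-table and index shapes are illustrative of the pattern, not the exact DDL the extension emits.

```sql
-- Hypothetical stream table:
--   CREATE STREAM TABLE sales_by_region AS
--     SELECT region, product, sum(amount) AS total
--     FROM sales GROUP BY region, product;

-- Composite index on the GROUP BY key, used by the MERGE/UPDATE join:
CREATE INDEX ON sales_by_region (region, product);

-- Covering index on the synthetic row ID that MERGE matches on:
CREATE INDEX ON sales_by_region (__pgt_row_id);

-- Opt out if you prefer to manage indexes yourself:
SET pg_trickle.auto_index = off;
```

This mirrors pg_ivm's behavior and removes the manual-index friction point for new users.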

Quick Wins

Item Description Effort Ref
C2-BUG Implement missing resume_stream_table(). ✅ Already existed since v0.2.0 — verified operational.
ERR-REF Error reference documentation. ✅ Published as docs/ERRORS.md with all 20 variants documented. Cross-linked from FAQ. docs/ERRORS.md
GUC-DEFAULTS Review dangerous GUC defaults. ✅ Defaults kept at true (correct for most workloads). Added detailed tuning guidance for memory-constrained and PgBouncer environments in CONFIGURATION.md. docs/CONFIGURATION.md
BUF-LIMIT Change buffer hard growth limit. pg_trickle.max_buffer_rows GUC added (default: 1M). Forces FULL refresh + truncation when exceeded. src/config.rs · src/refresh.rs

Quick wins: ✅ Done

v0.16.0 total: ~1–2 weeks (MERGE alts) + ~4–6 weeks (aggregate fast-path) + ~1–2 weeks (append-only) + ~2–3 weeks (predicate pushdown) + ~2–3 weeks (template cache) + ~2–3 weeks (buffer compaction) + ~3–6 weeks (test coverage) + ~1–2 weeks (bench CI) + ~2–3 days (auto-indexing) + ~2–4 hours (quick wins). Note: PG 19 compatibility (A3, ~18–36h) moved to v1.0.0.

Exit criteria:

- [x] PH-D1: DELETE+INSERT strategy implemented and gated behind merge_strategy GUC; correctness verified for INSERT/UPDATE/DELETE deltas
- [x] B-1: Algebraic aggregate fast-path replaces MERGE for SUM/COUNT/AVG GROUP BY queries; aggregate_fast_path GUC respected; explicit DML path (DELETE+UPDATE+INSERT) used instead of MERGE for all-algebraic aggregates; explain_st() exposes aggregate_path; existing tests pass — ✅ Done in v0.16.0 Phase 8
- [x] A-3-AO: CREATE STREAM TABLE … APPEND ONLY accepted; refresh uses INSERT path; heuristic auto-promotion on insert-only buffers; falls back to MERGE on first non-insert CDC event
- [x] B-2: Delta predicate pushdown implemented for single-source Filter nodes (P2-7); DELETE correctness verified (OR old_col predicate); selective-query benchmarks show delta row reduction
- [x] G14-SHC: Cross-backend template cache eliminates cold-start; catalog-backed L2 cache with template_cache GUC; invalidation on DDL; explain_st() exposes stats
- A3: PG 19 builds and passes full E2E suite — moved to v1.0.0
- [x] C-4: Change buffer compaction reduces buffer size by ≥50% for high-churn workloads; compact_threshold GUC respected; no correctness regressions
- [x] TG2-WIN: Window function DVM execution tests cover ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD across INSERT/UPDATE/DELETE
- [x] TG2-JOIN: Join multi-cycle tests cover INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation; no silent data loss
- [x] TG2-EQUIV: Differential ≡ Full equivalence validated for joins, aggregates, and window functions
- [x] TG2-MERGE: refresh.rs MERGE template generation has unit test coverage (completed in v0.17.0)
- [x] TG2-CANCEL: Timeout and cancellation during refresh tested; no resource leaks (completed in v0.17.0)
- [x] TG2-SCHEMA: Source table type changes and column renames tested end-to-end
- [x] BENCH-CI: Performance regression CI runs on every PR; 10% regression threshold blocks merge; scenario coverage includes scan/filter/aggregate/join/window/CTE/TopK/SemiJoin/AntiJoin
- [x] AUTO-IDX: Stream tables auto-create indexes on GROUP BY / DISTINCT columns; __pgt_row_id covering index for ≤ 8-column tables; auto_index GUC respected
- [x] C2-BUG: resume_stream_table() verified operational (present since v0.2.0)
- [x] ERR-REF: Error reference doc published with all 20 PgTrickleError variants, common causes, and suggested fixes
- [x] GUC-DEFAULTS: planner_aggressive and cleanup_use_truncate defaults reviewed; trade-offs documented in CONFIGURATION.md
- [x] BUF-LIMIT: max_buffer_rows GUC prevents unbounded change buffer growth; triggers FULL + truncation when exceeded
- [x] Extension upgrade path tested (0.15.0 → 0.16.0)
- [x] just check-version-sync passes