Plain-language companion: v0.16.0.md

v0.16.0 — Performance & Refresh Optimization

Status: Released (2026-04-06).

Faster refreshes across the board: sub-1% deltas use DELETE+INSERT instead of MERGE, insert-only stream tables auto-detect and skip the MERGE join, algebraic aggregates apply pinpoint updates, and a cross-backend template cache eliminates cold-start latency. Automated benchmark regression gating prevents future performance degradation.

Goal: Attack the MERGE bottleneck from multiple angles (alternative merge strategies, algebraic aggregate shortcuts, append-only bypass, delta filtering, change buffer compaction, shared-memory template caching) and close critical test coverage gaps to validate these new paths.

MERGE Alternatives & Planner Control (Phase D)

In plain terms: MERGE accounts for 70–97% of refresh time. This phase explores whether replacing MERGE with DELETE+INSERT (or INSERT ON CONFLICT + DELETE) is faster for specific patterns, particularly small deltas against large stream tables, where the MERGE join is the bottleneck.

Item Description Effort Ref
PH-D1 DELETE+INSERT strategy. For stream tables where delta is <1% of target, replace MERGE with DELETE WHERE __pgt_row_id IN (delta_deletes) + INSERT ... SELECT FROM delta_inserts. Benchmark against MERGE for 1K/10K/100K deltas against 1M/10M targets. Gate behind pg_trickle.merge_strategy = 'auto'|'merge'|'delete_insert' GUC. 1–2 wk PLAN_PERFORMANCE_PART_9.md §Phase D

MERGE alternatives subtotal: ~1–2 weeks
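The PH-D1 strategy above can be sketched as two targeted statements plus the opt-in GUC. This is a minimal illustration, not the generated SQL: the names `target`, `delta_deletes`, and `delta_inserts` are placeholders.

```sql
-- Hypothetical refresh cycle for a sub-1% delta (all names illustrative).
-- Instead of one MERGE joining the full target, run two small statements:

-- 1. Remove rows whose __pgt_row_id appears in the delta's delete set.
DELETE FROM target
WHERE __pgt_row_id IN (SELECT __pgt_row_id FROM delta_deletes);

-- 2. Append the delta's insert set; no join against the target is needed.
INSERT INTO target
SELECT * FROM delta_inserts;

-- Opt in via the planned GUC (values: 'auto' | 'merge' | 'delete_insert'):
SET pg_trickle.merge_strategy = 'delete_insert';
```

The win comes from the DELETE probing an index on `__pgt_row_id` with a small IN-list instead of MERGE scanning or hash-joining the full target.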

Algebraic Aggregate UPDATE Fast-Path (B-1)

In plain terms: The current aggregate delta rule recomputes entire groups where the GROUP BY key appears in the delta. For a group with 100K rows where 1 row changed, the aggregate re-scans all 100K rows in that group. For decomposable aggregates (SUM/COUNT/AVG), a direct UPDATE target SET col = col + Δ replaces the full MERGE join — dropping aggregate refresh from O(group_size) to O(1) per group.

Item Description Effort Ref
B-1 Algebraic aggregate UPDATE fast-path. For GROUP BY queries where all aggregates are algebraically invertible (SUM/COUNT/AVG), replace the MERGE with a direct UPDATE target SET col = col + Δ WHERE group_key = ? for existing groups, plus INSERT for newly-appearing groups and DELETE for groups whose count reaches zero. Eliminates the MERGE join overhead — the dominant cost for aggregate refresh when group cardinality is high. Requires adding __pgt_aux_count / __pgt_aux_sum auxiliary columns to the stream table. Fallback to existing MERGE path for non-algebraic aggregates (MIN, MAX, STRING_AGG, etc.). Gate behind pg_trickle.aggregate_fast_path GUC (default true). Expected impact: 5–20× apply-time reduction for high-cardinality GROUP BY (10K+ distinct groups); aggregate scenarios at 100K/1% projected to drop from ~50ms to sub-1ms apply time. 4–6 wk plans/performance/PLAN_NEW_STUFF.md §B-1 · plans/sql/PLAN_TRANSACTIONAL_IVM.md §Phase 4

B-1 subtotal: ~4–6 weeks
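The B-1 fast-path can be sketched as the three DML statements the plan describes (UPDATE existing groups, INSERT new ones, DELETE empty ones). Column names `sum_amount`, `d_sum`, and `d_count` and the tables `st`/`delta` are illustrative assumptions; `__pgt_aux_count` is the auxiliary column named above.

```sql
-- Hypothetical fast-path for SELECT group_key, SUM(amount) ... GROUP BY group_key,
-- where 'delta' holds per-group net changes (d_sum, d_count).

-- Existing groups: apply the increment directly — O(1) per group, no MERGE join.
UPDATE st
SET    sum_amount      = sum_amount      + delta.d_sum,
       __pgt_aux_count = __pgt_aux_count + delta.d_count
FROM   delta
WHERE  st.group_key = delta.group_key;

-- Newly-appearing groups: plain INSERT.
INSERT INTO st (group_key, sum_amount, __pgt_aux_count)
SELECT group_key, d_sum, d_count
FROM   delta
WHERE  group_key NOT IN (SELECT group_key FROM st);

-- Groups whose underlying row count reaches zero: DELETE.
DELETE FROM st WHERE __pgt_aux_count = 0;
```

AVG is handled by storing SUM and COUNT separately (the auxiliary columns) and dividing at read time; MIN/MAX are not invertible this way and fall back to the MERGE path.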

Append-Only Stream Tables — MERGE Bypass (A-3-AO)

In plain terms: When a stream table’s sources are insert-only (e.g. event logs, append-only tables where CDC never sees DELETE/UPDATE), the MERGE is pure overhead — every delta row is an INSERT, never a match. Bypassing MERGE entirely with a plain INSERT INTO st SELECT ... FROM delta removes the join against the target table, takes only RowExclusiveLock, and is the single highest-payoff optimization for event-sourced architectures.

Item Description Effort Ref
A-3-AO Append-only stream table fast path. Expose an explicit CREATE STREAM TABLE … APPEND ONLY declaration. When set, refresh uses INSERT INTO st SELECT ... FROM delta instead of MERGE — no target-table join, RowExclusiveLock only. CDC-observed heuristic fallback: if no DELETE/UPDATE has been seen, use the fast path; fall back to MERGE on first non-insert. Benchmark against MERGE for 1K/10K/100K append deltas. 1–2 wk plans/performance/PLAN_NEW_STUFF.md §A-3

A-3-AO subtotal: ~1–2 weeks
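The A-3-AO declaration and resulting refresh shape might look like the following sketch. The exact DDL surface and the internal table/CTE names (`shipped_events`, `delta`) are assumptions for illustration.

```sql
-- Hypothetical append-only declaration for an event-log projection:
CREATE STREAM TABLE shipped_events APPEND ONLY AS
  SELECT id, payload
  FROM   events
  WHERE  kind = 'shipped';

-- With APPEND ONLY set, each refresh reduces to a plain INSERT —
-- no target-table join, RowExclusiveLock only:
INSERT INTO shipped_events
SELECT id, payload FROM delta;
```

Per the heuristic fallback described above, the same fast path can be auto-selected for undeclared tables whose CDC stream has only ever contained INSERTs, reverting to MERGE on the first observed UPDATE or DELETE.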

Delta Predicate Pushdown (B-2)

In plain terms: For a query like SELECT ... FROM orders WHERE status = 'shipped', if a CDC change row has status = 'pending', the delta processes it through scan → filter → discard. All the scan and join work is wasted. Pushing the WHERE predicate down into the change buffer scan eliminates irrelevant rows before any join processing begins — a 5–10× reduction in delta row volume for selective queries.

Item Description Effort Ref
B-2 Delta predicate pushdown. During OpTree construction, identify Filter nodes whose predicates reference only columns from a single source table. Inject these predicates into the delta_scan CTE as additional WHERE clauses (including OR old_col = 'value' for DELETE correctness). Expected impact: 5–10× delta row reduction for queries with < 10% selectivity. 2–3 wk plans/performance/PLAN_NEW_STUFF.md §B-2

B-2 subtotal: ~2–3 weeks
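The pushed-down predicate described above can be illustrated on the `status = 'shipped'` example. The change-buffer table and its `old_*`/`new_*` column naming are illustrative, not the actual generated SQL.

```sql
-- Hypothetical delta_scan CTE for
--   SELECT ... FROM orders WHERE status = 'shipped'
WITH delta_scan AS (
  SELECT *
  FROM   change_buffer
  -- Pushed-down filter: keep a change row if its new image matches,
  -- OR its old image did — the OR branch is required so that DELETEs
  -- (and UPDATEs moving a row out of the filter) still propagate.
  WHERE  new_status = 'shipped'
     OR  old_status = 'shipped'
)
SELECT __pgt_row_id, new_status
FROM   delta_scan;
```

Rows matching neither image (e.g. `pending` → `pending`) are discarded before any join processing, which is where the projected 5–10× delta-volume reduction comes from.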

Shared-Memory Template Caching (G14-SHC)

In plain terms: Every new database connection that triggers a refresh pays a 15–50ms cold-start cost to regenerate the MERGE SQL template. With PgBouncer in transaction mode, this happens on every single refresh cycle. Shared-memory caching stores compiled templates in PostgreSQL DSM so they survive across connections — eliminating the cold-start entirely for steady-state workloads.

Item Description Effort Ref
G14-SHC Shared-memory template caching (implementation). Full implementation of DSM + lwlock-based MERGE SQL template caching, building on the G14-SHC-SPIKE RFC from v0.15.0. ~2–3 wk plans/performance/REPORT_OVERALL_STATUS.md §14

G14-SHC subtotal: ~2–3 weeks
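From the user's side, the feature surfaces only as the GUC and the stats exposed via explain_st() (both named in this release's exit criteria). A hypothetical session — schema qualification and output shape are assumptions:

```sql
-- Ensure the cross-backend template cache is enabled (GUC from the exit criteria):
SET pg_trickle.template_cache = on;

-- Inspect cache behavior for a stream table; explain_st() is stated above
-- to expose cache stats, but the exact call shape shown here is assumed.
SELECT pg_trickle.explain_st('my_stream_table');
```

Because templates live in DSM rather than backend-local memory, a fresh PgBouncer-pooled connection hits the cache instead of paying the 15–50ms regeneration cost.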

PostgreSQL 19 Forward-Compatibility (A3) — Moved to v1.0.0

The PG 19 beta was not available in time. Items A3-1 through A3-4 are deferred to the v1.0.0 milestone.

Change Buffer Compaction (C-4)

In plain terms: A high-churn source table can accumulate thousands of changes to the same row between refresh cycles — an INSERT followed by 10 UPDATEs followed by a DELETE is really just “nothing happened.” Compaction merges multiple changes to the same row ID into a single net change before the delta query runs, reducing change buffer size by 50–90% for high-churn tables. This directly reduces work for every downstream path (MERGE, DELETE+INSERT, append-only INSERT, predicate pushdown).

Item Description Effort Ref
C-4 Change buffer compaction. Before delta-query execution, merge multiple changes to the same __pgt_row_id into a single net change: INSERT+DELETE cancel out; consecutive UPDATEs collapse to one. Trigger on buffer exceeding pg_trickle.compact_threshold rows (default: 100K). Expected impact: 50–90% reduction in change buffer size for high-churn tables. 2–3 wk plans/performance/PLAN_NEW_STUFF.md §C-4

C-4 subtotal: ~2–3 weeks
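The net-change rules above (INSERT+DELETE cancel; UPDATE chains collapse) can be sketched as a two-step compaction pass. Table name `change_buffer` and columns `op`/`change_seq` are illustrative; real compaction would also rewrite the surviving row's op (e.g. INSERT + UPDATEs nets to an INSERT with the last image), which this sketch omits.

```sql
-- Step 1: rows whose first change is an INSERT and last change is a DELETE
-- net out to "nothing happened"; drop every buffered entry for them.
DELETE FROM change_buffer
WHERE __pgt_row_id IN (
  SELECT __pgt_row_id
  FROM   change_buffer
  GROUP  BY __pgt_row_id
  HAVING min(change_seq) FILTER (WHERE op = 'INSERT') = min(change_seq)
     AND max(change_seq) FILTER (WHERE op = 'DELETE') = max(change_seq)
);

-- Step 2: for the remaining rows, keep only the newest entry per row ID
-- (a run of consecutive UPDATEs collapses to the final image).
DELETE FROM change_buffer older
USING (SELECT __pgt_row_id, max(change_seq) AS last_seq
       FROM change_buffer GROUP BY __pgt_row_id) latest
WHERE older.__pgt_row_id = latest.__pgt_row_id
  AND older.change_seq   < latest.last_seq;
```

Running this only when the buffer exceeds pg_trickle.compact_threshold (default 100K) keeps the compaction cost amortized against the 50–90% reduction it buys downstream.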

Test Coverage Hardening (TG2)

In plain terms: The performance optimizations in this release change core refresh paths (MERGE alternatives, aggregate fast-path, append-only bypass, predicate pushdown). Before and alongside these changes, critical test coverage gaps need closing — particularly around operators and scenarios where bugs could hide silently. These gaps were identified in the TESTING_GAPS_2 audit.

High-Priority Gaps

Item Description Effort Ref
TG2-WIN Window function DVM execution tests. ~5 unit tests exist but 0 DVM execution tests. Add execution-level tests for ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD delta behavior across INSERT/UPDATE/DELETE cycles. 3–5d TESTING_GAPS_2.md
TG2-JOIN Join multi-cycle UPDATE/DELETE correctness. E2E join tests are INSERT-only; no UPDATE/DELETE differential cycles. Add systematic multi-cycle coverage for INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation. Risk: silent data corruption in production workloads. 3–5d TESTING_GAPS_2.md
TG2-EQUIV Differential ≡ Full equivalence validation. Only CTEs validated; joins and aggregates lack equivalence proof. Add a test harness that runs every defining query in both DIFFERENTIAL and FULL mode and asserts identical results. Critical for trusting the new optimization paths. 3–5d TESTING_GAPS_2.md

Medium-Priority Gaps

Item Description Effort Ref
TG2-MERGE refresh.rs MERGE template unit tests. Only helpers/enums tested; the core MERGE SQL template generation is untested at the unit level. 2–3d TESTING_GAPS_2.md
TG2-CANCEL Timeout/cancellation during refresh. Zero tests for statement_timeout, pg_cancel_backend() during active refresh. Risk: silent failures or resource leaks under production load. 1–2d TESTING_GAPS_2.md
TG2-SCHEMA Source table schema evolution. Partial DDL tests exist; type changes and column renames are thin. Risk: silent data corruption on schema change. 2–3d TESTING_GAPS_2.md

TG2 subtotal: ~2–4 weeks (high-priority) + ~1–2 weeks (medium-priority)

Performance Regression CI (BENCH-CI)

In plain terms: v0.16.0 changes core refresh paths (MERGE alternatives, aggregate fast-path, append-only bypass, predicate pushdown, buffer compaction). Without automated benchmarks in CI, performance regressions will slip through silently. This adds a benchmark suite that runs on every PR and compares against a committed baseline — any statistically significant regression blocks the merge.

Item Description Effort Ref
BENCH-CI-1 Benchmark harness in CI. Run just bench (Criterion-based) on a fixed hardware profile (GitHub Actions large runner or self-hosted). Capture results as JSON artifacts. Compare against committed baseline using Criterion’s --save-baseline / --baseline. 2–3d plans/performance/PLAN_PERFORMANCE_PART_9.md §I
BENCH-CI-2 Regression gate. Parse Criterion JSON output; fail CI if any benchmark regresses by more than 10% (configurable threshold). Report regressions as PR comment with before/after numbers. 1–2d plans/performance/PLAN_PERFORMANCE_PART_9.md §I
BENCH-CI-3 Scenario coverage. Ensure benchmark suite covers: scan, filter, aggregate (algebraic + non-algebraic), join (2-table, 3-table), window function, CTE, TopK, append-only, and mixed workloads. At minimum 1K/10K/100K row scales. 2–3d plans/performance/PLAN_PERFORMANCE_PART_9.md §I

BENCH-CI subtotal: ~1–2 weeks

Auto-Indexing on Stream Table Creation (AUTO-IDX)

In plain terms: pg_ivm automatically creates indexes on GROUP BY columns and primary key columns when creating an incrementally maintained view. pg_trickle currently requires manual index creation, which is a friction point for new users. Auto-indexing creates appropriate indexes at stream table creation time — GROUP BY keys, DISTINCT columns, and the __pgt_row_id covering index for MERGE performance.

Item Description Effort Ref
AUTO-IDX-1 Auto-create indexes on GROUP BY / DISTINCT columns. ✅ GROUP BY composite index (existing) and DISTINCT composite index (new) auto-created at create_stream_table() time. Gated behind pg_trickle.auto_index GUC. src/api.rs
AUTO-IDX-2 Covering index on __pgt_row_id. ✅ Already implemented (A-4). Now gated behind pg_trickle.auto_index GUC (default true). src/api.rs

AUTO-IDX: ✅ Done
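The indexes auto-created by AUTO-IDX follow the pattern below. The stream-table and index shapes are illustrative of the pattern, not the exact DDL the extension emits.

```sql
-- Hypothetical stream table:
--   CREATE STREAM TABLE sales_by_region AS
--     SELECT region, product, sum(amount) AS total
--     FROM sales GROUP BY region, product;

-- Composite index on the GROUP BY key, used by the MERGE/UPDATE join:
CREATE INDEX ON sales_by_region (region, product);

-- Covering index on the synthetic row ID that MERGE matches on:
CREATE INDEX ON sales_by_region (__pgt_row_id);

-- Opt out if you prefer to manage indexes yourself:
SET pg_trickle.auto_index = off;
```

This mirrors pg_ivm's behavior and removes the manual-index friction point for new users.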

Quick Wins

Item Description Effort Ref
C2-BUG Implement missing resume_stream_table(). ✅ Already existed since v0.2.0 — verified operational.
ERR-REF Error reference documentation. ✅ Published as docs/ERRORS.md with all 20 variants documented. Cross-linked from FAQ. docs/ERRORS.md
GUC-DEFAULTS Review dangerous GUC defaults. ✅ Defaults kept at true (correct for most workloads). Added detailed tuning guidance for memory-constrained and PgBouncer environments in CONFIGURATION.md. docs/CONFIGURATION.md
BUF-LIMIT Change buffer hard growth limit. pg_trickle.max_buffer_rows GUC added (default: 1M). Forces FULL refresh + truncation when exceeded. src/config.rs · src/refresh.rs

Quick wins: ✅ Done

v0.16.0 total: ~1–2 weeks (MERGE alts) + ~4–6 weeks (aggregate fast-path) + ~1–2 weeks (append-only) + ~2–3 weeks (predicate pushdown) + ~2–3 weeks (template cache) + ~2–3 weeks (buffer compaction) + ~3–6 weeks (test coverage) + ~1–2 weeks (bench CI) + ~2–3 days (auto-indexing) + ~2–4 hours (quick wins). Note: PG 19 compatibility (A3, ~18–36h) moved to v1.0.0.

Exit criteria:

- [x] PH-D1: DELETE+INSERT strategy implemented and gated behind merge_strategy GUC; correctness verified for INSERT/UPDATE/DELETE deltas
- [x] B-1: Algebraic aggregate fast-path replaces MERGE for SUM/COUNT/AVG GROUP BY queries; aggregate_fast_path GUC respected; explicit DML path (DELETE+UPDATE+INSERT) used instead of MERGE for all-algebraic aggregates; explain_st() exposes aggregate_path; existing tests pass — ✅ Done in v0.16.0 Phase 8
- [x] A-3-AO: CREATE STREAM TABLE … APPEND ONLY accepted; refresh uses INSERT path; heuristic auto-promotion on insert-only buffers; falls back to MERGE on first non-insert CDC event
- [x] B-2: Delta predicate pushdown implemented for single-source Filter nodes (P2-7); DELETE correctness verified (OR old_col predicate); selective-query benchmarks show delta row reduction
- [x] G14-SHC: Cross-backend template cache eliminates cold-start; catalog-backed L2 cache with template_cache GUC; invalidation on DDL; explain_st() exposes stats
- A3: PG 19 builds and passes full E2E suite — moved to v1.0.0
- [x] C-4: Change buffer compaction reduces buffer size by ≥50% for high-churn workloads; compact_threshold GUC respected; no correctness regressions
- [x] TG2-WIN: Window function DVM execution tests cover ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD across INSERT/UPDATE/DELETE
- [x] TG2-JOIN: Join multi-cycle tests cover INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation; no silent data loss
- [x] TG2-EQUIV: Differential ≡ Full equivalence validated for joins, aggregates, and window functions
- [x] TG2-MERGE: refresh.rs MERGE template generation has unit test coverage (completed in v0.17.0)
- [x] TG2-CANCEL: Timeout and cancellation during refresh tested; no resource leaks (completed in v0.17.0)
- [x] TG2-SCHEMA: Source table type changes and column renames tested end-to-end
- [x] BENCH-CI: Performance regression CI runs on every PR; 10% regression threshold blocks merge; scenario coverage includes scan/filter/aggregate/join/window/CTE/TopK/SemiJoin/AntiJoin
- [x] AUTO-IDX: Stream tables auto-create indexes on GROUP BY / DISTINCT columns; __pgt_row_id covering index for ≤ 8-column tables; auto_index GUC respected
- [x] C2-BUG: resume_stream_table() verified operational (present since v0.2.0)
- [x] ERR-REF: Error reference doc published with all 20 PgTrickleError variants, common causes, and suggested fixes
- [x] GUC-DEFAULTS: planner_aggressive and cleanup_use_truncate defaults reviewed; trade-offs documented in CONFIGURATION.md
- [x] BUF-LIMIT: max_buffer_rows GUC prevents unbounded change buffer growth; triggers FULL + truncation when exceeded
- [x] Extension upgrade path tested (0.15.0 → 0.16.0)
- [x] just check-version-sync passes