v0.23.0 — TPC-H DVM Scaling Performance
Full technical details: v0.23.0.md-full.md
Status: ✅ Released | Scope: Large (~4–5 weeks)
Root-cause investigation and targeted fixes for three differential refresh failure modes discovered at scale — bringing complex analytical queries from thousands of times slower than full refresh down to the expected incremental performance.
What problem does this solve?
Benchmarking the differential engine at a realistic data scale (TPC-H SF=1.0 — roughly 1 GB of data) revealed that 18 of 22 TPC-H queries had DIFFERENTIAL refresh slower than FULL recomputation — in the worst case, 2,246× slower. This is the opposite of the expected behaviour. The root causes were three distinct failure modes, each requiring a different fix.
The Three Failure Modes
1. Threshold Collapse in Multi-Join Queries
Queries joining four or more tables (TPC-H Q05, Q07, Q08, Q09) were producing intermediate query results with O(n) rows — proportional to the total table size — rather than O(Δ) rows proportional to the change batch. The delta SQL was not effectively restricting which rows from unchanged tables needed to be re-scanned.
The fix restructures the aggregate UPDATE handling to split UPDATE rows into DELETE+INSERT pairs at the source, so the join delta never needs to re-scan unchanged base tables. Multi-join differential refresh is now O(Δ).
In plain terms: changing 100 rows in a table with 1 million rows now refreshes by processing 100 rows, not 1 million.
2. Early Collapse in EXISTS Anti-Join Queries
The EXISTS anti-join query (TPC-H Q04 — “orders with at least one late lineitems”) was 140× slower at 10× the data size because the key filter used to restrict the “unchanged” portion of the right-side table was not applying correctly when additional predicates were present.
The fix generates a key filter that restricts the right-side scan to only
the keys that appear in the delta: WHERE l_orderkey IN (SELECT o_orderkey
FROM delta_orders). This turns an O(n) scan into O(Δ).
3. Structural Bug in Doubly-Nested EXISTS
TPC-H Q20 (which uses a doubly-nested correlated EXISTS subquery) was re-materialising an expensive inner subquery for every row in the outer delta. The fix hoists the inner subquery to a named CTE that is computed once and shared, reducing Q20’s refresh time from ~2 seconds to under 50 milliseconds.
Debug and Tuning Tools
New operational tools introduced alongside the performance fixes:
pgtrickle.log_delta_sql = onGUC — logs the generated delta SQL at DEBUG level, enablingEXPLAIN ANALYZEon the generated queries for diagnosispgtrickle.delta_work_memGUC — sets a higherwork_memfor delta SQL execution to avoid hash/sort spills in complex joinspgtrickle.analyze_before_delta = on(default on) — runsANALYZEon the change buffer tables before executing delta SQL, ensuring the query planner has accurate row count estimates
DIFF Output Format Compatibility
The UPDATE-split fix changes the format of differential output for aggregate
stream tables: previously UPDATE events appeared as single rows, now they
appear as DELETE+INSERT pairs. A pgtrickle.diff_output_format GUC
(split or merged) allows users to opt into the new format on their own
schedule — set merged first, migrate application code, then switch to
split.
Result: All 22 TPC-H Queries Under Target
After the three fixes, all 22 TPC-H queries pass differential correctness validation at SF=1.0, and the previously-failing queries (Q04, Q05, Q07, Q08, Q09, Q20, Q22) are all within their performance targets. AUTO mode now correctly routes all 22 queries to DIFFERENTIAL rather than FULL at SF=1.0.
Scope
v0.23.0 is a targeted performance investigation and fix release. The three failure modes were addressed in strict priority order: confirm the hypothesis before coding, apply the minimal fix, validate with regression tests. The result is a differential engine that delivers its theoretical O(Δ) performance on the full TPC-H benchmark suite at production scale.