Contents
- v0.13.0 — Scalability Foundations, Partitioning Enhancements, MERGE Profiling & Multi-Tenant Scheduling
- Scalability Foundations (Phase 5)
- Partitioning Enhancements (Phase 6)
- MERGE Profiling (Phase 7)
- dbt Macro Updates (Phase 8)
- Multi-Tenant Scheduler Isolation (Phase 9)
- TPC-H Benchmark Harness (Phase 10)
- SQL Coverage Cleanup (Phase 11)
- DVM Engine Improvements (Phase 10)
- Regression-Free Testing Initiative (Q2 2026)
- v0.13.0 — Scalability Foundations, Partitioning Enhancements, MERGE Profiling & Multi-Tenant Scheduling
Plain-language companion: v0.13.0.md
Status: Released (2026-03-31).
Goal: Deliver the scalability foundations deferred from v0.12.0 — columnar change tracking and shared change buffers — alongside the partitioning enhancements that build on v0.11.0’s RANGE partitioning spike, a MERGE deduplication profiling pass, the dbt macro updates, per-database worker quotas for multi-tenant deployments, the TPC-H-derived benchmarking harness for data-driven performance validation, and a small SQL coverage cleanup for PG 16+ expression types.
Phases: 5 (Scalability), 6 (Partitioning), 7 (MERGE Profiling), 8 (dbt Macro Updates), 9 (Multi-Tenant Scheduler Isolation), 10 (TPC-H Benchmark Harness), 11 (SQL Coverage Cleanup).
Scalability Foundations (Phase 5)
In plain terms: These items directly serve the project’s primary goal of world-class performance and scalability. Columnar change tracking eliminates wasted delta processing for wide tables, and shared change buffers reduce I/O multiplication in deployments with many stream tables reading from the same source.
| Item | Description | Effort | Ref | Status |
|---|---|---|---|---|
| A-2 | Columnar Change Tracking. Per-column bitmask in CDC triggers; skip rows where no referenced column changed; lightweight UPDATE-only path when only projected columns changed; 50–90% delta-volume reduction for wide-table UPDATE workloads. | 3–4 wk | PLAN_NEW_STUFF.md §A-2 | ✅ Done |
| D-4 | Shared Change Buffers. Single buffer per source shared across all dependent STs; multi-frontier cleanup coordination; static-superset column mode for initial implementation. | 3–4 wk | PLAN_NEW_STUFF.md §D-4 | ✅ Done |
| PERF-2 | buffer_partitioning for high-throughput sources. A buffer that exceeds compact_threshold in a single refresh cycle is converted to RANGE(lsn) partitioned mode at runtime. | — | REPORT_OVERALL_STATUS.md §R7 | ✅ Done |

⚠️ D-4 multi-frontier cleanup correctness verified: MIN(consumer_frontier) is used in all cleanup paths, and property-based tests with 5–10 consumers and 500 random frontier advancement cases pass.

Scalability foundations subtotal: ~6–8 weeks
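As a rough illustration of A-2's decision logic, here is a minimal sketch, assuming a u64 bitmask where bit i is set when column i changed. All names are hypothetical, not the actual pg_trickle trigger code: a row is dropped from the delta when no referenced column changed, and takes the lightweight UPDATE-only path when only projected (non-key) columns changed.

```rust
#[derive(Debug, PartialEq)]
enum DeltaAction {
    Skip,       // no referenced column changed: drop the row from the delta
    UpdateOnly, // only projected value columns changed: lightweight path
    FullDelta,  // a key column changed: full delete+insert delta pair
}

/// Classify an UPDATE given the bitmask of changed columns, the columns the
/// stream table's defining query references, and the key columns.
fn classify_update(changed: u64, referenced: u64, key_cols: u64) -> DeltaAction {
    if changed & referenced == 0 {
        DeltaAction::Skip
    } else if changed & key_cols == 0 {
        DeltaAction::UpdateOnly
    } else {
        DeltaAction::FullDelta
    }
}
```

With three referenced columns (bits 0..=2) where bit 0 is the key, an UPDATE touching only an unreferenced column is skipped entirely, which is where the 50–90% delta-volume reduction for wide tables comes from.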
Partitioning Enhancements (Phase 6)
In plain terms: The v0.11.0 spike delivered RANGE partitioning end-to-end. These follow-on items extend coverage to the use cases deliberately deferred from A1: multi-column keys, retrofitting existing stream tables, LIST-based partitions, HASH partitions, and operational quality-of-life improvements.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A1-1b | Multi-column RANGE partition keys via partition_by; ROW() predicate for composite keys. | — | src/api.rs, src/refresh.rs |
| A1-1c | alter_stream_table(partition_by => …) support. Add/change/remove partition key with full storage rebuild. | — | src/api.rs, src/catalog.rs |
| A1-1d | PARTITION BY LIST for low-cardinality columns; IN (…) predicate style from the delta. | — | src/api.rs, src/refresh.rs |
| A1-3b | HASH:col[:N] with auto-created child partitions; per-partition MERGE through satisfies_hash_partition(). | — | src/api.rs, src/refresh.rs |
| PART-WARN | warn_default_partition_growth() after FULL and DIFFERENTIAL refresh. | — | src/refresh.rs |
Partitioning enhancements subtotal: ~5–8 weeks
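The composite ROW() predicate A1-1b describes can be sketched as a small SQL-string builder. This is illustrative only; the real generator lives in src/refresh.rs and its predicate shape may differ.

```rust
/// Build a half-open composite-key range predicate using row-value syntax,
/// which PostgreSQL can use to prune multi-column RANGE partitions.
/// `cols` are the key columns; `lo`/`hi` are pre-rendered SQL literals.
fn row_range_predicate(cols: &[&str], lo: &[&str], hi: &[&str]) -> String {
    let row = |vals: &[&str]| format!("ROW({})", vals.join(", "));
    format!(
        "{c} >= {l} AND {c} < {h}",
        c = row(cols),
        l = row(lo),
        h = row(hi)
    )
}
```

For a (region, bucket) key this yields a predicate like ROW(region, bucket) >= ROW('EU', 0) AND ROW(region, bucket) < ROW('EU', 100), comparing the key columns lexicographically as a single row value.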
MERGE Profiling (Phase 7)
| Item | Description | Effort | Ref |
|---|---|---|---|
| G14-MDED | MERGE deduplication profiling. Profile how often concurrent-write scenarios produce duplicate key entries requiring pre-MERGE compaction. If ≥10% of refresh cycles need dedup, write an RFC for a two-pass MERGE strategy. | 3–5d | plans/performance/REPORT_OVERALL_STATUS.md §14 |
| PROF-DLT | Delta SQL query plan profiling (explain_delta() function). Capture EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) for auto-generated delta SQL queries to identify PostgreSQL execution bottlenecks (join algorithms, scan types, sort spills). Add pgtrickle.explain_delta(st_name, format DEFAULT 'text') SQL function; optional PGS_PROFILE_DELTA=1 environment variable for E2E test auto-capture to /tmp/delta_plans/<st>.json. Enables identification of operator-level performance issues (semi-join full scans, deep join chains). Prerequisite for data-driven MERGE optimization. | 1–2w | PLAN_TPC_H_BENCHMARKING.md §1-5 |
MERGE profiling subtotal: ~1–3 weeks
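The ≥10% RFC trigger in G14-MDED amounts to a simple ratio over the two counters; a hedged sketch (the counter names follow the exit criteria, the function names are illustrative):

```rust
/// Fraction of DIFFERENTIAL refresh cycles that needed pre-MERGE dedup,
/// from the two monotone shared-memory counters described above.
fn dedup_ratio(total_diff_refreshes: u64, dedup_needed: u64) -> f64 {
    if total_diff_refreshes == 0 {
        0.0
    } else {
        dedup_needed as f64 / total_diff_refreshes as f64
    }
}

/// True when the observed ratio crosses the >=10% threshold that triggers
/// the two-pass MERGE RFC.
fn rfc_required(total: u64, dedup: u64) -> bool {
    dedup_ratio(total, dedup) >= 0.10
}
```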
dbt Macro Updates (Phase 8)
In plain terms: Expose the v0.11.0 SQL API additions (partition_by, fuse, fuse_ceiling, fuse_sensitivity) in the dbt materialization macros so dbt users can configure them via config(...). No catalog changes; pure Jinja/SQL.
| Item | Description | Effort |
|---|---|---|
| DBT-1 | partition_by config option wired through stream_table.sql, create_stream_table.sql, and alter_stream_table.sql | ~1d |
| DBT-2 | fuse, fuse_ceiling, fuse_sensitivity config options wired through the materialization and alter macro with change-detection logic | ~1–2d |
| DBT-3 | dbt docs update: README and SQL_REFERENCE.md dbt section | ~0.5d |
dbt macro updates subtotal: ~2–3.5 days
Multi-Tenant Scheduler Isolation (Phase 9)
In plain terms: As deployments grow past 10 databases on a single cluster, all schedulers compete for the same global background-worker pool. One busy database can starve the others. Phase 9 gives operators per-database quotas and a priority queue so critical databases always get workers.
| Item | Description | Effort | Ref |
|---|---|---|---|
| C-3 | pg_trickle.per_database_worker_quota GUC; priority ordering: IMMEDIATE > Hot > Warm > Cold STs; burst capacity up to 150% when other databases are under quota. Implemented via compute_per_db_quota() with burst at 80% cluster load and tier-aware sort_ready_queue_by_priority; 5 unit tests + 6 E2E tests. | — | src/scheduler.rs |
⚠️ C-3 depends on C-1 (tiered scheduling) for Hot/Warm/Cold classification. If C-1 is not ready, fall back to IMMEDIATE > all-other ordering with equal priority within each tier; add full tier-aware ordering as a follow-on when C-1 lands in v0.14.0.
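The quota-plus-burst arithmetic and tier ordering can be sketched as follows. This is a simplification: it assumes the 150% burst cap from the item description and ignores how the real compute_per_db_quota() measures cluster load; all names are illustrative.

```rust
/// A database may borrow workers that other databases leave idle, up to
/// 150% of its configured quota.
fn effective_quota(base_quota: u32, idle_workers_elsewhere: u32) -> u32 {
    let burst_cap = base_quota + base_quota / 2;
    (base_quota + idle_workers_elsewhere).min(burst_cap)
}

/// Tier-aware priority: IMMEDIATE first, then Hot > Warm > Cold.
/// Variant order gives the sort order directly via the derived Ord.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum Tier {
    Immediate,
    Hot,
    Warm,
    Cold,
}

fn sort_ready_queue(queue: &mut Vec<(Tier, &str)>) {
    queue.sort_by_key(|(tier, _)| *tier);
}
```

With a quota of 4 and 10 idle workers elsewhere, the effective quota caps at 6 (150%); with only 1 idle worker it is 5.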
Multi-tenant scheduler isolation subtotal: ~2–3 weeks
TPC-H Benchmark Harness (Phase 10)
In plain terms: The existing TPC-H correctness suite (22/22 queries passing) has no timing infrastructure. Phase 10 adds benchmark mode so we can measure FULL vs DIFFERENTIAL speedups across all 22 queries — the only way to validate that A-2, D-4, and other v0.13.0 changes actually help on realistic analytical workloads, and to catch per-query regressions at larger scale factors.
| Item | Description | Effort | Ref | Status |
|---|---|---|---|---|
| TPCH-1 | TPCH_BENCH=1 benchmark mode for Phase 3. Instrument test_tpch_full_vs_differential with warm-up cycles (WARMUP_CYCLES=2), reuse extract_last_profile() for [PGS_PROFILE] extraction, emit [TPCH_BENCH] structured output per cycle (query=q01 tier=2 cycle=1 mode=DIFF ms=12.7 decision=0.41 merge=11.3 …). Add print_tpch_summary() with per-query FULL/DIFF median, speedup, P95, and MERGE% table. | 4–5h | PLAN_TPC_H_BENCHMARKING.md §3 | ✅ Done |
| TPCH-2 | just bench-tpch / bench-tpch-large / bench-tpch-fast justfile targets. bench-tpch: SF-0.01 with TPCH_BENCH=1; bench-tpch-large: SF-0.1 with 5 cycles; bench-tpch-fast: skip Docker image rebuild. Enables before/after measurement for every v0.13.0 optimization. | 15 min | PLAN_TPC_H_BENCHMARKING.md §3 | ✅ Done |
| TPCH-3 | TPC-H OpTree Criterion micro-benchmarks. Add composite OpTree benchmarks to benches/diff_operators.rs representing TPC-H query shapes (diff_tpch_q01, diff_tpch_q05, diff_tpch_q08, diff_tpch_q18, diff_tpch_q21). Measures pure-Rust delta SQL generation time for complex multi-join/semi-join trees; catches DVM engine regressions without a running database. | 4h | PLAN_TPC_H_BENCHMARKING.md §4 | ✅ Done |
TPC-H benchmark harness subtotal: ~1 day
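The per-query FULL/DIFF summary reduces to median and ratio arithmetic over the per-cycle millisecond samples; a minimal sketch of what print_tpch_summary() computes per query (function names here are illustrative):

```rust
/// Median of a sample set (sorts in place; average of the middle pair
/// for an even count).
fn median(samples: &mut Vec<f64>) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len();
    if n % 2 == 1 {
        samples[n / 2]
    } else {
        (samples[n / 2 - 1] + samples[n / 2]) / 2.0
    }
}

/// Speedup factor reported per query: median FULL time over median DIFF time.
fn speedup(full_ms: &mut Vec<f64>, diff_ms: &mut Vec<f64>) -> f64 {
    median(full_ms) / median(diff_ms)
}
```

Medians rather than means keep one cold-cache outlier cycle from skewing the reported speedup, which is why the warm-up cycles (WARMUP_CYCLES=2) are excluded upstream.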
SQL Coverage Cleanup (Phase 11)
In plain terms: Three small SQL expression gaps that are unscheduled anywhere. Two are PG 16+ standard SQL syntax currently rejected with errors; one is an audit-gated correctness check for recursive CTEs with non-monotone operators. All are low-effort items that round out DVM coverage without adding scope risk.
| Item | Description | Effort | Ref |
|---|---|---|---|
| SQL-RECUR | Recursive CTE non-monotone divergence audit. Write an E2E test for a recursive CTE with EXCEPT or aggregation in the recursive term (WITH RECURSIVE … SELECT … EXCEPT SELECT …). If the test passes → downgrade G1.3 to P4 (verified correct, no code change). If it fails → add a guard in diff_recursive_cte that detects non-monotone recursive terms and rejects them with ERROR: non-monotone recursive CTEs are not supported in DIFFERENTIAL mode — use FULL. | 6–8h | GAP_SQL_PHASE_7.md §G1.3 |
| SQL-PG16-1 | IS JSON predicate support (PG 16+). expr IS JSON, expr IS JSON OBJECT, expr IS JSON ARRAY, expr IS JSON SCALAR, expr IS JSON WITH UNIQUE KEYS — standard SQL/JSON predicates rejected today. Add a T_JsonIsPredicate arm in parser.rs; the predicate is treated opaquely (no delta decomposition); it passes through to the delta SQL unchanged, where the PG executor evaluates it natively. | 2–3h | GAP_SQL_PHASE_6.md §G1.4 |
| SQL-PG16-2 | SQL/JSON constructor support (PG 16+). JSON_OBJECT(…), JSON_ARRAY(…), JSON_OBJECTAGG(…), JSON_ARRAYAGG(…) — standard SQL/JSON constructors (T_JsonConstructorExpr) currently rejected. Add opaque pass-through in parser.rs; treat as scalar expressions (no incremental maintenance of the JSON value itself); handle the aggregate variants the same way as other custom aggregates (full group rescan). | 4–6h | GAP_SQL_PHASE_6.md §G1.5 |
SQL coverage cleanup subtotal: ~1–2 days
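The opaque pass-through treatment the two PG 16+ items share can be sketched as a parser dispatch. The node tags mirror T_JsonIsPredicate / T_JsonConstructorExpr, but the enum and function names are illustrative, not pg_trickle's actual parser.rs types:

```rust
/// Node tags of interest (stand-ins for the PostgreSQL parse-node tags).
enum NodeTag {
    JsonIsPredicate,
    JsonConstructorExpr,
    OpExpr,
}

enum Expr {
    /// Opaque: the deparsed SQL is carried into the delta query verbatim
    /// and evaluated natively by the PostgreSQL executor.
    Raw(String),
    /// Decomposable operator expression (placeholder for the real variants).
    Op(String),
}

fn parse_expr(tag: NodeTag, deparsed: &str) -> Expr {
    match tag {
        // No delta decomposition: pass the SQL through unchanged.
        NodeTag::JsonIsPredicate | NodeTag::JsonConstructorExpr => {
            Expr::Raw(deparsed.to_string())
        }
        NodeTag::OpExpr => Expr::Op(deparsed.to_string()),
    }
}
```

Treating the expression as opaque is safe here because neither predicate nor constructor needs incremental maintenance of its own: its value is recomputed per delta row.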
DVM Engine Improvements (Phase 10)
In plain terms: The delta SQL generated for deep multi-table joins (e.g., TPC-H Q05/Q09 with 6 joined tables) computes identical pre-change snapshots redundantly at every reference site, spilling multi-GB temporary files that exceed temp_file_limit. Nested semi-joins (Q20) exhibit an O(n²) blowup from fully materializing the right-side pre-change state. These improvements target the intermediate data volume directly in the delta SQL generator, with TPC-H 22/22 DIFFERENTIAL correctness as the measurable gate.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DI-1 | Named CTE L₀ snapshots. Emit per-leaf pre-change snapshots as named CTEs (NOT MATERIALIZED default; MATERIALIZED when reference count ≥ 3); deduplicate 3–10× redundant EXCEPT ALL evaluations per leaf. Targets Q05/Q09 temp spill root cause. | 2–3d | PLAN_DVM_IMPROVEMENTS.md §DI-1 |
| DI-2 | Pre-image read from change buffer + aggregate UPDATE-split. Replace per-leaf EXCEPT ALL with a NOT EXISTS anti-join on pk_hash + direct old_* read. Per-leaf conditional fallback to EXCEPT ALL when delta exceeds max_delta_fraction for that leaf. Includes aggregate UPDATE-split: the 'D' side of SUM(CASE WHEN …) evaluates using old_* column values, superseding DI-8's band-aid. | 3.5–5.5d | PLAN_DVM_IMPROVEMENTS.md §DI-2 |
| DI-3 | Group-key filtered aggregate old rescan. Restrict non-algebraic aggregate EXCEPT ALL rescans to affected groups via EXISTS (… IS NOT DISTINCT FROM …) filter. NULL-safe. Independent quick win. | 0.5–1d | PLAN_DVM_IMPROVEMENTS.md §DI-3 |
| DI-6 | Lazy semi-join R_old materialization. Skip EXCEPT ALL for unchanged semi-join right children; push down equi-join key as a filter when R_old is needed. Eliminates Q20-type O(n²) blowup. | 1–2d | PLAN_DVM_IMPROVEMENTS.md §DI-6 |
| DI-4 | Shared R₀ CTE cache. Cache pre-change snapshot SQL by OpTree node identity to avoid regenerating duplicate inline subqueries for shared subtrees. Depends on DI-1. | 1–2d | PLAN_DVM_IMPROVEMENTS.md §DI-4 |
| DI-5 | Part 3 correction consolidation. Consolidate per-node Part 3 correction CTEs for linear inner-join chains into a single term. | 2–3d | PLAN_DVM_IMPROVEMENTS.md §DI-5 |
| DI-7 | Scan-count-aware strategy selector. max_differential_joins and max_delta_fraction per-stream-table options; auto-fallback to FULL refresh when join count or delta-rate threshold is exceeded. Complements DI-2's per-leaf fallback with a coarser per-ST guard at scheduler decision time. | 1–2d | PLAN_DVM_IMPROVEMENTS.md §DI-7 |
| DI-8 | SUM(CASE WHEN …) algebraic drift fix. Detect Expr::Raw("CASE …") in is_algebraically_invertible() and fall back to GROUP_RESCAN. Q14 is unaffected (parsed as ComplexExpression, already GROUP_RESCAN). Correctness band-aid superseded by DI-2's aggregate UPDATE-split. | ~0.5d | PLAN_DVM_IMPROVEMENTS.md §DI-8 |
| DI-9 | Scheduler skips IMMEDIATE-mode tables. Raise scheduler_interval_ms GUC cap to 600,000 ms; return early from refresh-due check for refresh_mode = IMMEDIATE (verified safe: IMMEDIATE drains TABLE-source buffers synchronously; downstream CALCULATED tables detected via has_stream_table_source_changes() independently). | 0.5d | PLAN_DVM_IMPROVEMENTS.md §DI-9 |
| DI-10 | SF=1 benchmark validation gate. Add bench-tpch-sf1 justfile target (TPCH_SF=1 TPCH_BENCH=1). Gate v0.13.0 release on 22/22 queries at SF=1. CI: manual dispatch only (60–180 min runtime, 4h timeout). | ~0.5d | PLAN_DVM_IMPROVEMENTS.md §DI-10 |
| DI-11 | Predicate pushdown + deep-join L₀ threshold + planner hints. (a) Enable push_filter_into_cross_joins() with scalar-subquery guard. (b) Deep-join L₀ threshold (4+ scans): skip L₀ reconstruction, use L₁ + Part 3 correction. (c) Deep-join planner hints (5+ scans): disable nestloop, raise work_mem, override temp_file_limit. Result: 22/22 TPC-H DIFFERENTIAL. | ~1d | — |
DI-2 promoted from v1.x: CDC old_* column capture was completed as part of the typed-column CDC rewrite (already in production). DI-2 scope includes both the join-level pre-image capture (NOT EXISTS anti-join) and an aggregate UPDATE-split that uses old_* values for the 'D' side of SUM(CASE WHEN …), superseding DI-8's GROUP_RESCAN band-aid.

Implementation order: DI-8 → DI-9 → DI-1 → DI-3 → DI-2 → DI-6 → DI-4 → DI-5 → DI-7 → DI-10 → DI-11
DVM improvements subtotal: ~2–3 weeks (DI-8/DI-9 are small independent fixes; DI-1–DI-7 are the core engine work; DI-10 is a validation run; DI-11 is predicate pushdown + deep-join optimization)
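DI-1's materialization rule is a small decision over reference counts; a hedged sketch with a hypothetical helper name (the real CTE naming and SQL assembly live in the delta SQL generator):

```rust
/// Render one leaf's pre-change (L0) snapshot as a named CTE clause.
/// NOT MATERIALIZED by default so the planner may inline it; MATERIALIZED
/// once it is referenced 3+ times, so the snapshot is evaluated exactly once
/// instead of redundantly at every reference site.
fn l0_cte(leaf: &str, snapshot_sql: &str, ref_count: usize) -> String {
    let keyword = if ref_count >= 3 {
        "MATERIALIZED"
    } else {
        "NOT MATERIALIZED"
    };
    format!("l0_{leaf} AS {keyword} ({snapshot_sql})")
}
```

For a 6-join query like Q05 this is what collapses 3–10 redundant evaluations of the same pre-change snapshot into one, at the cost of materializing it when shared widely.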
Regression-Free Testing Initiative (Q2 2026)
Addresses 9 structural weaknesses identified in the regression risk analysis. Target: reduce regression escape rate from ~15% to <5%.
| Phase | Item | Status |
|---|---|---|
| P1 | Test infrastructure hardening: #[must_use] on poll helpers; wait_for_condition with exponential backoff; assert_column_types_match | ✅ Done (2026-03-28) |
| P2 | Join multi-cycle correctness: 7 tests — LEFT/RIGHT/FULL join, join-key update, both-sides DML, 4-table chain, NULL key | ✅ Done (2026-03-28) |
| P3 | Differential ≡ Full equivalence: 11 tests covering every major DVM operator class; effective_refresh_mode guard | ✅ Done (2026-03-28) |
| P4 | DVM operator execution: LATERAL MAX subquery multi-cycle (5 cycles) + recursive CTE org hierarchy multi-cycle (5 cycles) | ✅ Done (2026-03-28) |
| P5 | Failure recovery & schema evolution: 6 failure recovery tests (FR-1..6 in e2e_failure_recovery_tests.rs) + 5 schema evolution tests (SE-1..5 in e2e_ddl_event_tests.rs) | ✅ Done (2026-03-28) |
| P6 | MERGE template unit tests: 8 pure-Rust tests — determine_refresh_action (×5) + build_is_distinct_clause boundary (×3) in src/refresh.rs | ✅ Done (2026-03-28) |
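P1's backoff helper can be sketched in plain Rust — a simplified stand-in for the harness helper, whose real signature may differ:

```rust
use std::time::Duration;

/// Poll `cond` up to `max_attempts` times with exponentially growing sleeps,
/// capped at 1s. #[must_use] forces callers to assert on the outcome rather
/// than silently ignore a timed-out wait.
#[must_use]
fn wait_for_condition<F: FnMut() -> bool>(mut cond: F, max_attempts: u32) -> bool {
    let mut delay = Duration::from_millis(10);
    for _ in 0..max_attempts {
        if cond() {
            return true;
        }
        std::thread::sleep(delay);
        delay = (delay * 2).min(Duration::from_secs(1)); // cap the backoff
    }
    false
}
```

Exponential backoff keeps fast conditions fast while avoiding the fixed-sleep flakiness that the regression risk analysis flagged in the E2E suite.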
v0.13.0 total: ~15–23 weeks (Scalability: 6–8w, Partitioning: 5–8w, MERGE Profiling: 1–3w, dbt: 2–3.5d, Multi-tenant: 2–3w, TPC-H harness: ~1d, SQL cleanup: ~1–2d, DVM improvements: ~2–3w)
Exit criteria:
- [x] A-2: Columnar change tracking bitmask skips irrelevant rows; key column classification ✅, __pgt_key_changed annotation ✅, P5 value-only fast path ✅, DiffResult.has_key_changed signal propagation ✅, MERGE value-only UPDATE optimization ✅, upgrade script ✅ Done
- [x] D-4: Shared buffer serves multiple STs via per-source changes_{oid} naming; pgt_change_tracking.tracked_by_pgt_ids reference counting; shared_buffer_stats() observability; property-based test with 5–10 consumers (3 properties, 500 cases); 5 E2E fan-out tests ✅ Done
- [x] PERF-2: buffer_partitioning = 'auto' activates RANGE(lsn) partitioned mode for high-throughput sources — throughput-based should_promote_inner() heuristic, convert_buffer_to_partitioned() runtime migration, 10 unit tests + 3 E2E tests, docs/CONFIGURATION.md updated ✅ Done
- [x] A1-1b: Multi-column RANGE partition keys work end-to-end; composite ROW() predicate triggers partition pruning; 3 E2E tests + 5 unit tests ✅ Done
- [x] A1-1c: alter_stream_table(partition_by => …) repartitions existing storage table without data loss; add/change/remove tested
- [x] A1-1d: LIST partitioning creates PARTITION BY LIST storage; IN-list predicate injected; single-column-only validated; 4 E2E tests pass
- [x] A1-3b: HASH partitioning uses per-partition MERGE loop; auto-creates N child partitions; satisfies_hash_partition() filter; 22 unit tests + 6 E2E tests ✅ Done
- [x] PART-WARN: WARNING emitted when default partition has rows after refresh; warn_default_partition_growth() on both FULL and DIFFERENTIAL paths ✅ Done
- [x] G14-MDED: Deduplication frequency profiling complete; TOTAL_DIFF_REFRESHES + DEDUP_NEEDED_REFRESHES shared-memory atomic counters; pgtrickle.dedup_stats() reports ratio; RFC threshold documented at ≥10% ✅ Done
- [x] PROF-DLT: pgtrickle.explain_delta(st_name, format) function captures delta query plans in text/json/xml/yaml; PGS_PROFILE_DELTA=1 auto-capture to /tmp/delta_plans/; documented in SQL_REFERENCE.md ✅ Done
- [x] C-3: Per-database worker quota enforced; tier-aware priority sort (IMMEDIATE > Hot > Warm > Cold) implemented; GUC + E2E quota tests added; compute_per_db_quota() with burst at 80% cluster load ✅ Done
- [x] TPCH-1/2: TPCH_BENCH=1 mode emits [TPCH_BENCH] lines + summary table; just bench-tpch and bench-tpch-large targets functional ✅ Done
- [x] TPCH-3: Five TPC-H OpTree Criterion benchmarks pass and run without a PostgreSQL backend ✅ Done
- [x] DBT-1/2/3: partition_by, fuse, fuse_ceiling, fuse_sensitivity exposed in dbt macros; change detection wired; integration tests added; README and SQL_REFERENCE.md updated ✅ Done
- [x] SQL-RECUR: Recursive CTE non-monotone audit complete; G1.3 downgraded to P4 — two Tier 3h E2E tests verify recomputation fallback is correct ✅ Done
- [x] SQL-PG16-1: IS JSON predicate accepted in DIFFERENTIAL defining queries; E2E tests in e2e_expression_tests.rs confirm correct delta behaviour ✅ Done
- [x] SQL-PG16-2: JSON_OBJECT, JSON_ARRAY, JSON_OBJECTAGG, JSON_ARRAYAGG accepted in DIFFERENTIAL defining queries; E2E tests in e2e_expression_tests.rs confirm correct delta behaviour ✅ Done
- [x] scripts/check_upgrade_completeness.sh passes (all catalog changes in sql/pg_trickle--0.12.0--0.13.0.sql) ✅ Done — 58 functions, 8 new columns, all covered
- [x] DI-8: is_algebraically_invertible() detects Expr::Raw("CASE …") and returns false for SUM(CASE WHEN …) (Q14 unaffected — ComplexExpression); Q12 removed from DIFFERENTIAL_SKIP_ALLOWLIST; 4 unit tests ✅ Done
- [x] DI-9: scheduler_interval_ms cap raised to 600,000 ms; scheduler skips IMMEDIATE-mode tables in check_schedule(); verified safe for CALCULATED dependants ✅ Done
- [x] DI-1: Named CTE L₀ snapshots implemented (NOT MATERIALIZED default, MATERIALIZED when ref ≥ 3); Q05/Q09 pass DIFFERENTIAL correctness ✅ Done
- [x] DI-2: NOT EXISTS anti-join replaces EXCEPT ALL in build_pre_change_snapshot_sql(); per-leaf conditional EXCEPT ALL fallback when delta > max_delta_fraction; aggregate UPDATE-split blocked on Q12 drift root cause (DI-8 band-aid retained) ✅ Done
- [x] DI-3: Already implemented — non-algebraic aggregate old rescan filtered via EXISTS (… IS NOT DISTINCT FROM …) to affected groups; NULL-safe ✅ Done
- [x] DI-6: Semi-join R_old lazy materialization with key push-down; Q20 DIFF passes at SF=0.01 ✅ Done
- [x] DI-4/5/7: R₀ cache (subset of DI-1), Part 3 threshold raised from 3→5, strategy selector + max_delta_fraction complete ✅ Done
- [x] DI-10: bench-tpch-sf1 target added; 22/22 queries pass at SF=0.01 (3 cycles, zero drift) ✅ Done
- [x] DI-11: Predicate pushdown enabled with scalar-subquery guard; deep-join L₀ threshold (4 scans); deep-join planner hints (5+ total scans); 22/22 TPC-H DIFFERENTIAL ✅ Done
- [x] Extension upgrade path tested (0.12.0 → 0.13.0) ✅ Done