Plain-language companion: v0.22.0.md
v0.22.0 — Production Scalability & Downstream Integration
Status: ✅ Released. Driven by PLAN_OVERALL_ASSESSMENT.md P1 items not addressed in v0.21.0 and the highest-value P2 items.
Release Theme

This release delivers the two highest-impact items from the overall assessment deferred from v0.21.0: a minimal-viable in-database parallel refresh worker pool (the single largest scalability unlock) and a downstream CDC publication so stream table changes can drive Kafka, Debezium, and event-sourcing pipelines without a second replication slot. Two P2 items ship alongside: a predictive cost model for adaptive refresh and SLA-driven tier auto-assignment. The transactional outbox helper moves to v0.24.0, where it ships alongside a companion inbox helper as a complete transactional messaging solution.
Downstream CDC Publication (P1 — §9.2)
In plain terms: pg_trickle consumes CDC from source tables but cannot emit changes downstream. This adds stream_table_to_publication() — a helper that exposes every row applied to a stream table as a PostgreSQL logical replication publication, so Kafka Connect, Debezium, and event-sourcing pipelines can subscribe with zero code and no second replication slot. See the usage sketch below the table.
| Item | Description | Effort | Ref |
|---|---|---|---|
| CDC-PUB-1 | stream_table_to_publication(name TEXT) SQL function. Creates a logical replication publication for the target stream table using pgt_inserted_rows/pgt_deleted_rows output from the MERGE step. Catalog column downstream_publication_name tracks the association. | 2–3d | PLAN_OVERALL_ASSESSMENT.md §9.2 |
| CDC-PUB-2 | Lifecycle management. drop_stream_table_publication(name), auto-drop on drop_stream_table(), recreation on schema-change rebuild. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.2 |
| CDC-PUB-3 | pg_stat_stream_tables — downstream_publication column. Surface publication name (or NULL) in the monitoring view. | 0.5d | PLAN_OVERALL_ASSESSMENT.md §9.2 |
| CDC-PUB-4 | E2E tests. Create publication; verify subscriber receives insert/update/delete events; drop and verify cleanup. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.2 |
| CDC-PUB-5 | Documentation. docs/SQL_REFERENCE.md section on downstream publications; tutorial showing Kafka Connect integration pattern. | 1d | — |
Downstream CDC publication subtotal: ~1–1.5 weeks
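The sketch below shows the intended round trip. The pg_trickle schema qualification and the generated publication name pgt_pub_order_totals are illustrative assumptions, not confirmed naming; the function, view, and column names come from the plan above. The CREATE SUBSCRIPTION step is plain PostgreSQL logical replication; Debezium and Kafka Connect consume the same publication through their own connectors.

```sql
-- Hedged usage sketch; schema and publication naming are assumptions.

-- Expose every row applied to the stream table as a logical publication.
SELECT pg_trickle.stream_table_to_publication('order_totals');

-- Confirm the association in the monitoring view (CDC-PUB-3).
SELECT downstream_publication
FROM pg_stat_stream_tables;        -- NULL for STs without a publication

-- On a downstream PostgreSQL instance, subscribe with zero code.
-- Debezium / Kafka Connect point at the same publication instead.
CREATE SUBSCRIPTION order_totals_sub
    CONNECTION 'host=primary dbname=app user=replicator'
    PUBLICATION pgt_pub_order_totals;
```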
In-Database Parallel Refresh Worker Pool — Minimal Viable Slice (P1 — §3.1)
In plain terms: The scheduler today runs one refresh at a time per tick. This installs a dynamic bgworker pool — a coordinator owns the DAG, workers execute refreshes — so independent stream tables at the same DAG level refresh simultaneously. Deployments with 200+ STs or long refresh queues get immediate throughput gains. Opt-in via max_parallel_workers; the default of 0 preserves existing serial behaviour. See the configuration sketch below the table.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PAR-1 | Coordinator / worker process split. Coordinator BGW manages the tick; dispatches ready-to-run STs to a PgLwLock-protected shared work queue; worker BGWs pop entries and execute refresh transactions. | 1.5–2wk | plans/sql/PLAN_PARALLELISM.md §3 |
| PAR-2 | pg_trickle.max_parallel_workers GUC (default 0 = serial, range 0–32). Gate the entire parallel path so deployments can opt in incrementally. | 1d | plans/sql/PLAN_PARALLELISM.md §4 |
| PAR-3 | DAG level extraction. Re-use topological_levels() already in dag.rs to identify STs that can run concurrently (same level, no intra-level edges). | 0.5d | plans/sql/PLAN_PARALLELISM.md §3 |
| PAR-4 | Worker crash recovery. Coordinator marks the ST ERROR in pgt_refresh_history on worker crash (same behaviour as serial crash); respawns the worker slot. | 1d | plans/sql/PLAN_PARALLELISM.md §5 |
| PAR-5 | E2E tests: correctness + throughput. Diamond DAG with concurrent same-level refreshes; verify no partial-consistency window. Benchmark: wall-clock tick latency vs serial at 50-ST scale. | 1d | plans/sql/PLAN_PARALLELISM.md §6 |
Parallel refresh subtotal: ~3–4 weeks
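A minimal opt-in sketch: the GUC name, default, and range are from PAR-2. Whether a configuration reload is sufficient (as shown) or a server restart is needed for the pool to resize is an assumption about how the coordinator spawns its dynamic workers.

```sql
-- Opt in to the parallel path; 0 (the default) keeps today's serial
-- behaviour and must produce identical results (exit criterion PAR-2).
ALTER SYSTEM SET pg_trickle.max_parallel_workers = 4;   -- range 0-32
SELECT pg_reload_conf();                     -- assumption: reloadable
SHOW pg_trickle.max_parallel_workers;
```

Because only independent STs at the same DAG level are dispatched together (PAR-3), a linear chain sees no speed-up; wide, shallow DAGs benefit most.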
Predictive Refresh Cost Model (P2 — §9.3)
In plain terms: The current adaptive threshold reacts after a slow differential refresh. This extends self-monitoring to predict duration_ms from rows_inserted + rows_deleted via linear regression over the last hour. When the forecast exceeds last_full_ms × 1.5, pg_trickle switches to FULL pre-emptively — eliminating the one-bad-cycle latency spike entirely. See the illustrative query below the table.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PRED-1 | Linear regression forecaster. Fit duration_ms ~ delta_rows over pg_trickle.prediction_window minutes of pgt_refresh_history per ST. Expose fitted slope and intercept as columns in df_threshold_advice. | 1–2d | PLAN_OVERALL_ASSESSMENT.md §9.3 |
| PRED-2 | Pre-emptive FULL switch. If predicted_diff_ms > last_full_ms × pg_trickle.prediction_ratio (default 1.5), override strategy to FULL; log refresh_reason = 'predicted_cost_exceeds_full'. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.3 |
| PRED-3 | Cold-start fallback. When fewer than pg_trickle.prediction_min_samples (default 5) history rows exist, fall back to the existing fixed-threshold logic. | 0.5d | — |
| PRED-4 | E2E test + proptest. Verify pre-emptive switch fires under synthetic cost spike; proptest checks cold-start fallback boundary (0–4 samples). | 1d | — |
Predictive cost model subtotal: ~1 week
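The forecaster itself runs inside the extension, but the PRED-1 fit can be reproduced for intuition with PostgreSQL's built-in regression aggregates regr_slope and regr_intercept. Column names stream_table, started_at, rows_inserted, and rows_deleted are assumptions about the pgt_refresh_history schema.

```sql
-- Illustrative: reproduce the PRED-1 linear fit per stream table.
-- predicted_diff_ms = slope * delta_rows + intercept; PRED-2 compares
-- this against last_full_ms * prediction_ratio (default 1.5).
SELECT stream_table,
       regr_slope(duration_ms, rows_inserted + rows_deleted)     AS slope,
       regr_intercept(duration_ms, rows_inserted + rows_deleted) AS intercept
FROM pgt_refresh_history
WHERE started_at > now() - interval '60 minutes'   -- prediction_window
GROUP BY stream_table
HAVING count(*) >= 5;  -- PRED-3: below prediction_min_samples, fall back
```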
SLA-Driven Tier Auto-Assignment (P2 — §9.7)
In plain terms: alter_stream_table(name, sla => interval '30 seconds') lets the scheduler pick the right tier automatically — no manual tier tuning required. This removes the expert-knowledge barrier to tiered scheduling. See the usage sketch below the table.
| Item | Description | Effort | Ref |
|---|---|---|---|
| SLA-1 | sla parameter on create_stream_table / alter_stream_table. Accepts an INTERVAL; stored as freshness_deadline_ms in pgt_stream_tables. | 0.5d | PLAN_OVERALL_ASSESSMENT.md §9.7 |
| SLA-2 | Initial tier assignment. On creation or alter_stream_table with sla set, assign to the tier whose dispatch_gap ≤ sla, considering current queue depth. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.7 |
| SLA-3 | Dynamic re-assignment. After each tick, check whether the ST’s tier still meets the SLA given measured queue depth; bump one tier up or down if the gap is consistently exceeded or under-utilised by >2×. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.7 |
| SLA-4 | E2E test. Create ST with 30 s SLA; inject artificial tick delay; verify tier promotion within 3 cycles. | 0.5d | — |
SLA-driven tier subtotal: ~3–4 days
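A usage sketch built from the call shown above; the pg_trickle schema qualification and the table name are illustrative.

```sql
-- Attach a 30-second freshness SLA. The interval is stored as
-- freshness_deadline_ms (SLA-1) and drives initial tier selection (SLA-2).
SELECT pg_trickle.alter_stream_table('order_totals',
                                     sla => interval '30 seconds');
```

Passing sla on create_stream_table works the same way at creation time; after that, SLA-3 re-checks the assigned tier after each tick and promotes or demotes as queue depth moves.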
Implementation Phases
| Phase | Description | Duration |
|---|---|---|
| CDC-PUB | Downstream CDC publication: SQL function, lifecycle, monitoring, tests, docs | Days 1–8 |
| PAR | Parallel refresh: coordinator/worker split, GUC, DAG levels, recovery, tests | Days 9–28 |
| PRED | Predictive cost model: regression, pre-emptive switch, cold-start fallback, tests | Days 29–33 |
| SLA | SLA-driven tier: sla param, initial assignment, dynamic re-assignment, tests | Days 34–37 |
v0.22.0 total: ~7.5 weeks / 37 working days (downstream CDC + parallel refresh + predictive cost + SLA tier)
Exit criteria:
- [x] CDC-PUB-1: stream_table_to_publication(name) creates a working logical publication
- [x] CDC-PUB-2: Publication is dropped automatically when the stream table is dropped
- [x] CDC-PUB-3: downstream_publication column visible in pg_stat_stream_tables
- [x] CDC-PUB-4: Subscriber receives correct insert/update/delete events; E2E test passes
- [x] PAR-2: max_parallel_workers = 0 (default) produces identical results to serial mode
- [x] PAR-1/PAR-3: max_parallel_workers ≥ 1 dispatches independent same-level STs concurrently
- [x] PAR-4: Worker crash marks ST ERROR; coordinator respawns worker slot
- [x] PAR-5: Diamond DAG concurrent correctness test passes; throughput improvement benchmarked
- [x] PRED-1: Fitted coefficients visible in df_threshold_advice
- [x] PRED-2: Pre-emptive FULL switch fires under synthetic spike; refresh_reason = 'predicted_cost_exceeds_full' logged
- [x] PRED-3: Cold-start fallback active when fewer than prediction_min_samples history rows exist
- [x] SLA-1: create_stream_table(..., sla => '30 seconds') stores freshness_deadline_ms
- [x] SLA-2: Initial tier assignment matches SLA requirement on creation
- [x] SLA-3: Tier auto-adjusts within 3 cycles when queue depth breaches SLA
- [x] Extension upgrade path tested (0.21.0 → 0.22.0)
- [x] just check-version-sync passes