Plain-language companion: v0.22.0.md

v0.22.0 — Production Scalability & Downstream Integration

Status: ✅ Released. Driven by PLAN_OVERALL_ASSESSMENT.md P1 items not addressed in v0.21.0 and the highest-value P2 items.

Release Theme

This release delivers the two highest-impact items from the overall assessment deferred from v0.21.0: a minimal-viable in-database parallel refresh worker pool (the single largest scalability unlock) and a downstream CDC publication so stream table changes can drive Kafka, Debezium, and event-sourcing pipelines without a second replication slot. Two P2 items ship alongside: a predictive cost model for adaptive refresh and SLA-driven tier auto-assignment. The transactional outbox helper moves to v0.24.0, where it ships alongside a companion inbox helper as a complete transactional messaging solution.

Downstream CDC Publication (P1 — §9.2)

In plain terms: pg_trickle consumes CDC from source tables but cannot emit changes downstream. This adds stream_table_to_publication() — a helper that exposes every row applied to a stream table as a PostgreSQL logical replication publication so Kafka Connect, Debezium, and event-sourcing pipelines can subscribe with zero code and no second replication slot.

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| CDC-PUB-1 | stream_table_to_publication(name TEXT) SQL function. Creates a logical replication publication for the target stream table using pgt_inserted_rows/pgt_deleted_rows output from the MERGE step. Catalog column downstream_publication_name tracks the association. | 2–3d | PLAN_OVERALL_ASSESSMENT.md §9.2 |
| CDC-PUB-2 | Lifecycle management. drop_stream_table_publication(name), auto-drop on drop_stream_table(), recreation on schema-change rebuild. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.2 |
| CDC-PUB-3 | pg_stat_stream_tables.downstream_publication column. Surface the publication name (or NULL) in the monitoring view. | 0.5d | PLAN_OVERALL_ASSESSMENT.md §9.2 |
| CDC-PUB-4 | E2E tests. Create publication; verify subscriber receives insert/update/delete events; drop and verify cleanup. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.2 |
| CDC-PUB-5 | Documentation. docs/SQL_REFERENCE.md section on downstream publications; tutorial showing the Kafka Connect integration pattern. | 1d | — |

Downstream CDC publication subtotal: ~1–1.5 weeks
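The lifecycle pairing in CDC-PUB-1/CDC-PUB-2 can be sketched as the statements the helpers might emit. This is a minimal illustrative sketch, not the actual implementation: the pgt_pub_ naming prefix and the exact catalog UPDATE are assumptions; only pgt_stream_tables and downstream_publication_name come from the plan above.

```python
def publication_sql(stream_table: str) -> list[str]:
    """Statements a stream_table_to_publication() helper might run
    (hypothetical sketch; pgt_pub_ prefix is an assumption)."""
    pub = f"pgt_pub_{stream_table}"
    return [
        # Expose every row applied to the stream table via logical replication.
        f"CREATE PUBLICATION {pub} FOR TABLE {stream_table};",
        # Record the association so drop_stream_table() can clean it up later.
        f"UPDATE pgt_stream_tables SET downstream_publication_name = '{pub}' "
        f"WHERE name = '{stream_table}';",
    ]

def drop_publication_sql(stream_table: str) -> list[str]:
    """Mirror-image cleanup, as run by drop_stream_table_publication()."""
    pub = f"pgt_pub_{stream_table}"
    return [
        f"DROP PUBLICATION IF EXISTS {pub};",
        f"UPDATE pgt_stream_tables SET downstream_publication_name = NULL "
        f"WHERE name = '{stream_table}';",
    ]
```

The key design point is that the catalog column and the publication are created and dropped together, so a schema-change rebuild can recreate the publication from the catalog alone.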

In-Database Parallel Refresh Worker Pool — Minimal Viable Slice (P1 — §3.1)

In plain terms: The scheduler today runs one refresh at a time per tick. This installs a dynamic bgworker pool — a coordinator owns the DAG, workers execute refreshes — so independent stream tables at the same DAG level refresh simultaneously. Deployments with 200+ STs or long refresh queues get immediate throughput gains. Opt-in via max_parallel_workers; default 0 preserves existing serial behaviour.

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| PAR-1 | Coordinator / worker process split. Coordinator BGW manages the tick; dispatches ready-to-run STs to a PgLwLock-protected shared work queue; worker BGWs pop entries and execute refresh transactions. | 1.5–2wk | plans/sql/PLAN_PARALLELISM.md §3 |
| PAR-2 | pg_trickle.max_parallel_workers GUC (default 0 = serial, range 0–32). Gate the entire parallel path so deployments can opt in incrementally. | 1d | plans/sql/PLAN_PARALLELISM.md §4 |
| PAR-3 | DAG level extraction. Re-use topological_levels() already in dag.rs to identify STs that can run concurrently (same level, no intra-level edges). | 0.5d | plans/sql/PLAN_PARALLELISM.md §3 |
| PAR-4 | Worker crash recovery. Coordinator marks the ST ERROR in pgt_refresh_history on worker crash (same behaviour as serial crash); respawns the worker slot. | 1d | plans/sql/PLAN_PARALLELISM.md §5 |
| PAR-5 | E2E tests: correctness + throughput. Diamond DAG with concurrent same-level refreshes; verify no partial-consistency window. Benchmark: wall-clock tick latency vs serial at 50-ST scale. | 1d | plans/sql/PLAN_PARALLELISM.md §6 |

Parallel refresh subtotal: ~3–4 weeks
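The coordinator/worker interaction in PAR-1/PAR-3 can be sketched in miniature: group the DAG into levels (mirroring what topological_levels() in dag.rs provides), then dispatch each level to a pool with a barrier between levels. Threads stand in for background workers here; the function names and shapes are illustrative, not the extension's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def topological_levels(nodes, edges):
    """Group DAG nodes into levels (Kahn's algorithm by layer).
    Nodes in one level share no edges, so they can refresh concurrently."""
    indeg = {n: 0 for n in nodes}
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        indeg[dst] += 1
    level, levels = [n for n in nodes if indeg[n] == 0], []
    while level:
        levels.append(level)
        nxt = []
        for n in level:
            for m in adj[n]:
                indeg[m] -= 1
                if indeg[m] == 0:
                    nxt.append(m)
        level = nxt
    return levels

def run_tick(levels, refresh, max_parallel_workers=0):
    """Coordinator loop: serial when the GUC is 0 (existing behaviour),
    otherwise fan each level out to the pool with a barrier between levels."""
    if max_parallel_workers == 0:
        for lvl in levels:
            for st in lvl:
                refresh(st)
        return
    with ThreadPoolExecutor(max_workers=max_parallel_workers) as pool:
        for lvl in levels:
            list(pool.map(refresh, lvl))  # wait: next level depends on this one
```

The per-level barrier is what rules out a partial-consistency window: a dependent ST never starts until every ST in the level above has committed.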

Predictive Refresh Cost Model (P2 — §9.3)

In plain terms: The current adaptive threshold reacts after a slow differential refresh. This extends self-monitoring to predict duration_ms from rows_inserted + rows_deleted via linear regression over the last hour. When the forecast exceeds last_full_ms × 1.5, pg_trickle switches to FULL pre-emptively — eliminating the one-bad-cycle latency spike entirely.

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| PRED-1 | Linear regression forecaster. Fit duration_ms ~ delta_rows over pg_trickle.prediction_window minutes of pgt_refresh_history per ST. Expose fitted slope and intercept as columns in df_threshold_advice. | 1–2d | PLAN_OVERALL_ASSESSMENT.md §9.3 |
| PRED-2 | Pre-emptive FULL switch. If predicted_diff_ms > last_full_ms × pg_trickle.prediction_ratio (default 1.5), override strategy to FULL; log refresh_reason = 'predicted_cost_exceeds_full'. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.3 |
| PRED-3 | Cold-start fallback. When fewer than pg_trickle.prediction_min_samples (default 5) history rows exist, fall back to the existing fixed-threshold logic. | 0.5d | — |
| PRED-4 | E2E test + proptest. Verify pre-emptive switch fires under synthetic cost spike; proptest checks cold-start fallback boundary (0–4 samples). | 1d | — |

Predictive cost model subtotal: ~1 week
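The PRED-1/PRED-2/PRED-3 decision path reduces to ordinary least squares plus a threshold check, sketched below under stated assumptions: history rows are (delta_rows, duration_ms) pairs, and the defaults mirror pg_trickle.prediction_ratio = 1.5 and pg_trickle.prediction_min_samples = 5 from the table above. Function names are illustrative.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for duration_ms ~ delta_rows."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:                      # degenerate: every delta identical
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def choose_strategy(history, delta_rows, last_full_ms,
                    ratio=1.5, min_samples=5):
    """history: [(delta_rows, duration_ms), ...] from recent refreshes."""
    if len(history) < min_samples:
        return "DIFFERENTIAL"         # cold start: keep fixed-threshold logic
    slope, intercept = fit_linear([h[0] for h in history],
                                  [h[1] for h in history])
    predicted_diff_ms = slope * delta_rows + intercept
    if predicted_diff_ms > last_full_ms * ratio:
        return "FULL"                 # would log 'predicted_cost_exceeds_full'
    return "DIFFERENTIAL"
```

Because the switch happens before the refresh runs, the forecast replaces the one bad cycle the reactive threshold needs in order to learn the same lesson.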

SLA-Driven Tier Auto-Assignment (P2 — §9.7)

In plain terms: alter_stream_table(name, sla => interval '30 seconds') lets the scheduler pick the right tier automatically — no manual tier tuning required. Removes the expert-knowledge barrier to tiered scheduling.

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| SLA-1 | sla parameter on create_stream_table / alter_stream_table. Accepts an INTERVAL; stored as freshness_deadline_ms in pgt_stream_tables. | 0.5d | PLAN_OVERALL_ASSESSMENT.md §9.7 |
| SLA-2 | Initial tier assignment. On creation or alter_stream_table with sla set, assign to the tier whose dispatch_gap ≤ sla, considering current queue depth. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.7 |
| SLA-3 | Dynamic re-assignment. After each tick, check whether the ST's tier still meets the SLA given measured queue depth; bump one tier up or down if the gap is consistently exceeded or under-utilised by >2×. | 1d | PLAN_OVERALL_ASSESSMENT.md §9.7 |
| SLA-4 | E2E test. Create ST with 30 s SLA; inject artificial tick delay; verify tier promotion within 3 cycles. | 0.5d | — |

SLA-driven tier subtotal: ~3–4 days
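The SLA-2 assignment rule can be sketched as: among the tiers whose dispatch gap plus observed queue delay still meets the deadline, pick the slowest (cheapest) one; if none qualifies, fall back to the fastest tier. This is an illustrative reading of "tier whose dispatch_gap ≤ sla, considering current queue depth" above; the tier names and tie-breaking are assumptions.

```python
def assign_tier(tiers, sla_ms, queue_delay_ms=0):
    """tiers: [(name, dispatch_gap_ms), ...].  Choose the slowest tier
    whose gap plus measured queue delay still meets the SLA deadline."""
    feasible = [t for t in tiers if t[1] + queue_delay_ms <= sla_ms]
    if feasible:
        return max(feasible, key=lambda t: t[1])[0]   # cheapest tier meeting SLA
    return min(tiers, key=lambda t: t[1])[0]          # SLA unattainable: best effort
```

As queue delay grows, the same rule naturally promotes an ST to a faster tier, which is the mechanism SLA-3's per-tick re-check relies on.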

Implementation Phases

| Phase | Description | Duration |
| --- | --- | --- |
| CDC-PUB | Downstream CDC publication: SQL function, lifecycle, monitoring, tests, docs | Days 1–8 |
| PAR | Parallel refresh: coordinator/worker split, GUC, DAG levels, recovery, tests | Days 9–28 |
| PRED | Predictive cost model: regression, pre-emptive switch, cold-start fallback, tests | Days 29–33 |
| SLA | SLA-driven tier: sla param, initial assignment, dynamic re-assignment, tests | Days 34–37 |

v0.22.0 total: ~5 weeks (downstream CDC + parallel refresh + predictive cost + SLA tier)

Exit criteria:

- [x] CDC-PUB-1: stream_table_to_publication(name) creates a working logical publication
- [x] CDC-PUB-2: Publication is dropped automatically when the stream table is dropped
- [x] CDC-PUB-3: downstream_publication column visible in pg_stat_stream_tables
- [x] CDC-PUB-4: Subscriber receives correct insert/update/delete events; E2E test passes
- [x] PAR-2: max_parallel_workers = 0 (default) produces identical results to serial mode
- [x] PAR-1/PAR-3: max_parallel_workers ≥ 1 dispatches independent same-level STs concurrently
- [x] PAR-4: Worker crash marks ST ERROR; coordinator respawns worker slot
- [x] PAR-5: Diamond DAG concurrent correctness test passes; throughput improvement benchmarked
- [x] PRED-1: Fitted coefficients visible in df_threshold_advice
- [x] PRED-2: Pre-emptive FULL switch fires under synthetic spike; refresh_reason = 'predicted_cost_exceeds_full' logged
- [x] PRED-3: Cold-start fallback active when fewer than prediction_min_samples history rows exist
- [x] SLA-1: create_stream_table(..., sla => '30 seconds') stores freshness_deadline_ms
- [x] SLA-2: Initial tier assignment matches SLA requirement on creation
- [x] SLA-3: Tier auto-adjusts within 3 cycles when queue depth breaches SLA
- [x] Extension upgrade path tested (0.21.0 → 0.22.0)
- [x] just check-version-sync passes