Plain-language companion: v0.25.0.md

v0.25.0 — Scheduler Scalability & Pooler Performance

Status: ✅ Released. Sourced from PLAN_OVERALL_ASSESSMENT_2.md §4, §5, §7.

Release Theme

This release pushes the comfortable operating point from “hundreds” to thousands of stream tables on commodity hardware. The scheduler stops reloading the full catalog on every tick, the template cache becomes shared across all backends via shmem, change detection is batched, and the DAG rebuild path uses copy-on-write to avoid blocking dispatch. Connection-pooler deployments (PgBouncer, RDS Proxy, Supabase) see the biggest win: the shared L0 cache eliminates the 30–45 ms cold-start tax per backend. The predictive cost model gets robustness guards, and downstream publications gain subscriber-lag tracking.

Catalog & Scheduler Scalability

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| SCAL-1 | Shmem catalog snapshot cache. Cache `pgt_stream_tables` rows in shared memory, keyed by DAG generation counter. Invalidated on DDL via `DAG_REBUILD_SIGNAL`. Eliminates per-tick SPI reload (20–200 ms win at 100–1000 STs). | 4d | PLAN_OVERALL_ASSESSMENT_2.md §4, §5 |
| SCAL-2 | Batched change detection. Combine per-source `SELECT EXISTS(...)` queries into a single `UNION ALL` CTE per refresh group. ~80% reduction in per-tick change-detection cost. | 2d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
| SCAL-3 | Split `PGS_STATE` lock. Replace the single `PgLwLock` in `src/shmem.rs` with per-concern locks (`dag_lock`, `metrics_lock`, `worker_pool_lock`). Use `share()` for read-only `dag_version` reads. | 3d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
| SCAL-4 | Copy-on-write DAG rebuild. Compute the new topological order out-of-line (no exclusive lock), then atomically swap the pointer. Defers full rebuild to idle ticks when possible. | 4d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
| SCAL-5 | Persistent worker pool option. New `pg_trickle.worker_pool_size` GUC (default 0 = current spawn-per-task). Workers loop on a shmem queue instead of being registered and deregistered each tick (~2 ms/worker saved). | 3d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
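The copy-on-write pattern behind SCAL-4 can be sketched with plain `std` types. This is an illustrative stand-in, not the extension's shmem code: `DagState` and the `RwLock<Arc<…>>` layout are hypothetical, but they show the key property that the expensive rebuild happens before any lock is taken, and the lock is held only for the pointer swap.

```rust
use std::sync::{Arc, RwLock};

/// Shared topological order. Readers clone the Arc (cheap) and keep
/// using their snapshot even while a rebuild is in flight.
struct DagState {
    topo: RwLock<Arc<Vec<&'static str>>>,
}

impl DagState {
    fn snapshot(&self) -> Arc<Vec<&'static str>> {
        // Read lock is held only long enough to clone the pointer.
        Arc::clone(&self.topo.read().unwrap())
    }

    fn rebuild(&self, new_order: Vec<&'static str>) {
        // Expensive toposort happens out-of-line, before any lock.
        let fresh = Arc::new(new_order);
        // Write lock is held only for the pointer swap.
        *self.topo.write().unwrap() = fresh;
    }
}

fn main() {
    let dag = DagState { topo: RwLock::new(Arc::new(vec!["a", "b"])) };
    let before = dag.snapshot();
    dag.rebuild(vec!["a", "b", "c"]);
    assert_eq!(before.len(), 2);         // old snapshot stays valid
    assert_eq!(dag.snapshot().len(), 3); // new readers see the swap
    println!("ok");
}
```

In the real extension the swap would go through shared memory rather than a process-local `Arc`, but the dispatch-side guarantee is the same: dispatch never blocks on a rebuild in progress.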

Template Cache & Pooler Latency

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| CACHE-1 | Shared shmem L0 template cache. `dshash`-based cache in shared memory keyed by `(pgt_id, cache_generation)`. All backends in the same database share one compiled template set. Eliminates 30–45 ms cold-start tax in pooled-connection workloads. | 5d | PLAN_OVERALL_ASSESSMENT_2.md §4, §7 |
| CACHE-2 | L1 LRU eviction. Bound the per-backend thread-local cache with `pg_trickle.template_cache_max_entries` GUC (default 256). Evict least-recently-used entries. | 2d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
| CACHE-3 | `pgtrickle.clear_caches()` SQL function. Manual cache flush for all levels (L0 shmem + L1 thread-local + L2 catalog). Useful during debugging and emergency migration. | 0.5d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
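A minimal sketch of the CACHE-2 bound, assuming the `(pgt_id, cache_generation)` key scheme from CACHE-1. `TemplateCache` and its fields are hypothetical names for illustration; a real implementation would live in the backend's thread-local state, but the eviction logic is the same: touching an entry updates its recency, and inserting past the cap evicts the least-recently-used key. A bumped `cache_generation` naturally misses and falls through to recompilation.

```rust
use std::collections::HashMap;

/// Hypothetical per-backend (L1) template cache with an LRU bound,
/// analogous to pg_trickle.template_cache_max_entries.
struct TemplateCache {
    max_entries: usize,
    clock: u64,
    // key (pgt_id, cache_generation) -> (compiled SQL, last-used tick)
    entries: HashMap<(u32, u64), (String, u64)>,
}

impl TemplateCache {
    fn new(max_entries: usize) -> Self {
        Self { max_entries, clock: 0, entries: HashMap::new() }
    }

    fn get(&mut self, key: (u32, u64)) -> Option<&str> {
        self.clock += 1;
        let clock = self.clock;
        self.entries.get_mut(&key).map(|(sql, used)| {
            *used = clock; // touch for LRU
            sql.as_str()
        })
    }

    fn insert(&mut self, key: (u32, u64), compiled: String) {
        self.clock += 1;
        if self.entries.len() >= self.max_entries && !self.entries.contains_key(&key) {
            // Evict the least-recently-used entry.
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, (_, used))| *used)
                .map(|(k, _)| *k);
            if let Some(v) = victim {
                self.entries.remove(&v);
            }
        }
        let clock = self.clock;
        self.entries.insert(key, (compiled, clock));
    }
}

fn main() {
    let mut cache = TemplateCache::new(2);
    cache.insert((1, 0), "SELECT 1".into());
    cache.insert((2, 0), "SELECT 2".into());
    cache.get((1, 0)); // touch pgt_id 1, so (2, 0) becomes the LRU victim
    cache.insert((3, 0), "SELECT 3".into());
    assert!(cache.get((2, 0)).is_none()); // evicted
    assert!(cache.get((1, 0)).is_some()); // retained
    println!("ok");
}
```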

Hot-Path Allocation Reduction

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| PERF-1 | xxh3 streaming hash. Replace `pg_trickle_hash_multi` string-concat + scalar xxhash with the xxh3 streaming API (update/finalize). Eliminates per-row `String` allocation on the CDC hot path. | 3d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
| PERF-2 | Pre-sized SQL buffer in project operator. Replace per-column `format!` calls in `src/dvm/operators/project.rs` with a single pre-sized `String` and the `write!` macro. | 1d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
| PERF-3 | Shmem adaptive cost-model state. Cache `last_full_ms`/`last_diff_ms` per ST in shared memory with atomic updates. Prevents parallel workers from reading stale timing data via SPI. | 2d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
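The before/after shape of PERF-1 can be illustrated with `std`'s `DefaultHasher` standing in for the xxh3 streaming API (a sketch only; the real change would use xxh3's update/finalize calls). The point is the allocation pattern: the old path builds a temporary `String` per row just to hash it, while the streaming path feeds each column into the hasher state directly.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Before: allocates a joined String per row just to hash it.
fn hash_multi_concat(cols: &[&str]) -> u64 {
    let joined = cols.join("\u{1}");
    let mut h = DefaultHasher::new();
    joined.hash(&mut h);
    h.finish()
}

// After: stream each column (plus a separator byte) into the hasher
// state; no intermediate String is allocated.
fn hash_multi_streaming(cols: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    for c in cols {
        h.write(c.as_bytes());
        h.write_u8(1); // separator so ["ab","c"] != ["a","bc"]
    }
    h.finish()
}

fn main() {
    // The separator keeps distinct column splits distinct.
    assert_ne!(
        hash_multi_streaming(&["ab", "c"]),
        hash_multi_streaming(&["a", "bc"])
    );
    let row = ["42", "alice", "2024-01-01"];
    let _ = (hash_multi_concat(&row), hash_multi_streaming(&row));
    println!("ok");
}
```

Note the two functions do not produce identical digests (the stand-in hasher prefixes lengths when hashing a `String`); a migration to streaming xxh3 likewise changes hash values, which matters if hashes are persisted across versions.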

Predictive Model & Publication Durability

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| PRED-1 | Robustness guards on predictive cost model. Clamp predictions to [0.5×, 4×] `last_full_ms`; use median+MAD instead of mean+SD; require non-degenerate variance; ignore predictions during first 60 s after `CREATE`. | 3d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
| PUB-1 | Subscriber-LSN tracking for downstream publications. Track subscriber LSN per publication; refuse to `TRUNCATE` change buffer until all subscribers have acknowledged past the buffer’s max LSN; emit WARNING when a subscriber lags more than `pg_trickle.publication_lag_warn_lsn`. | 4d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
| PUB-2 | Multi-DB worker fairness. Add `pgtrickle.worker_allocation_status()` monitoring view (per-DB used/quota/queued). Document recommended quota allocation in docs/SCALING.md. | 2d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
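The PRED-1 guards can be sketched in a few lines. The function names (`robust_estimate`, `clamp_prediction`) are illustrative, not the extension's API; the sketch shows why median+MAD shrugs off an outlier spike that would drag a mean+SD estimate, why a flat sample set (degenerate variance) should yield no prediction, and how the [0.5×, 4×] clamp bounds a wild prediction relative to `last_full_ms`.

```rust
/// Median of a sample set (sorts a scratch copy passed in by the caller).
fn median(xs: &mut Vec<f64>) -> f64 {
    xs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = xs.len();
    if n % 2 == 1 { xs[n / 2] } else { (xs[n / 2 - 1] + xs[n / 2]) / 2.0 }
}

/// Robust center/spread: median plus median absolute deviation (MAD).
/// Returns None on an empty or degenerate (zero-spread) sample set.
fn robust_estimate(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let med = median(&mut samples.to_vec());
    let mad = median(&mut samples.iter().map(|x| (x - med).abs()).collect());
    if mad <= f64::EPSILON {
        return None; // non-degenerate variance guard
    }
    Some((med, mad))
}

/// Clamp a prediction to [0.5x, 4x] of the last measured full refresh.
fn clamp_prediction(predicted_ms: f64, last_full_ms: f64) -> f64 {
    predicted_ms.clamp(0.5 * last_full_ms, 4.0 * last_full_ms)
}

fn main() {
    // One outlier spike barely moves the median: 10.5, not ~108 (the mean).
    let samples = [10.0, 11.0, 9.0, 10.5, 500.0];
    let (med, _mad) = robust_estimate(&samples).unwrap();
    assert!((med - 10.5).abs() < 1e-9);
    // Flat samples give no usable spread, so no prediction.
    assert!(robust_estimate(&[5.0, 5.0, 5.0]).is_none());
    // A wild prediction is clamped relative to last_full_ms = 100.
    assert_eq!(clamp_prediction(900.0, 100.0), 400.0);
    assert_eq!(clamp_prediction(10.0, 100.0), 50.0);
    println!("ok");
}
```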

Implementation Phases

| Phase | Description | Duration |
| --- | --- | --- |
| Phase 1 | Catalog & scheduler scalability: shmem cache, batched detection, lock split | Days 1–13 |
| Phase 2 | Template cache: L0 dshash, L1 LRU, `clear_caches()` | Days 13–21 |
| Phase 3 | Hot-path: xxh3 hash, project buffer, shmem cost-model | Days 21–27 |
| Phase 4 | Predictive model guards + publication durability + worker fairness | Days 27–36 |
| Phase 5 | Benchmarks, documentation, upgrade script, integration testing | Days 36–42 |

v0.25.0 total: ~8–9 weeks (~42 person-days solo)

Exit criteria:

- [x] SCAL-1: Scheduler tick at 1000 STs completes in < 20 ms (down from ~200 ms)
- [x] SCAL-2: Change detection for 10-source ST issues 1 query instead of 10
- [x] SCAL-3: `PGS_STATE` replaced by 3 per-concern locks; read-only paths use `share()`
- [x] SCAL-4: DAG rebuild does not hold exclusive lock during computation; swap is atomic
- [x] SCAL-5: `worker_pool_size = 4` starts persistent workers; spawn cost eliminated
- [x] CACHE-1: Second backend connecting to same DB hits L0 cache; no parse/differentiate cost
- [x] CACHE-2: L1 cache respects `template_cache_max_entries`; evicts LRU on overflow
- [x] CACHE-3: `pgtrickle.clear_caches()` flushes all three levels; next refresh re-populates
- [x] PERF-1: `pg_trickle_hash_multi` allocates zero intermediate `String`s per row
- [x] PERF-2: Project operator uses single pre-sized buffer; 50-column ST shows measurable improvement
- [x] PERF-3: Parallel workers read cost-model state from shmem, not SPI
- [x] PRED-1: Sawtooth workload test: model recovers within 5 samples after outlier spike
- [x] PUB-1: Publication with lagged subscriber emits WARNING; change buffer not truncated until ack
- [x] PUB-2: `worker_allocation_status()` returns per-DB used/quota/queued
- [x] Benchmark regression gate passes (no regressions vs v0.24.0 baseline)
- [x] Extension upgrade path tested (0.24.0 → 0.25.0)
- [x] `just check-version-sync` passes