Plain-language companion: v0.25.0.md

v0.25.0 — Scheduler Scalability & Pooler Performance

Status: ✅ Released. Sourced from PLAN_OVERALL_ASSESSMENT_2.md §4, §5, §7.

Release Theme

This release pushes the comfortable operating point from “hundreds” to thousands of stream tables on commodity hardware. The scheduler stops reloading the full catalog on every tick, the template cache becomes shared across all backends via shmem, change detection is batched, and the DAG rebuild path uses copy-on-write to avoid blocking dispatch. Connection-pooler deployments (PgBouncer, RDS Proxy, Supabase) see the biggest win: the shared L0 cache eliminates the 30–45 ms cold-start tax per backend. The predictive cost model gets robustness guards, and downstream publications gain subscriber-lag tracking.

Catalog & Scheduler Scalability

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| SCAL-1 | Shmem catalog snapshot cache. Cache `pgt_stream_tables` rows in shared memory, keyed by DAG generation counter. Invalidated on DDL via `DAG_REBUILD_SIGNAL`. Eliminates per-tick SPI reload (20–200 ms win at 100–1000 STs). | 4d | PLAN_OVERALL_ASSESSMENT_2.md §4, §5 |
| SCAL-2 | Batched change detection. Combine per-source `SELECT EXISTS(...)` queries into a single `UNION ALL` CTE per refresh group. ~80% reduction in per-tick change-detection cost. | 2d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
| SCAL-3 | Split `PGS_STATE` lock. Replace the single `PgLwLock` in `src/shmem.rs` with per-concern locks (`dag_lock`, `metrics_lock`, `worker_pool_lock`). Use `share()` for read-only `dag_version` reads. | 3d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
| SCAL-4 | Copy-on-write DAG rebuild. Compute the new topological order out-of-line (no exclusive lock), then atomically swap the pointer. Defers full rebuild to idle ticks when possible. | 4d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
| SCAL-5 | Persistent worker pool option. New `pg_trickle.worker_pool_size` GUC (default 0 = current spawn-per-task). Workers loop on a shmem queue instead of being registered and deregistered each tick (~2 ms/worker saved). | 3d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
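The copy-on-write pattern behind SCAL-4 can be sketched with plain `std` types. This is an illustrative stand-in, not the extension's shmem code: `DagState` and the `RwLock<Arc<…>>` layout are hypothetical, but they show the key property that the expensive rebuild happens before any lock is taken, and the lock is held only for the pointer swap.

```rust
use std::sync::{Arc, RwLock};

/// Shared topological order. Readers clone the Arc (cheap) and keep
/// using their snapshot even while a rebuild is in flight.
struct DagState {
    topo: RwLock<Arc<Vec<&'static str>>>,
}

impl DagState {
    fn snapshot(&self) -> Arc<Vec<&'static str>> {
        // Read lock is held only long enough to clone the pointer.
        Arc::clone(&self.topo.read().unwrap())
    }

    fn rebuild(&self, new_order: Vec<&'static str>) {
        // Expensive toposort happens out-of-line, before any lock.
        let fresh = Arc::new(new_order);
        // Write lock is held only for the pointer swap.
        *self.topo.write().unwrap() = fresh;
    }
}

fn main() {
    let dag = DagState { topo: RwLock::new(Arc::new(vec!["a", "b"])) };
    let before = dag.snapshot();
    dag.rebuild(vec!["a", "b", "c"]);
    assert_eq!(before.len(), 2);         // old snapshot stays valid
    assert_eq!(dag.snapshot().len(), 3); // new readers see the swap
    println!("ok");
}
```

In the real extension the swap would go through shared memory rather than a process-local `Arc`, but the dispatch-side guarantee is the same: dispatch never blocks on a rebuild in progress.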

Template Cache & Pooler Latency

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| CACHE-1 | Shared shmem L0 template cache. `dshash`-based cache in shared memory keyed by `(pgt_id, cache_generation)`. All backends in the same database share one compiled template set. Eliminates 30–45 ms cold-start tax in pooled-connection workloads. | 5d | PLAN_OVERALL_ASSESSMENT_2.md §4, §7 |
| CACHE-2 | L1 LRU eviction. Bound the per-backend thread-local cache with `pg_trickle.template_cache_max_entries` GUC (default 256). Evict least-recently-used entries. | 2d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
| CACHE-3 | `pgtrickle.clear_caches()` SQL function. Manual cache flush for all levels (L0 shmem + L1 thread-local + L2 catalog). Useful during debugging and emergency migration. | 0.5d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
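A minimal sketch of the CACHE-2 bound, assuming the `(pgt_id, cache_generation)` key scheme from CACHE-1. `TemplateCache` and its fields are hypothetical names for illustration; a real implementation would live in the backend's thread-local state, but the eviction logic is the same: touching an entry updates its recency, and inserting past the cap evicts the least-recently-used key. A bumped `cache_generation` naturally misses and falls through to recompilation.

```rust
use std::collections::HashMap;

/// Hypothetical per-backend (L1) template cache with an LRU bound,
/// analogous to pg_trickle.template_cache_max_entries.
struct TemplateCache {
    max_entries: usize,
    clock: u64,
    // key (pgt_id, cache_generation) -> (compiled SQL, last-used tick)
    entries: HashMap<(u32, u64), (String, u64)>,
}

impl TemplateCache {
    fn new(max_entries: usize) -> Self {
        Self { max_entries, clock: 0, entries: HashMap::new() }
    }

    fn get(&mut self, key: (u32, u64)) -> Option<&str> {
        self.clock += 1;
        let clock = self.clock;
        self.entries.get_mut(&key).map(|(sql, used)| {
            *used = clock; // touch for LRU
            sql.as_str()
        })
    }

    fn insert(&mut self, key: (u32, u64), compiled: String) {
        self.clock += 1;
        if self.entries.len() >= self.max_entries && !self.entries.contains_key(&key) {
            // Evict the least-recently-used entry.
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, (_, used))| *used)
                .map(|(k, _)| *k);
            if let Some(v) = victim {
                self.entries.remove(&v);
            }
        }
        let clock = self.clock;
        self.entries.insert(key, (compiled, clock));
    }
}

fn main() {
    let mut cache = TemplateCache::new(2);
    cache.insert((1, 0), "SELECT 1".into());
    cache.insert((2, 0), "SELECT 2".into());
    cache.get((1, 0)); // touch pgt_id 1, so (2, 0) becomes the LRU victim
    cache.insert((3, 0), "SELECT 3".into());
    assert!(cache.get((2, 0)).is_none()); // evicted
    assert!(cache.get((1, 0)).is_some()); // retained
    println!("ok");
}
```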

Hot-Path Allocation Reduction

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| PERF-1 | xxh3 streaming hash. Replace `pg_trickle_hash_multi` string-concat + scalar xxhash with the xxh3 streaming API (update/finalize). Eliminates per-row `String` allocation on the CDC hot path. | 3d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
| PERF-2 | Pre-sized SQL buffer in project operator. Replace per-column `format!` calls in `src/dvm/operators/project.rs` with a single pre-sized `String` and the `write!` macro. | 1d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
| PERF-3 | Shmem adaptive cost-model state. Cache `last_full_ms`/`last_diff_ms` per ST in shared memory with atomic updates. Prevents parallel workers from reading stale timing data via SPI. | 2d | PLAN_OVERALL_ASSESSMENT_2.md §5 |
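The before/after shape of PERF-1 can be illustrated with `std`'s `DefaultHasher` standing in for the xxh3 streaming API (a sketch only; the real change would use xxh3's update/finalize calls). The point is the allocation pattern: the old path builds a temporary `String` per row just to hash it, while the streaming path feeds each column into the hasher state directly.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Before: allocates a joined String per row just to hash it.
fn hash_multi_concat(cols: &[&str]) -> u64 {
    let joined = cols.join("\u{1}");
    let mut h = DefaultHasher::new();
    joined.hash(&mut h);
    h.finish()
}

// After: stream each column (plus a separator byte) into the hasher
// state; no intermediate String is allocated.
fn hash_multi_streaming(cols: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    for c in cols {
        h.write(c.as_bytes());
        h.write_u8(1); // separator so ["ab","c"] != ["a","bc"]
    }
    h.finish()
}

fn main() {
    // The separator keeps distinct column splits distinct.
    assert_ne!(
        hash_multi_streaming(&["ab", "c"]),
        hash_multi_streaming(&["a", "bc"])
    );
    let row = ["42", "alice", "2024-01-01"];
    let _ = (hash_multi_concat(&row), hash_multi_streaming(&row));
    println!("ok");
}
```

Note the two functions do not produce identical digests (the stand-in hasher prefixes lengths when hashing a `String`); a migration to streaming xxh3 likewise changes hash values, which matters if hashes are persisted across versions.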

Predictive Model & Publication Durability

| Item | Description | Effort | Ref |
| --- | --- | --- | --- |
| PRED-1 | Robustness guards on predictive cost model. Clamp predictions to [0.5×, 4×] `last_full_ms`; use median+MAD instead of mean+SD; require non-degenerate variance; ignore predictions during first 60 s after `CREATE`. | 3d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
| PUB-1 | Subscriber-LSN tracking for downstream publications. Track subscriber LSN per publication; refuse to `TRUNCATE` change buffer until all subscribers have acknowledged past the buffer’s max LSN; emit WARNING when a subscriber lags more than `pg_trickle.publication_lag_warn_lsn`. | 4d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
| PUB-2 | Multi-DB worker fairness. Add `pgtrickle.worker_allocation_status()` monitoring view (per-DB used/quota/queued). Document recommended quota allocation in docs/SCALING.md. | 2d | PLAN_OVERALL_ASSESSMENT_2.md §4 |
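The PRED-1 guards can be sketched in a few lines. The function names (`robust_estimate`, `clamp_prediction`) are illustrative, not the extension's API; the sketch shows why median+MAD shrugs off an outlier spike that would drag a mean+SD estimate, why a flat sample set (degenerate variance) should yield no prediction, and how the [0.5×, 4×] clamp bounds a wild prediction relative to `last_full_ms`.

```rust
/// Median of a sample set (sorts a scratch copy passed in by the caller).
fn median(xs: &mut Vec<f64>) -> f64 {
    xs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = xs.len();
    if n % 2 == 1 { xs[n / 2] } else { (xs[n / 2 - 1] + xs[n / 2]) / 2.0 }
}

/// Robust center/spread: median plus median absolute deviation (MAD).
/// Returns None on an empty or degenerate (zero-spread) sample set.
fn robust_estimate(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let med = median(&mut samples.to_vec());
    let mad = median(&mut samples.iter().map(|x| (x - med).abs()).collect());
    if mad <= f64::EPSILON {
        return None; // non-degenerate variance guard
    }
    Some((med, mad))
}

/// Clamp a prediction to [0.5x, 4x] of the last measured full refresh.
fn clamp_prediction(predicted_ms: f64, last_full_ms: f64) -> f64 {
    predicted_ms.clamp(0.5 * last_full_ms, 4.0 * last_full_ms)
}

fn main() {
    // One outlier spike barely moves the median: 10.5, not ~108 (the mean).
    let samples = [10.0, 11.0, 9.0, 10.5, 500.0];
    let (med, _mad) = robust_estimate(&samples).unwrap();
    assert!((med - 10.5).abs() < 1e-9);
    // Flat samples give no usable spread, so no prediction.
    assert!(robust_estimate(&[5.0, 5.0, 5.0]).is_none());
    // A wild prediction is clamped relative to last_full_ms = 100.
    assert_eq!(clamp_prediction(900.0, 100.0), 400.0);
    assert_eq!(clamp_prediction(10.0, 100.0), 50.0);
    println!("ok");
}
```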

Implementation Phases

| Phase | Description | Duration |
| --- | --- | --- |
| Phase 1 | Catalog & scheduler scalability: shmem cache, batched detection, lock split | Days 1–13 |
| Phase 2 | Template cache: L0 dshash, L1 LRU, `clear_caches()` | Days 13–21 |
| Phase 3 | Hot-path: xxh3 hash, project buffer, shmem cost-model | Days 21–27 |
| Phase 4 | Predictive model guards + publication durability + worker fairness | Days 27–36 |
| Phase 5 | Benchmarks, documentation, upgrade script, integration testing | Days 36–42 |

v0.25.0 total: ~8–9 weeks (~42 person-days solo)

Exit criteria:

- [x] SCAL-1: Scheduler tick at 1000 STs completes in < 20 ms (down from ~200 ms)
- [x] SCAL-2: Change detection for 10-source ST issues 1 query instead of 10
- [x] SCAL-3: `PGS_STATE` replaced by 3 per-concern locks; read-only paths use `share()`
- [x] SCAL-4: DAG rebuild does not hold exclusive lock during computation; swap is atomic
- [x] SCAL-5: `worker_pool_size = 4` starts persistent workers; spawn cost eliminated
- [x] CACHE-1: Second backend connecting to same DB hits L0 cache; no parse/differentiate cost
- [x] CACHE-2: L1 cache respects `template_cache_max_entries`; evicts LRU on overflow
- [x] CACHE-3: `pgtrickle.clear_caches()` flushes all three levels; next refresh re-populates
- [x] PERF-1: `pg_trickle_hash_multi` allocates zero intermediate `String`s per row
- [x] PERF-2: Project operator uses single pre-sized buffer; 50-column ST shows measurable improvement
- [x] PERF-3: Parallel workers read cost-model state from shmem, not SPI
- [x] PRED-1: Sawtooth workload test: model recovers within 5 samples after outlier spike
- [x] PUB-1: Publication with lagged subscriber emits WARNING; change buffer not truncated until ack
- [x] PUB-2: `worker_allocation_status()` returns per-DB used/quota/queued
- [x] Benchmark regression gate passes (no regressions vs v0.24.0 baseline)
- [x] Extension upgrade path tested (0.24.0 → 0.25.0)
- [x] `just check-version-sync` passes