v0.21.0 — Correctness, Safety, and Test Hardening
Full technical details: v0.21.0.md-full.md
Status: ✅ Released | Scope: Large (~6–8 weeks)
Closes the last known data-correctness gap in join delta computation, enforces a zero-crash guarantee across the codebase, expands unit test coverage in previously under-tested modules, and introduces a shadow/canary mode for safely testing query changes before they go live.
What problem does this solve?
Systematic analysis of the codebase after v0.20.0 identified several
categories of risk: a remaining join correctness bug (EC-01 phantom rows
in multi-table LEFT/RIGHT JOINs), .unwrap() calls that could crash the
PostgreSQL backend, modules with low test coverage, and the risk of
disruption when modifying a production stream table’s query.
EC-01 JOIN Delta Phantom Row Fix
The EC-01 bug — phantom rows appearing in stream tables after specific sequences of DELETE and INSERT on multi-table JOINs — was first identified in v0.12.0 and partially addressed across several releases. v0.21.0 delivers the complete fix:
- The row identity hash for Part 1b of the join delta algorithm (the “right-side unchanged” portion) is now computed correctly, ensuring both halves of the join delta emit the same row identifier and cancel each other out properly
- Prior-cycle phantom rows are cleaned up by the refresh process
- TPC-H Q07 (which exercises this pattern) is validated to pass deterministically across 5,000 randomised property test iterations
In plain terms: if your stream table computes a LEFT JOIN and rows are deleted and re-inserted on either side, the results are now always correct — never showing rows that should not be there.
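The cancellation property the fix restores can be sketched in Rust. This is a toy row-identity hash, not pg_trickle's actual internals: the point is that both halves of the join delta, including the NULL-extended right side of a LEFT JOIN, must hash the same logical row to the same identifier so matching inserts and deletes cancel.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative sketch: a stable row-identity hash over the key columns
/// of BOTH join sides, in a fixed order. The v0.21.0 fix ensures the
/// "right-side unchanged" half (Part 1b) of the delta uses the same
/// identity as the other half, so phantom rows cancel out.
fn row_identity(left_keys: &[&str], right_keys: &[Option<&str>]) -> u64 {
    let mut h = DefaultHasher::new();
    for k in left_keys {
        k.hash(&mut h);
    }
    // The NULL-extended right side of a LEFT JOIN must hash identically
    // in both delta halves, including the None (no match) case.
    for k in right_keys {
        k.hash(&mut h);
    }
    h.finish()
}

fn main() {
    // The same logical row produced by the two halves of the delta
    // must yield identical identities so insert/delete pairs cancel.
    let from_half_a = row_identity(&["cust-42"], &[Some("order-7")]);
    let from_half_b = row_identity(&["cust-42"], &[Some("order-7")]);
    assert_eq!(from_half_a, from_half_b);
    println!("identities match: {}", from_half_a == from_half_b);
}
```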
Zero .unwrap() in Production Code
A clippy::unwrap_used lint rule was added that fails the build if any
.unwrap() call appears in non-test code. Every .unwrap() in the production
code path was converted to proper error handling that returns a descriptive
error to the caller instead of crashing the backend.
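As an illustration of the conversion pattern (the type and function names here are hypothetical, not pg_trickle's actual code):

```rust
// Hypothetical example of converting a panicking `.unwrap()` into an
// error returned to the caller; `SlotError` and `decode_slot_name` are
// illustrative names.
#[derive(Debug)]
struct SlotError(String);

impl std::fmt::Display for SlotError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "replication slot error: {}", self.0)
    }
}
impl std::error::Error for SlotError {}

// Before: invalid bytes would panic and crash the whole backend.
// fn decode_slot_name(bytes: &[u8]) -> String {
//     std::str::from_utf8(bytes).unwrap().to_string()
// }

// After: the failure is surfaced as a descriptive error instead.
fn decode_slot_name(bytes: &[u8]) -> Result<String, SlotError> {
    std::str::from_utf8(bytes)
        .map(str::to_string)
        .map_err(|e| SlotError(format!("slot name is not valid UTF-8: {e}")))
}

fn main() {
    assert_eq!(decode_slot_name(b"pgt_slot_1").unwrap(), "pgt_slot_1");
    // 0xFF is never valid UTF-8: the caller gets an error, not a crash.
    assert!(decode_slot_name(&[0xff, 0xfe]).is_err());
    println!("decode errors are returned, not panicked");
}
```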
The unsafe code surface was reduced by 40% through the introduction of safe wrapper functions around PostgreSQL C internals.
Non-Deterministic Function Rejection
At stream table creation time, pg_trickle now rejects (or warns about)
queries that use non-deterministic functions like now(), random(), or
volatile user-defined functions without an explicit non_deterministic => true
acknowledgement. This prevents a whole class of subtle drift bugs before they
reach production.
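A minimal sketch of the idea behind the check, assuming a pre-collected list of function names referenced by the query. All names here are illustrative; the real check would consult PostgreSQL's `provolatile` catalog flag rather than a hard-coded list.

```rust
// Illustrative denylist; PostgreSQL tracks volatility in pg_proc.
const KNOWN_VOLATILE: &[&str] = &["now", "random", "clock_timestamp"];

/// Reject a query that calls volatile functions unless the caller
/// explicitly acknowledged non-determinism at creation time.
fn check_determinism(called_functions: &[&str], acknowledged: bool) -> Result<(), String> {
    let volatile: Vec<&str> = called_functions
        .iter()
        .copied()
        .filter(|f| KNOWN_VOLATILE.contains(f))
        .collect();
    if volatile.is_empty() || acknowledged {
        Ok(())
    } else {
        Err(format!("query uses non-deterministic function(s) {volatile:?}; pass non_deterministic => true to acknowledge"))
    }
}

fn main() {
    assert!(check_determinism(&["upper", "now"], false).is_err());
    assert!(check_determinism(&["upper", "now"], true).is_ok());
    assert!(check_determinism(&["upper"], false).is_ok());
    println!("determinism checks passed");
}
```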
Test Coverage Campaign
Three large, previously under-tested modules received comprehensive unit test coverage:
- api/helpers.rs (25+ new tests) — query validation, schema helpers, CDC orchestration utilities
- api/diagnostics.rs (15+ new tests) — explain_st, health_summary, cache_stats formatting
- dvm/parser/rewrites.rs (30+ new tests) — all seven SQL rewrite passes
A parser fuzz target was added that feeds randomly generated SQL into the pg_trickle query parser; a one-hour fuzz run completes without panics. Any panic would indicate a code path that could crash the backend on unexpected input.
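The harness idea can be sketched as follows: a parser entry point wrapped in `catch_unwind`, so a panic on arbitrary input is detected rather than taking down the process. The parser here is a toy stand-in, not pg_trickle's.

```rust
use std::panic;

// Stand-in for the real parser, with a lurking panic on embedded NUL
// bytes — exactly the kind of bug fuzzing is meant to surface.
fn parse_sql(input: &str) -> Result<(), String> {
    assert!(!input.contains('\0'), "unexpected NUL byte in input");
    if input.trim().is_empty() {
        Err("empty input".to_string())
    } else {
        Ok(())
    }
}

/// True iff the parser handled the input without panicking
/// (returning an error is fine; panicking is not).
fn never_panics(input: &str) -> bool {
    panic::catch_unwind(|| {
        let _ = parse_sql(input);
    })
    .is_ok()
}

fn main() {
    // Silence the default panic message so a caught panic stays quiet.
    panic::set_hook(Box::new(|_| {}));
    assert!(never_panics("SELECT * FROM t"));
    assert!(never_panics(""));
    assert!(!never_panics("SELECT '\0'"));
    println!("harness detects panics correctly");
}
```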
A crash-recovery test kills the background worker mid-refresh and verifies that the database is left in a consistent state: no partially-applied refreshes, and the WAL decoder resumes from the correct position.
Shadow / Canary Mode for Safer Query Changes
When you need to change a production stream table’s query, doing so directly is risky — the new query might produce different results or refresh more slowly.
alter_stream_table(name, dry_run_shadow => true) creates a shadow copy
of the stream table (pgt_shadow_<name>) that runs the new query on the
same schedule as the live table. Operators can compare the two versions with
pgtrickle.canary_diff(name) before committing the change. When satisfied,
pgtrickle.canary_promote(name) atomically swaps the shadow into production.
In plain terms: test your query change on real production data alongside the live version, verify the results match your expectations, then flip the switch.
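Conceptually, a canary diff is the symmetric difference of two row multisets (live table vs. shadow table). A toy sketch — the row representation and function name are illustrative, not the extension's actual implementation:

```rust
use std::collections::HashMap;

/// Rows with a positive count appear only (or more often) in the live
/// table; a negative count means the shadow table has the extra copies.
/// An empty result means live and shadow agree exactly.
fn canary_diff(live: &[&str], shadow: &[&str]) -> Vec<(String, i64)> {
    let mut counts: HashMap<&str, i64> = HashMap::new();
    for &row in live {
        *counts.entry(row).or_insert(0) += 1;
    }
    for &row in shadow {
        *counts.entry(row).or_insert(0) -= 1;
    }
    let mut diff: Vec<(String, i64)> = counts
        .into_iter()
        .filter(|&(_, c)| c != 0)
        .map(|(row, c)| (row.to_string(), c))
        .collect();
    diff.sort();
    diff
}

fn main() {
    let live = ["a", "b", "b"];
    let shadow = ["a", "b", "c"];
    let diff = canary_diff(&live, &shadow);
    // "b" appears once more in live (+1); "c" only in shadow (-1).
    assert_eq!(diff, vec![("b".to_string(), 1), ("c".to_string(), -1)]);
    println!("{diff:?}");
}
```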
New Operational Helpers
- pgtrickle.pause_all() / pgtrickle.resume_all() — suspend all stream tables at once during maintenance
- pgtrickle.refresh_if_stale(name, max_age) — only trigger a refresh if the stream table is older than a specified age
- pgtrickle.stream_table_definition(name) — returns the full definition of a stream table for auditing
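The decision behind a helper like refresh_if_stale reduces to a simple age comparison; a hedged sketch of that logic (names are illustrative):

```rust
use std::time::{Duration, SystemTime};

/// Refresh only when the last successful refresh is older than max_age.
fn should_refresh(last_refresh: SystemTime, max_age: Duration, now: SystemTime) -> bool {
    match now.duration_since(last_refresh) {
        Ok(age) => age > max_age,
        // Clock skew (last refresh apparently in the future): treat the
        // table as fresh rather than triggering a spurious refresh.
        Err(_) => false,
    }
}

fn main() {
    let t0 = SystemTime::UNIX_EPOCH;
    let now = t0 + Duration::from_secs(120);
    // 120s old vs. a 60s budget: stale, refresh.
    assert!(should_refresh(t0, Duration::from_secs(60), now));
    // 120s old vs. a 300s budget: still fresh, skip.
    assert!(!should_refresh(t0, Duration::from_secs(300), now));
    println!("staleness checks passed");
}
```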
Prometheus HTTP Endpoint
The background worker now serves Prometheus metrics directly over HTTP
(port configurable via pg_trickle.metrics_port), removing the need for
a separate Prometheus exporter process.
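For reference, the payload such an endpoint serves is the standard Prometheus text exposition format. A small rendering sketch — the metric names below are illustrative, not pg_trickle's actual metric set:

```rust
/// Render two sample metrics in Prometheus text exposition format:
/// `# HELP` / `# TYPE` comment lines followed by `name value` samples.
fn render_metrics(refresh_total: u64, lag_seconds: f64) -> String {
    let mut out = String::new();
    out.push_str("# HELP pgtrickle_refresh_total Completed refresh cycles.\n");
    out.push_str("# TYPE pgtrickle_refresh_total counter\n");
    out.push_str(&format!("pgtrickle_refresh_total {refresh_total}\n"));
    out.push_str("# HELP pgtrickle_refresh_lag_seconds Staleness of the stalest table.\n");
    out.push_str("# TYPE pgtrickle_refresh_lag_seconds gauge\n");
    out.push_str(&format!("pgtrickle_refresh_lag_seconds {lag_seconds}\n"));
    out
}

fn main() {
    let body = render_metrics(42, 1.5);
    assert!(body.contains("pgtrickle_refresh_total 42\n"));
    assert!(body.contains("pgtrickle_refresh_lag_seconds 1.5\n"));
    print!("{body}");
}
```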
Performance Tuning Cookbook
A new docs/PERFORMANCE_COOKBOOK.md document consolidates all performance
tuning advice — previously scattered across FAQ, TROUBLESHOOTING, and
SCALING — into a single reference: symptom → likely cause → configuration
to adjust → how to measure improvement.
Scope
v0.21.0 is a comprehensive quality release. The EC-01 fix closes the last known data-correctness gap. The zero-unwrap guarantee eliminates a class of potential backend crashes. The test coverage campaign and fuzz target significantly raise the floor on correctness confidence. Shadow/canary mode makes production query changes safer.