Blue-Green Deployment for pg_trickle
Type: REPORT · Status: Exploration · Date: 2026-03-03
Problem Statement
Propagating changes through a large DAG of stream tables can take significant
time — especially initial FULL refreshes on wide or high-cardinality tables.
When users need to modify a pipeline (new query logic, schema changes, operator
upgrades, schedule tuning), the existing pipeline must keep serving
stale-but-consistent data while the replacement pipeline catches up in the
background. Once the new pipeline is current, a hot swap redirects consumers
transparently.
This is analogous to blue-green deployment in application servers, applied to incremental view maintenance pipelines.
Use Cases
- Query logic change — rewrite a defining query (e.g., add a new join or aggregation) without downtime.
- Schema evolution — an upstream ALTER TABLE requires rebuilding downstream STs; keep the old pipeline serving while the new one reinitializes.
- Refresh mode migration — switch from FULL to DIFFERENTIAL (or vice versa) with a safety net.
- DAG restructuring — split, merge, or reorder stream tables in the dependency graph.
- Extension upgrade — new pg_trickle version with different DVM operators needs pipeline rebuild.
Design Goals
| Goal | Priority |
|---|---|
| Zero (or near-zero) read downtime during swap | Must-have |
| Works for individual STs and whole pipelines | Must-have |
| Automatic convergence detection + manual override | Must-have |
| Rollback capability | Should-have |
| No breaking changes to existing catalog schema | Should-have |
| Minimal scheduler overhead during dual-pipeline phase | Nice-to-have |
Current Architecture — What Already Supports Blue-Green
A thorough analysis of the existing internals reveals that pg_trickle is already largely blue-green friendly at the infrastructure level.
CDC Sharing (Ready)
Triggers and change buffer tables are keyed by source OID, not by stream
table ID. The pgtrickle.pgt_change_tracking.tracked_by_pgt_ids array
supports multiple consumers. Both the blue (existing) and green (new) STs can
consume from the same change buffers simultaneously with independent frontiers.
- Trigger naming: pg_trickle_cdc_{source_oid} — one trigger per source regardless of how many STs depend on it.
- Buffer table: pgtrickle_changes.changes_{source_oid} — shared.
- Deferred cleanup uses MIN(frontier) across all consumers, so green’s presence prevents premature deletion of rows it hasn’t consumed yet.
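The MIN(frontier) cleanup rule can be sketched for a single source as below. This is illustrative only: the buffer table’s `lsn` column and the join through `pgt_change_tracking` are assumptions; the frontier JSONB layout (`frontier -> 'sources' -> <oid> ->> 'lsn'`) follows the description later in this report.

```sql
-- Sketch: delete buffered changes for source OID 12345 only once every
-- consumer (blue and green alike) has advanced past them.
DELETE FROM pgtrickle_changes.changes_12345 c
WHERE c.lsn < (
    SELECT MIN((st.frontier -> 'sources' -> '12345' ->> 'lsn')::pg_lsn)
    FROM pgtrickle.pgt_change_tracking t
    JOIN pgtrickle.pgt_stream_tables st
      ON st.pgt_id = ANY (t.tracked_by_pgt_ids)  -- all consumers of this source
    WHERE t.source_oid = 12345
);
```

Because the subquery takes the minimum over *all* tracked consumers, a lagging green ST automatically holds back cleanup until it has caught up.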
Independent Frontiers (Ready)
Each ST has its own frontier JSONB column in pgt_stream_tables tracking
per-source LSN watermarks. A green ST with a different pgt_id maintains
completely independent frontier state.
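A hypothetical inspection query, assuming the frontier layout described in the Convergence Detection section (`frontier -> 'sources' -> <oid> -> 'lsn'`):

```sql
-- Peek at each ST's per-source LSN watermarks. Blue and green rows appear
-- independently because they have distinct pgt_ids.
SELECT st.pgt_name,
       src.key::oid                   AS source_oid,
       (src.value ->> 'lsn')::pg_lsn  AS watermark
FROM pgtrickle.pgt_stream_tables st,
     LATERAL jsonb_each(st.frontier -> 'sources') AS src(key, value);
```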
Internal Key Isolation (Ready)
- MERGE template cache — keyed by pgt_id (thread-local HashMap).
- Prepared statements — per-session, keyed by pgt_id.
- Advisory locks — keyed by pgt_id.
- Retry/backoff state — keyed by pgt_id.
Two versions of the same logical ST with different pgt_id values will not
collide in any internal data structure.
DAG Rebuild (Ready)
signal_dag_rebuild() is a cheap atomic increment. The scheduler rebuilds the
full DAG from catalog tables on the next cycle, so adding/removing STs is a
live operation with no restart required.
Status Lifecycle (Ready)
The INITIALIZING → ACTIVE → SUSPENDED → ERROR state machine exists. A green
ST can be created in INITIALIZING state, manually refreshed to catch up, then
flipped to ACTIVE for the scheduler to pick up.
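A minimal sketch of that manual flip, using the catalog and function names this report already describes (the shadow name is illustrative):

```sql
-- Illustrative only: once a green ST has caught up, mark it ACTIVE and
-- nudge the scheduler to pick it up on the next DAG rebuild.
UPDATE pgtrickle.pgt_stream_tables
SET status = 'ACTIVE'
WHERE pgt_schema = 'public'
  AND pgt_name   = 'order_totals__pgt_green';

SELECT pgtrickle.signal_dag_rebuild();
```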
Core Challenge: Identity & Naming
The central tension is the UNIQUE constraint on
(pgt_schema, pgt_name) in pgt_stream_tables and the UNIQUE on
pgt_relid (storage table OID). Two STs cannot share the same qualified name
or point to the same physical table. The swap mechanism must navigate this.
Approaches
Approach A: Shadow Tables with Atomic Rename
Concept: Create the green ST with a shadow name (e.g.,
order_totals__pgt_green), let it catch up, then do an atomic three-way rename
in a single transaction.
Lifecycle:
1. User calls: pgtrickle.create_green('order_totals', query => '...', ...)
2. System creates ST: order_totals__pgt_green (status: INITIALIZING)
3. System runs FULL refresh on green ST
4. Green ST flips to ACTIVE — scheduler runs differentials alongside blue
5. Convergence detected (or user calls promote)
6. In one transaction:
ALTER TABLE order_totals RENAME TO order_totals__pgt_retiring;
ALTER TABLE order_totals__pgt_green RENAME TO order_totals;
UPDATE pgtrickle.pgt_stream_tables ... -- fix names, relids
DROP the retired ST (or keep for rollback)
7. Signal DAG rebuild
Pros:
- Simple mental model — two STs, one rename.
- Consumers querying SELECT * FROM order_totals see no change.
- Minimal catalog schema changes (only need a green_of BIGINT FK column).
- Works for both per-ST and whole-pipeline swaps.
Cons:
- ALTER TABLE ... RENAME takes AccessExclusiveLock — brief (sub-ms
catalog-only operation) but blocks concurrent queries momentarily.
- For whole-pipeline swap of N STs, N renames in one transaction — lock
ordering must be deterministic to avoid deadlocks.
- Downstream STs referencing the blue ST by OID in their defining_query need
query text rewriting at promote time.
- Not truly zero-downtime: there is a lock window (typically < 1ms per table).
Complexity: Low–Medium.
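The deadlock concern for multi-ST promotes can be addressed by locking in a canonical order before any rename. A sketch, assuming the pipeline_group column proposed later in this report (or any other grouping mechanism):

```sql
-- Sketch: deterministic lock ordering for a whole-pipeline promote.
-- Acquiring AccessExclusiveLock in ascending OID order means two
-- concurrent promotes of overlapping pipelines cannot deadlock.
DO $$
DECLARE
  rel regclass;
BEGIN
  FOR rel IN
    SELECT pgt_relid::regclass
    FROM pgtrickle.pgt_stream_tables
    WHERE pipeline_group = 'my_pipeline'
    ORDER BY pgt_relid
  LOOP
    EXECUTE format('LOCK TABLE %s IN ACCESS EXCLUSIVE MODE', rel);
  END LOOP;
  -- ...renames and catalog updates would follow here...
END $$;
```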
Approach B: Views as Public Interface
Concept: The public-facing name is always a view. The physical storage
table has a versioned name. Swap = CREATE OR REPLACE VIEW pointing to the new
storage table.
Lifecycle:
1. On initial ST creation:
Storage table: order_totals__pgt_v1
Public view: CREATE VIEW order_totals AS SELECT * FROM order_totals__pgt_v1;
2. Green version: order_totals__pgt_v2 (new ST, separate pgt_id)
3. Once caught up: CREATE OR REPLACE VIEW order_totals AS SELECT * FROM order_totals__pgt_v2;
4. Drop v1 ST (or retain for rollback)
Pros:
- CREATE OR REPLACE VIEW is a catalog-only operation; it still takes a brief AccessExclusiveLock on the view, but avoids renaming storage tables.
- Clean separation: logical name (view) vs physical storage (versioned table).
- Rollback = point the view back.
- View replacement doesn’t invalidate open cursors on the old version.
Cons:
- Breaking change for existing deployments — all STs would now live behind
views. This changes behavior for INSERT (views need INSTEAD OF triggers
or are read-only), \d output, pg_class catalogue lookups, index
visibility from pg_indexes, etc.
- Adds an indirection layer to every read query (minor but measurable on
micro-benchmarks).
- Downstream ST defining_query that references the view gets rewritten via
the existing view-inlining code in api.rs — the inliner resolves views to
their base query, so it would see the underlying versioned table. This
creates a coupling: green STs depending on another green ST would need their
view pointer updated first (ordering constraint).
- More complex catalog model: need to track both view OID and storage table
OID per entry.
Complexity: Medium–High. The breaking-change aspect makes this unsuitable as a retrofit unless pg_trickle adopts views-as-public-interface from the start (e.g., as a major version change).
Approach C: Pipeline Generations (First-Class Concept)
Concept: Introduce a “pipeline” entity with a generation counter. STs belong to a pipeline + generation. Only one generation is “active” at a time.
New catalog table:
CREATE TABLE pgtrickle.pgt_pipelines (
pipeline_id BIGSERIAL PRIMARY KEY,
pipeline_name TEXT UNIQUE NOT NULL,
active_gen INT NOT NULL DEFAULT 1,
green_gen INT, -- NULL when no green is being prepared
created_at TIMESTAMPTZ DEFAULT now()
);
Extended pgt_stream_tables:
ALTER TABLE pgtrickle.pgt_stream_tables
ADD COLUMN pipeline_id BIGINT REFERENCES pgtrickle.pgt_pipelines,
ADD COLUMN generation INT;
The UNIQUE constraint on (pgt_schema, pgt_name) would be relaxed to
UNIQUE (pgt_schema, pgt_name, generation).
Lifecycle:
1. pgtrickle.begin_green('my_pipeline')
→ clones all STs in the pipeline with generation + 1
→ new storage tables, new pgt_ids, inherits queries/config
→ green STs start catching up (FULL → DIFFERENTIAL)
2. pgtrickle.promote_green('my_pipeline')
→ atomically swap active_gen, rename tables, update references
→ mark old generation as retired
3. pgtrickle.rollback_green('my_pipeline')
→ drop green-generation STs, reset green_gen to NULL
4. pgtrickle.cleanup_generation('my_pipeline', gen)
→ drop retired storage tables and catalog entries
Pros:
- First-class concept with clear semantics and good observability.
- pgt_status() can show generation info: “green is 95% caught up.”
- Supports both whole-pipeline and per-ST swaps (single-ST “pipeline”).
- Enables rollback without keeping shadow tables.
- Generation info is queryable — useful for monitoring and alerting.
Cons:
- Largest schema change: new table, altered pgt_stream_tables, migration
path for existing deployments, extension upgrade SQL.
- “Pipeline” grouping may not map cleanly to arbitrary DAG subsets — what if
a user wants to green-deploy 3 of 10 STs that share sources?
- The rename/swap mechanics under the hood are the same as Approach A — this
is essentially Approach A with richer metadata.
- More concepts for users to learn.
Complexity: High, but most comprehensive.
Approach D: Schema-Based Isolation
Concept: Blue pipeline lives in the user’s schema. Green pipeline lives in
a staging schema (e.g., pgtrickle_staging). Swap = rename schemas or adjust
search_path.
Lifecycle:
1. pgtrickle.prepare_green(staging_schema => 'pgtrickle_staging')
→ recreate all STs in the staging schema
2. Green STs catch up in isolation
3. ALTER SCHEMA public RENAME TO public_old;
ALTER SCHEMA pgtrickle_staging RENAME TO public;
4. Drop old schema
Pros:
- Complete isolation during build-up.
- Schema rename is a single catalog operation (fast).
Cons:
- Very coarse-grained — swaps the entire schema, not just STs.
- Source tables in the original schema → cross-schema query references.
- ALTER SCHEMA ... RENAME is AccessExclusiveLock on all contained objects.
- Doesn’t support per-ST granularity at all.
- Application search_path changes required.
Complexity: Medium, but too inflexible for most scenarios.
Comparison Matrix
| Criterion | A: Shadow Rename | B: Views | C: Generations | D: Schema |
|---|---|---|---|---|
| Per-ST granularity | ✅ | ✅ | ✅ | ❌ |
| Whole-pipeline swap | ✅ | ✅ | ✅ | ✅ |
| No breaking changes | ✅ | ❌ | ⚠️ Schema migration | ✅ |
| Rollback support | ⚠️ Manual | ✅ Easy | ✅ Built-in | ⚠️ Manual |
| Lock duration | ~ms | ~0 | ~ms | ~ms × N |
| Implementation effort | Low | Medium-High | High | Medium |
| Observability | Basic | Basic | Rich | Basic |
| Downstream query rewrite | Required | Automatic (via view) | Required | Not needed |
| Catalog schema changes | Minimal | Moderate | Significant | None |
Convergence Detection
For auto-swap, the system needs to determine when the green pipeline has “caught up” to the blue pipeline. Options:
1. Frontier LSN Comparison (Recommended)
Compare green.frontier.sources[oid].lsn ≥ blue.frontier.sources[oid].lsn
for all shared sources. When all sources pass, green is at least as fresh as
blue.
- Pro: Already tracked, no extra queries, precise.
- Con: Doesn’t guarantee data equivalence (different query logic produces different results even at the same LSN).
2. Data Timestamp Comparison
green.data_timestamp ≥ blue.data_timestamp — simpler but coarser.
- Pro: Single scalar comparison.
- Con: Timestamp resolution may not match LSN precision.
3. Lag Threshold
User-defined maximum acceptable lag: “swap when green is within 5 seconds of blue.” Useful for time-sensitive workloads where exact convergence isn’t required.
4. Row Count / Content Hash
Compare COUNT(*) or pg_trickle.content_hash(table) between blue and green
storage tables.
- Pro: Deterministic proof of equivalence.
- Con: Expensive on large tables; only meaningful when queries are identical.
Recommendation: Use frontier LSN comparison (#1) as the primary
mechanism, with an optional lag threshold (#3) for user control. Content
hash (#4) can be offered as a verify flag for cautious deployments.
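The recommended frontier comparison (#1) could be expressed as a single catalog query. This is a hypothetical sketch: it assumes the green_of column proposed below and the frontier layout described above; COALESCE ensures a source missing from green’s frontier counts as not converged rather than being skipped by the aggregate.

```sql
-- Green is promotable once its LSN watermark is >= blue's for every
-- source tracked by blue.
SELECT bool_and(
         COALESCE(
           (g.frontier -> 'sources' -> src.key ->> 'lsn')::pg_lsn
             >= (src.value ->> 'lsn')::pg_lsn,
           false)
       ) AS converged
FROM pgtrickle.pgt_stream_tables b
JOIN pgtrickle.pgt_stream_tables g ON g.green_of = b.pgt_id
CROSS JOIN LATERAL jsonb_each(b.frontier -> 'sources') AS src(key, value)
WHERE b.pgt_schema = 'public' AND b.pgt_name = 'order_totals';
```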
Recommendation
Approach A (Shadow Tables with Atomic Rename) as the implementation strategy, augmented with lightweight metadata from Approach C for observability.
Rationale
- Approach A is the lowest-risk, lowest-effort path that solves all must-have requirements.
- The internal machinery (CDC sharing, independent frontiers, pgt_id isolation) already supports dual pipelines — what’s missing is just the orchestration layer.
- Adding a green_of BIGINT FK and a pipeline_group TEXT column to pgt_stream_tables gives the observability benefits of Approach C without the full pipeline-entity overhead.
- Approach B (views) would be a better long-term architecture but is a breaking change that warrants its own migration plan.
- Approach D is too coarse for real-world use.
Proposed API Surface
-- Create a green version of an existing stream table.
-- Inherits query/schedule/mode from blue unless overridden.
SELECT pgtrickle.create_green(
'public.order_totals',
query => 'SELECT ... FROM orders JOIN ...', -- optional override
schedule => '10s', -- optional override
mode => 'DIFFERENTIAL' -- optional override
);
-- Check green progress: frontier lag, row count, status.
SELECT * FROM pgtrickle.green_status('public.order_totals');
-- Returns: green_name, status, frontier_lag_bytes, data_timestamp_lag,
-- blue_row_count, green_row_count, converged (bool)
-- Promote: atomic swap green → active, retire blue.
-- Fails if green hasn't converged (override with force => true).
SELECT pgtrickle.promote_green('public.order_totals');
SELECT pgtrickle.promote_green('public.order_totals', force => true);
-- Promote all green STs that have converged.
SELECT * FROM pgtrickle.promote_all_green();
-- Abort: drop green ST, clean up.
SELECT pgtrickle.cancel_green('public.order_totals');
-- Rollback: swap back to the retired blue (if not yet cleaned up).
SELECT pgtrickle.rollback_green('public.order_totals');
Catalog Changes (Minimal)
ALTER TABLE pgtrickle.pgt_stream_tables
ADD COLUMN green_of BIGINT REFERENCES pgtrickle.pgt_stream_tables(pgt_id),
ADD COLUMN pipeline_group TEXT;
- green_of — points to the blue ST’s pgt_id. NULL for normal STs.
- pipeline_group — optional label for grouping multi-ST swaps.
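An example of how these columns would be used for observability (illustrative; assumes the two columns above):

```sql
-- List in-flight green deployments alongside their blue counterparts.
SELECT b.pgt_schema || '.' || b.pgt_name AS blue,
       g.pgt_name                        AS green,
       g.status                          AS green_status,
       g.pipeline_group
FROM pgtrickle.pgt_stream_tables g
JOIN pgtrickle.pgt_stream_tables b ON b.pgt_id = g.green_of;
```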
Promote Transaction (Pseudocode)
BEGIN;
-- 1. Lock both tables deterministically (by OID, ascending) to avoid deadlocks
LOCK TABLE blue_table IN ACCESS EXCLUSIVE MODE;
LOCK TABLE green_table IN ACCESS EXCLUSIVE MODE;
-- 2. Rename
ALTER TABLE blue_table RENAME TO order_totals__pgt_retired;
ALTER TABLE green_table RENAME TO order_totals;
-- 3. Update catalog
UPDATE pgtrickle.pgt_stream_tables
SET pgt_name = 'order_totals__pgt_retired',
status = 'SUSPENDED'
WHERE pgt_id = :blue_id;
UPDATE pgtrickle.pgt_stream_tables
SET pgt_name = 'order_totals',
green_of = NULL
WHERE pgt_id = :green_id;
-- 4. Rewrite defining_query of any downstream STs that reference
-- the old blue OID → new green OID
UPDATE pgtrickle.pgt_dependencies
SET source_relid = :green_oid
WHERE source_relid = :blue_oid;
-- 5. Signal DAG rebuild
SELECT pgtrickle.signal_dag_rebuild();
COMMIT;
The blue ST remains in the catalog as SUSPENDED until explicitly cleaned up,
enabling rollback.
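The rollback path is the inverse of the promote transaction and stays available while the retired blue ST remains SUSPENDED in the catalog. A sketch with illustrative names:

```sql
BEGIN;
-- Swap the names back: green returns to its shadow name, blue is restored.
ALTER TABLE order_totals RENAME TO order_totals__pgt_green;
ALTER TABLE order_totals__pgt_retired RENAME TO order_totals;

-- Restore blue as the active entry.
UPDATE pgtrickle.pgt_stream_tables
SET pgt_name = 'order_totals', status = 'ACTIVE'
WHERE pgt_id = :blue_id;

-- Re-mark green as the shadow copy of blue.
UPDATE pgtrickle.pgt_stream_tables
SET pgt_name = 'order_totals__pgt_green', green_of = :blue_id
WHERE pgt_id = :green_id;

SELECT pgtrickle.signal_dag_rebuild();
COMMIT;
```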
Open Questions
- Downstream OID rewrites. When a green ST replaces a blue ST that other STs depend on, the downstream defining_query text contains the old OID (or table name). How deep does the rewrite need to go? Just the dependency edges, or the SQL text too? (The SQL text uses names, not OIDs, so renaming the table should suffice — but this needs verification.)
- Consistency groups during transition. If blue and green STs participate in the same diamond consistency group, could the scheduler produce inconsistent results? Likely not — they have independent pgt_ids and different names, so the consistency-group logic treats them as separate entities. But edge cases need analysis.
- Whole-pipeline ordering during promote. For a pipeline with 5 STs in a DAG, the promote must happen in topological order (upstream first) so that downstream STs immediately see the promoted upstream. Or should the promote be all-at-once in a single transaction?
- CDC trigger sharing with modified queries. If the green ST has a different defining_query that references additional source tables not in the blue ST’s dependency set, new CDC triggers must be created for those sources. The existing setup_cdc_for_source() handles this, but the green ST creation flow needs to call it.
- Storage table column mismatch. If the green ST has a different column set (due to query changes), the rename alone isn’t sufficient — downstream queries expecting the blue schema will break. Should promote_green verify schema compatibility, or is this the user’s responsibility?
- Advisory lock contention. Blue and green STs use independent advisory locks (keyed by pgt_id), so there is no steady-state contention. But during the promote transaction we must ensure neither ST is mid-refresh: the promote should acquire both advisory locks before proceeding.
- Auto-promote timing. Should the scheduler perform auto-promote, or should it be a separate background check? Embedding it in the scheduler loop adds complexity; a separate polling function called by cron may be simpler.
Related Work
- PLAN_UPGRADE_MIGRATIONS.md — extension upgrade migrations (overlapping concern for version-to-version transitions).
- PLAN_HYBRID_CDC.md — CDC mode transitions already implement a form of “hot swap” from trigger to WAL.
- PLAN_FUSE.md — anomalous change volume detection could trigger auto-rollback of a green pipeline.
- REPORT_EXTERNAL_PROCESS.md — external sidecar could orchestrate blue-green at a higher level.
- REPORT_DOWNSTREAM_CONSUMERS.md — downstream consumer patterns (change feeds, NOTIFY, logical replication) and how they interact with blue-green swap.
Next Steps
- Validate Approach A with a manual proof-of-concept: create two STs on the same source, let both catch up, then rename in a transaction. Measure lock duration and verify downstream queries.
- Prototype create_green / promote_green as SQL functions — minimal viable implementation.
- Design the convergence check — implement green_status() using frontier comparison.
- Evaluate Approach B (views) as a longer-term migration path, potentially gated behind a GUC (pg_trickle.use_view_aliases = on).
- Write an ADR once we commit to an approach.