Downstream Consumer Patterns for Stream Tables

Downstream Consumer Patterns for Stream Tables

Type: REPORT · Status: Exploration · Date: 2026-03-03

Problem Statement

Analysts query stream tables directly via SQL — that’s the simple case. But a growing class of consumers wants to be triggered when a stream table changes: push data to a cache layer, invalidate a CDN, update a search index, notify a microservice, or feed a downstream event pipeline.

Today, pg_trickle’s value proposition stops at the materialized table. Getting changes out of that table to external systems is left to the user. This report explores what exists, what’s missing, and what options make sense.

What Already Exists

1. LISTEN/NOTIFY Alerts

pg_trickle emits PostgreSQL notifications on two channels:

pg_trickle_alert — operational events (JSON payloads):

Event	Payload Fields	When
`refresh_completed`	`stream_table`, `action`, `rows_inserted`, `rows_deleted`, `duration_ms`	After every successful refresh
`refresh_failed`	`stream_table`, `action`, `error`	After a failed refresh
`stale_data`	`stream_table`, `staleness_seconds`	Staleness exceeds 2× schedule
`auto_suspended`	`stream_table`, `consecutive_errors`	ST auto-suspended
`resumed`	`stream_table`	ST resumed
`reinitialize_needed`	`stream_table`	Upstream DDL detected

pgtrickle_refresh — emitted after FULL refresh when user triggers are present (contains stream_table, schema, mode, rows).

Verdict: Good for “something changed” signals. No row-level detail — the consumer knows that data changed and aggregate counts, but not what changed.

2. User Triggers on Stream Tables (Implemented)

When a DIFFERENTIAL-mode ST has user-defined row-level triggers, the refresh engine switches from MERGE to explicit DELETE → UPDATE → INSERT statements. This gives correct TG_OP, OLD, and NEW values.

Controlled by GUC: pg_trickle.user_triggers = auto | on | off
Works for DIFFERENTIAL mode only
FULL refresh suppresses user triggers (DISABLE TRIGGER USER) and emits a NOTIFY pgtrickle_refresh instead
Performance overhead: ~25–60% vs MERGE for triggered STs; zero for non-triggered STs

Verdict: The most powerful existing mechanism. Users can write an AFTER INSERT/UPDATE/DELETE trigger that does anything: insert into an audit table, call pg_notify() with row data, invoke pg_net for HTTP, etc.

3. Logical Replication from Stream Tables

Stream tables are standard heap tables. Users can set up logical replication:

CREATE PUBLICATION my_pub FOR TABLE public.order_totals;
-- Then: CREATE SUBSCRIPTION ... on the subscriber

Debezium, pgoutput, wal2json all work — stream tables are just tables.

Verdict: Works out of the box for systems that already consume PG logical replication. No pg_trickle-specific setup needed. But adds operational complexity (replication slots, WAL retention) and doesn’t work for non-PG consumers without a connector.

Consumer Personas & Needs

Persona	Needs	Latency	Volume	Best Existing Fit
Analyst / BI tool	Query latest data	Minutes–hours	Bulk reads	Direct SQL — solved
Dashboard / cache	Know when to refresh	Seconds	Signal only	`LISTEN pg_trickle_alert`
Microservice	Get changed rows	Seconds	Row-level	User triggers → `pg_notify()`
Search indexer	Get inserts/updates/deletes	Seconds–minutes	Row-level	User triggers → audit table
Event pipeline (Kafka)	Structured change feed	Sub-second–seconds	Row-level	Logical replication or user trigger
API / webhook	HTTP push on change	Seconds	Row-level	User trigger → `pg_net`
Audit / compliance	Complete change history	N/A	Row-level + temporal	No good fit today

Gaps

Gap 1: No Row-Level Change Details in NOTIFY

refresh_completed tells you “42 rows inserted, 3 deleted in order_totals” but not which rows. A consumer that only cares about orders above $10,000 must re-query the entire table to find out.

User triggers are suppressed during FULL refresh for correctness (TRUNCATE + INSERT would fire INSERT triggers for every row, creating a false “all rows are new” signal). The only notification is a bulk NOTIFY. Consumers must treat FULL refresh as a complete invalidation — “re-read everything.”

Gap 3: No Built-In Change Audit Table for STs

The CDC layer maintains pgtrickle_changes.changes_{source_oid} for source tables. There is no equivalent for stream tables. A consumer wanting “what changed in the ST in the last 5 minutes” has no structured feed to query.

Gap 4: No Transactional Outbox

For event-driven architectures, the transactional outbox pattern guarantees that events are produced exactly when data changes commit. Today, user triggers run inside the refresh transaction (good for atomicity), but there’s no pg_trickle-managed outbox table with guaranteed delivery semantics.

Gap 5: No External Push Integration

No webhooks, no Kafka producer, no NATS publisher. All external push requires user-written triggers or an external connector (Debezium on logical replication). This is arguably the right boundary for a PostgreSQL extension — but it’s a gap that users will hit.

Approaches

Approach A: Enhanced NOTIFY with Change Summary

Extend the existing refresh_completed notification to include actionable metadata: affected primary key values or a change summary.

{
  "event": "refresh_completed",
  "stream_table": "public.order_totals",
  "action": "DIFFERENTIAL",
  "changes": {
    "inserted": 42,
    "updated": 7,
    "deleted": 3,
    "affected_pks": [101, 205, 308, "..."],
    "truncated": true
  },
  "data_timestamp": "2026-03-03T10:15:00Z",
  "duration_ms": 230
}

Pros: Zero infrastructure — pure PG NOTIFY. Consumers get enough detail to do targeted re-queries. Simple to implement (collect PKs during refresh).

Cons: NOTIFY payload is limited to 8000 bytes. For large changesets, the PK list must be truncated (with a truncated: true flag). Not suitable for row-level detail.

Effort: Low.

Approach B: Output Change Buffer Tables

Mirror the input CDC pattern: maintain a pgtrickle_changes.output_{st_oid} table that captures the delta applied to each ST during refresh.

CREATE TABLE pgtrickle_changes.output_12345 (
    change_id   BIGSERIAL,
    refresh_id  BIGINT,     -- FK to pgt_refresh_history
    action      CHAR(1),    -- I/U/D
    pk_hash     BIGINT,
    row_data    JSONB,      -- or typed columns
    created_at  TIMESTAMPTZ DEFAULT now()
);

Populated automatically during each DIFFERENTIAL refresh: - INSERT rows → action = 'I', row_data = NEW - UPDATE rows → action = 'U', row_data = NEW (with old_data for the previous version) - DELETE rows → action = 'D', row_data = OLD - FULL refresh → special action = 'F' (full invalidation marker)

Consumers poll the output buffer:

SELECT * FROM pgtrickle_changes.output_12345
 WHERE change_id > :last_seen_id
 ORDER BY change_id;

Pros: - Structured, queryable change feed — works for any consumer. - Survives connection drops (unlike NOTIFY, which is ephemeral). - Supports replay and rewind. - Can be combined with logical replication on the output buffer itself for streaming to Kafka/Debezium. - Natural fit with the existing pgtrickle_changes schema. - FULL refresh can log a single “full invalidation” row rather than re-logging every row.

Cons: - Storage overhead — every ST change is written twice (to the ST and to the output buffer). - Cleanup responsibility — who deletes consumed output rows? Need a consumer tracking mechanism (similar to tracked_by_pgt_ids for input CDC). - Adds write amplification to every refresh. - JSONB serialization cost for row_data (or complexity of typed columns).

Effort: Medium.

Approach C: Change Feed via AFTER Triggers (Document the Pattern)

Rather than building output CDC into the engine, document and optimize the user-trigger pattern as the official recommendation. Provide helper functions and examples.

-- pg_trickle provides a helper to create an output change table
SELECT pgtrickle.create_change_feed('public.order_totals');
-- Creates: pgtrickle_changes.feed_order_totals (change_id, action, row_data, ...)
-- Installs: AFTER INSERT/UPDATE/DELETE trigger on public.order_totals

-- Consumers read:
SELECT * FROM pgtrickle_changes.feed_order_totals
 WHERE change_id > :cursor
 ORDER BY change_id;

-- Cleanup:
SELECT pgtrickle.cleanup_change_feed('public.order_totals', older_than => interval '1 hour');

Pros: - Builds on existing, working user-trigger infrastructure. - Opt-in per ST — no overhead for STs that don’t need it. - Helper functions reduce boilerplate. - Clear separation: pg_trickle maintains the ST, the feed is a user-space concern managed by pg_trickle-provided utilities. - Works today (user triggers are implemented).

Cons: - Still has the FULL refresh blind spot (triggers suppressed). - ~25–60% overhead for triggered STs. - Not truly “built-in” — it’s a convenience wrapper around existing PG mechanisms. - Consumer tracking / cleanup is still manual (or semi-automated).

Effort: Low–Medium.

Approach D: pg_trickle as Logical Replication Publisher

Use PostgreSQL’s logical replication protocol to publish changes from stream tables. Create a publication automatically for each ST.

-- Automatic on creation:
CREATE PUBLICATION pgtrickle_order_totals FOR TABLE public.order_totals;

-- Consumers use standard PG subscriptions, Debezium, etc.

pg_trickle would manage the publication lifecycle (create on ST creation, drop on ST drop) and document the pattern.

Pros: - Standard PostgreSQL protocol — works with all PG-compatible consumers (Debezium, Kafka Connect, pglogical, etc.). - No custom storage or triggers needed. - Leverages PG’s built-in WAL infrastructure. - Familiar to users who already know logical replication.

Cons: - Requires wal_level = logical — not all deployments have this. - FULL refresh does TRUNCATE + INSERT, which generates WAL for every row (potentially huge WAL spikes). - Replication slots consume WAL until all subscribers consume — risk of WAL bloat if a subscriber is slow. - pg_trickle doesn’t control the consumer side — debugging delivery issues is outside the extension’s scope. - Publication management adds complexity to create/drop lifecycle.

Effort: Low (publication management) to Medium (documentation + guardrails).

Approach E: Outbox Table with Delivery Guarantees

Implement a transactional outbox pattern: changes written to an outbox table in the same transaction as the ST refresh, then a delivery worker processes the outbox and pushes to configured destinations.

This is the most complete solution but essentially turns pg_trickle into a small event broker — likely out of scope for a PostgreSQL extension.

Effort: High. Likely post-1.0 / external sidecar territory.

Comparison Matrix

Criterion	A: NOTIFY++	B: Output Buffer	C: Feed Helper	D: Pub/Sub	E: Outbox
Row-level detail	❌ (PKs only)	✅	✅	✅	✅
Survives disconnect	❌	✅	✅	✅	✅
Works with FULL refresh	⚠️ Signal only	✅ Marker row	❌ Suppressed	✅ (WAL)	✅
No extra storage	✅	❌	❌	✅ (WAL)	❌
Opt-in per ST	✅	✅	✅	✅	✅
No `wal_level = logical`	✅	✅	✅	❌	✅
Standard PG consumers	❌ (NOTIFY-only)	❌ (custom)	❌ (custom)	✅	❌ (custom)
Implementation effort	Low	Medium	Low–Med	Low–Med	High
Overhead when unused	Zero	Zero	Zero	Low (pub exists)	Low

Recommendation

A layered approach that meets consumers where they are:

Layer 1: Enhanced NOTIFY (Approach A) — Do First

Extend refresh_completed with affected PK list (truncated at 8000 bytes). Zero-cost for consumers who don’t listen. Gives dashboard/cache consumers enough to do targeted invalidation. Minimal implementation effort.

Layer 2: Change Feed Helper (Approach C) — Do Second

Provide pgtrickle.create_change_feed() / cleanup_change_feed() utility functions that install an AFTER trigger + output table on a stream table. Opt-in, per-ST, builds on existing user-trigger infrastructure. Solves the microservice / search indexer persona.

For the FULL refresh blind spot: when a FULL refresh completes and a change feed exists, insert a single action = 'F' (full-invalidation) marker row. Consumers treat this as “re-read everything” — same semantics as the existing NOTIFY but now in the durable feed table.

Layer 3: Logical Replication Docs (Approach D) — Document

Don’t build custom pub/sub machinery. Document how to set up PG logical replication on stream tables for Kafka/Debezium consumers. Provide a pgtrickle.create_publication('order_totals') convenience function that creates the publication and handles lifecycle (drop on ST drop).

Layer 4: Output Change Buffer (Approach B) — Evaluate Later

If the change feed helper (Layer 2) proves insufficient — e.g., users need the delta to include old/new row values without user-trigger overhead, or need FULL refresh change tracking — then build output change buffers into the refresh engine itself. This is the “proper” solution but has higher write amplification cost and should be justified by real demand.

Integration with Blue-Green Deployment

The blue-green report (REPORT_BLUE_GREEN_DEPLOYMENT.md) introduces pipeline hot-swapping. How does that interact with downstream consumers?

Scenario	Impact
LISTEN consumer	Receives `refresh_completed` for both blue and green STs during transition. After promote, receives events for the promoted ST only. Consumer must filter by `stream_table` name.
User trigger / change feed	Trigger is on the storage table. After promote (rename), the trigger moves with the table. The feed table continues to receive changes from the now-promoted green ST.
Logical replication	Publication is on the table name. After `ALTER TABLE RENAME`, PG updates the publication automatically. Subscribers see the renamed table.
Output change buffer	If implemented, buffer is per-ST-OID. After promote, the green ST’s buffer becomes the active feed. The blue ST’s buffer is frozen on cleanup.

Key design rule: change feed identity should follow the logical name, not the physical pgt_id. When green is promoted and takes over the name order_totals, the change feed should seamlessly continue under that name.

Open Questions

NOTIFY payload size. PG’s 8000-byte limit constrains PK lists. Should we use a summary format (bloom filter, hash ranges) instead of literal PKs?
Change feed cleanup policy. Who decides when output rows are deleted? Options: time-based (configurable retention), cursor-based (consumer reports progress), hybrid.
Change feed during FULL refresh. A single F marker works but loses row-level granularity. Should FULL refresh optionally populate the feed with all rows (expensive but complete)?
Trigger overhead threshold. At what point does the 25–60% overhead of explicit DML (for user triggers) become a problem? Should the engine offer a “parallel output” mode that writes to the feed table MERGE-style alongside the main MERGE?
Relationship to the transactional IVM plan. IMMEDIATE mode (v0.2.0 roadmap) would update STs within the source transaction. Would change feeds fire within that same transaction? If so, the outbox pattern (E) becomes more natural.
Multi-consumer feed. Should a single change feed table support multiple independent consumers with separate cursors? Or should each consumer get its own feed (more isolation, more storage)?

REPORT_BLUE_GREEN_DEPLOYMENT.md — pipeline hot-swapping (parallel concern).
PLAN_USER_TRIGGERS_EXPLICIT_DML.md — existing user trigger support (foundation for Layer 2).
PLAN_TRANSACTIONAL_IVM.md — immediate-mode IVM changes the latency profile for consumers.
PLAN_ECO_SYSTEM.md — planned integrations (Airflow, Prometheus, CLI).
REPORT_EXTERNAL_PROCESS.md — sidecar HTTP API could host webhook/SSE endpoints (post-1.0).
PLAN_HYBRID_CDC.md — CDC mode transitions (input-side analogue of output CDC lifecycle).

PGXN

PostgreSQL Extension Network

Contents

Downstream Consumer Patterns for Stream Tables

Problem Statement

What Already Exists

1. LISTEN/NOTIFY Alerts

2. User Triggers on Stream Tables (Implemented)

3. Logical Replication from Stream Tables

Consumer Personas & Needs

Gaps

Gap 1: No Row-Level Change Details in NOTIFY

Gap 2: FULL Refresh Is a Blind Spot

Gap 3: No Built-In Change Audit Table for STs

Gap 4: No Transactional Outbox

Gap 5: No External Push Integration

Approaches

Approach A: Enhanced NOTIFY with Change Summary

Approach B: Output Change Buffer Tables

Approach C: Change Feed via AFTER Triggers (Document the Pattern)

Approach D: pg_trickle as Logical Replication Publisher

Approach E: Outbox Table with Delivery Guarantees

Comparison Matrix

Recommendation

Layer 1: Enhanced NOTIFY (Approach A) — Do First

Layer 2: Change Feed Helper (Approach C) — Do Second

Layer 3: Logical Replication Docs (Approach D) — Document

Layer 4: Output Change Buffer (Approach B) — Evaluate Later

Integration with Blue-Green Deployment

Open Questions

PGXN

PostgreSQL Extension Network

Contents

Downstream Consumer Patterns for Stream Tables

Problem Statement

What Already Exists

1. LISTEN/NOTIFY Alerts

2. User Triggers on Stream Tables (Implemented)

3. Logical Replication from Stream Tables

Consumer Personas & Needs

Gaps

Gap 1: No Row-Level Change Details in NOTIFY

Gap 2: FULL Refresh Is a Blind Spot

Gap 3: No Built-In Change Audit Table for STs

Gap 4: No Transactional Outbox

Gap 5: No External Push Integration

Approaches

Approach A: Enhanced NOTIFY with Change Summary

Approach B: Output Change Buffer Tables

Approach C: Change Feed via AFTER Triggers (Document the Pattern)

Approach D: pg_trickle as Logical Replication Publisher

Approach E: Outbox Table with Delivery Guarantees

Comparison Matrix

Recommendation

Layer 1: Enhanced NOTIFY (Approach A) — Do First

Layer 2: Change Feed Helper (Approach C) — Do Second

Layer 3: Logical Replication Docs (Approach D) — Document

Layer 4: Output Change Buffer (Approach B) — Evaluate Later

Integration with Blue-Green Deployment

Open Questions

Related Work