Streaming to Kafka Without Kafka Expertise

Streaming to Kafka Without Kafka Expertise

How pgtrickle-relay bridges stream table deltas to external systems

You have pg_trickle maintaining stream tables in PostgreSQL. Now your analytics team wants those deltas in Kafka. Your mobile team wants webhook notifications. Your data science team wants events in NATS for a real-time ML feature store.

The traditional approach: write a Kafka producer in Python, poll the database, serialize the deltas, handle offsets, deal with exactly-once semantics, set up monitoring, and maintain it forever.

The pg_trickle approach: run a single binary.

What pgtrickle-relay Does

pgtrickle-relay is a standalone Rust binary that reads from pg_trickle’s outbox tables and writes to external messaging systems. It’s not a library, not a framework, not an SDK. It’s a process you run next to your PostgreSQL instance.

PostgreSQL                              External Systems
┌─────────────────┐                     ┌──────────────┐
│ stream table    │                     │ Kafka topic  │
│   ↓ delta       │                     └──────▲───────┘
│ outbox table    │──→ pgtrickle-relay ──→     │
│ (transactional) │         │                  │
└─────────────────┘         │           ┌──────▲───────┐
                            ├──→        │ NATS subject │
                            │           └──────────────┘
                            │           ┌──────────────┐
                            └──→        │ HTTP webhook  │
                                        └──────────────┘

Supported sinks: Kafka, NATS JetStream, SQS, RabbitMQ, Redis Streams, HTTP webhooks.

Supported sources (for the inbox direction): the same list, reversed. External events come in, land in an inbox table, and stream tables can read from them.

Setup

1. Enable the outbox on a stream table

SELECT pgtrickle.enable_outbox('revenue_by_region');

This creates an outbox table that captures every delta produced by revenue_by_region. Each refresh cycle that produces non-empty changes writes an outbox row in the same transaction as the MERGE.

2. Configure the relay

# relay.toml
[global]
postgres_url = "postgres://user:pass@localhost/mydb"
metrics_bind = "0.0.0.0:9090"

[[pipeline]]
name = "revenue-to-kafka"
source = { type = "outbox", stream_table = "revenue_by_region" }
sink = { type = "kafka", brokers = "kafka:9092", topic = "revenue-deltas" }

[[pipeline]]
name = "orders-to-nats"
source = { type = "outbox", stream_table = "order_view" }
sink = { type = "nats", url = "nats://localhost:4222", subject = "orders.>" }

[[pipeline]]
name = "alerts-to-webhook"
source = { type = "outbox", stream_table = "fraud_alerts" }
sink = { type = "webhook", url = "https://hooks.example.com/fraud", method = "POST" }

3. Run it

pgtrickle-relay --config relay.toml

That’s it. No Kafka consumer code. No NATS client library. No webhook retry logic.

What Gets Sent

Each outbox message is a JSON envelope containing:

{
  "stream_table": "revenue_by_region",
  "sequence": 42,
  "timestamp": "2026-04-27T10:15:03.412Z",
  "delta": {
    "inserted": [
      {"region": "europe", "day": "2026-04-27", "revenue": 150200.50, "order_count": 1203}
    ],
    "deleted": [
      {"region": "europe", "day": "2026-04-27", "revenue": 149800.00, "order_count": 1201}
    ]
  },
  "metadata": {
    "refresh_mode": "DIFFERENTIAL",
    "refresh_duration_ms": 12,
    "delta_rows": 2
  }
}

The delta is the exact set of rows that changed in the stream table — rows removed (old values) and rows added (new values). For an aggregate that was updated, the old aggregate value is in deleted and the new value is in inserted.

Consumers don’t need to know about pg_trickle, PostgreSQL, or IVM. They receive a JSON message with the before/after state of the affected rows. They can build their own materialization from the delta stream.

Subject Routing

For NATS and Kafka, you can route messages to different subjects/topics based on the delta content:

[[pipeline]]
name = "regional-revenue"
source = { type = "outbox", stream_table = "revenue_by_region" }
sink = {
  type = "nats",
  url = "nats://localhost:4222",
  subject_template = "revenue.{{ region }}"
}

A delta for the europe region goes to revenue.europe. A delta for asia goes to revenue.asia. Consumers subscribe to only the regions they care about.

High Availability

In production you run multiple relay instances. They coordinate via PostgreSQL advisory locks:

[global]
ha_group = "relay-primary"

One instance acquires the advisory lock and becomes the active leader. The others are standby. If the leader crashes, a standby acquires the lock within seconds and continues from where the leader left off.

The outbox table stores the consumer offset. A new leader reads the last committed offset and resumes. No messages are lost. Some messages may be delivered twice during failover — consumers should be idempotent (which they should be anyway).

The Inbox Direction

The relay also works in reverse. External events can flow into PostgreSQL:

[[pipeline]]
name = "payments-from-kafka"
source = { type = "kafka", brokers = "kafka:9092", topic = "payment-events", group_id = "pgtrickle-inbox" }
sink = { type = "inbox", inbox_name = "payment_events" }

Events land in pgtrickle.inbox_payment_events. You can build stream tables on top of the inbox:

-- Create the inbox
SELECT pgtrickle.create_inbox('payment_events');

-- Stream table over incoming payment events
SELECT pgtrickle.create_stream_table(
    'payment_summary',
    $$SELECT
        customer_id,
        SUM((payload->>'amount')::numeric) AS total_paid,
        COUNT(*) AS payment_count
      FROM pgtrickle.inbox_payment_events
      WHERE processed = false
      GROUP BY customer_id$$,
    schedule => '2s', refresh_mode => 'DIFFERENTIAL'
);

Events from Kafka are now queryable as a PostgreSQL table, with incremental aggregation on top.

Monitoring

The relay exposes Prometheus metrics at /metrics:

# Messages delivered successfully
pgtrickle_relay_messages_delivered_total{pipeline="revenue-to-kafka"} 12847

# Delivery latency (histogram)
pgtrickle_relay_delivery_duration_seconds_bucket{pipeline="revenue-to-kafka",le="0.01"} 12500

# Consumer lag (messages pending)
pgtrickle_relay_consumer_lag{pipeline="revenue-to-kafka"} 3

# Errors
pgtrickle_relay_delivery_errors_total{pipeline="revenue-to-kafka"} 0

And a health endpoint at /health that returns the status of each pipeline.

When to Use the Relay vs. Direct Outbox Polling

If your consumer is a PostgreSQL-native application (another service that queries the database), use pgtrickle.poll_outbox() directly. No relay needed.

If your consumer is an external system that speaks Kafka, NATS, HTTP, or any other messaging protocol — use the relay. It handles serialization, delivery, retries, offset tracking, and HA. Writing a custom consumer for each sink is the kind of infrastructure work that seems small and grows into a maintenance burden.

The relay is also useful when you need fan-out: one stream table’s deltas going to multiple sinks. Each pipeline runs independently, with its own offset tracking.

PGXN

PostgreSQL Extension Network

Contents