v0.15.0 — External Benchmarks, Bulk API, and dbt Hub Preparation
Full technical details: v0.15.0.md-full.md
Status: ✅ Released | Scope: Medium (~4 weeks)
Validation against the Nexmark streaming benchmark, a bulk API for creating many stream tables at once, parser architecture improvements, watermark hold-back for late-arriving data, and preparation for listing on the dbt Hub.
What problem does this solve?
TPC-H validates analytical query correctness, but streaming workloads (event streams, IoT data, activity feeds) have different patterns. Nexmark is the industry-standard streaming benchmark. Operators managing dozens of stream tables needed a way to create them in bulk. And dbt users wanted to find and install the pg_trickle dbt package from the standard dbt Hub marketplace.
Nexmark Streaming Benchmark
Nexmark is a benchmark suite designed specifically for streaming systems, with queries modelling auction activity: bids, auctions, and persons. It tests patterns like:
- Event-time windowing (aggregate over a sliding time window)
- Join with late-arriving events (an auction result arriving after the bid)
- Top-N per category (highest bids per auction)
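As a flavour of what these queries look like, a Top-N-style pattern (highest bid per auction over a sliding window) can be sketched in plain SQL. The table and column names below (bid, auction, price, b_time) are illustrative, not pg_trickle's actual benchmark schema:

```sql
-- Illustrative Nexmark-style query: highest bid per auction
-- over a sliding 10-minute event-time window.
-- Table/column names are assumptions, not the real schema.
SELECT auction,
       max(price) AS highest_bid
FROM bid
WHERE b_time >= now() - interval '10 minutes'
GROUP BY auction;
```

Maintaining the result of a query like this incrementally, as bids stream in and the window slides, is exactly the pattern the Nexmark suite exercises.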
pg_trickle’s differential engine is now validated against the Nexmark query set, demonstrating that it handles streaming event patterns — not just analytical batch queries.
Bulk Create API
pgtrickle.create_stream_tables_from_json(definitions) accepts a JSON array of stream table definitions and creates all of them in a single call.
This is useful for:
- Infrastructure-as-code deployments
- dbt post-run hooks that create many stream tables from model definitions
- Migration scripts that need to set up a complete stream table configuration
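A call might look like the following. The function name comes from this release; the JSON keys shown (name, query, schedule) are assumptions for illustration, since the exact definition shape is not specified here:

```sql
-- Hypothetical invocation of the bulk create API.
-- The JSON keys below are assumed, not documented in this note.
SELECT pgtrickle.create_stream_tables_from_json('[
  {"name": "daily_orders",  "query": "SELECT ... FROM orders ...",  "schedule": "1 minute"},
  {"name": "daily_refunds", "query": "SELECT ... FROM refunds ...", "schedule": "1 minute"}
]'::jsonb);
```

Because the whole array is processed in one call, a deployment script or dbt hook needs only a single round trip regardless of how many stream tables it defines.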
Parser Modularisation
The internal query parser and differential SQL generator — which analyses a SQL query and produces the incremental update logic — was split into four focused modules:
- types.rs — the abstract syntax representation
- validation.rs — checks whether a query is supported
- rewrites.rs — SQL transformation passes
- sublinks.rs — subquery extraction logic
This makes the parser easier to extend and reduces the risk that a change to one aspect of the parser accidentally affects another.
Watermark Hold-Back for Late-Arriving Data
In streaming workloads, events sometimes arrive late — a sensor reading from 2 minutes ago arrives now. If the stream table for that time window has already been refreshed and “closed”, the late event would be missed.
Watermark hold-back allows you to configure a delay on a stream table’s watermark, keeping the window open for late events. For example, a 5-minute hold-back means the stream table will not close a time window until 5 minutes after the window’s end time, accommodating events up to 5 minutes late.
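Configuring this might look like the following sketch. The option name watermark_delay and the function alter_stream_table are hypothetical placeholders for illustration; consult the full release documentation for the actual parameter:

```sql
-- Hypothetical: apply a 5-minute watermark hold-back to a stream table.
-- 'alter_stream_table' and 'watermark_delay' are assumed names,
-- shown only to illustrate the shape of the configuration.
SELECT pgtrickle.alter_stream_table(
  'sensor_window_agg',
  options => '{"watermark_delay": "5 minutes"}'::jsonb
);
```

The trade-off is latency versus completeness: a longer hold-back catches later events, but results for a window are only finalised after the hold-back period elapses.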
Delta Cost Estimation
A new cost estimator predicts how expensive a differential refresh will be before running it, by examining the change buffer size and the complexity of the defining query. AUTO mode uses this estimate to pre-emptively choose FULL refresh when the differential is predicted to be slower, rather than waiting to observe the actual performance.
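Conceptually, AUTO mode compares the two estimates and picks the cheaper strategy up front. The view name and columns below are hypothetical, invented here purely to illustrate that comparison:

```sql
-- Hypothetical inspection of the estimates AUTO mode consults.
-- 'pgtrickle.refresh_cost_estimates' and its columns are assumed
-- names for illustration, not a documented catalog view.
SELECT stream_table,
       estimated_delta_cost,
       estimated_full_cost,
       CASE WHEN estimated_delta_cost < estimated_full_cost
            THEN 'DELTA'
            ELSE 'FULL'
       END AS chosen_strategy
FROM pgtrickle.refresh_cost_estimates;
```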
dbt Hub Preparation
The dbt-pgtrickle package was prepared for submission to the dbt Hub — the official package registry for dbt. This includes package metadata, documentation, and integration tests that run as part of the dbt Hub certification process.
ORM Integration Guides
Documentation guides for using pg_trickle with common ORMs:
- SQLAlchemy (Python)
- ActiveRecord (Ruby / Rails)
- Diesel (Rust)
- Prisma (Node.js / TypeScript)
Each guide shows how to query stream tables from the ORM and how to trigger refreshes from application code.
Scope
v0.15.0 broadens the validation coverage to streaming workloads (Nexmark), improves the ergonomics of bulk deployments, and prepares the dbt integration for the wider dbt ecosystem. The watermark hold-back feature addresses a fundamental challenge in streaming analytics: late-arriving data.