REPORT_TIMESCALEDB.md — TimescaleDB Synergy & Lessons for pg_trickle
Date: 2026-03-01 Status: RESEARCH
1. Executive Summary
TimescaleDB and pg_trickle are complementary PostgreSQL extensions solving different problems — partitioned time-series storage vs incremental view maintenance. They don’t conflict at the storage layer and can coexist.
However, there is a concrete synergy opportunity: pg_trickle can provide rich IVM over hypertables where TimescaleDB’s Continuous Aggregates are limited (no complex JOINs, no subqueries, no window functions, no cross-table aggregation). Several TimescaleDB design patterns are also worth studying for adoption in pg_trickle.
This document catalogues: (1) where the projects complement each other, (2) specific synergy scenarios, (3) engineering challenges for combined use, (4) design ideas worth borrowing, and (5) a phased action plan.
2. Comparison
| | TimescaleDB | pg_trickle |
|---|---|---|
| Core problem | Efficient time-series storage & querying | Keeping materialized views fresh incrementally |
| Key mechanism | Hypertables — auto-partitioning by time, compression, retention policies | Differential dataflow (DBSP) — delta query derivation from operator trees |
| IVM approach | Continuous Aggregates — restricted to `time_bucket` GROUP BY on a single hypertable | Stream Tables — broad SQL (JOINs, CTEs, window functions, subqueries, set operations) over any table |
| Change tracking | Invalidation log per hypertable chunk/time-bucket | Row-level triggers → WAL logical replication |
| Workload model | Append-heavy time-series (INSERT-dominant) | General OLTP (INSERT/UPDATE/DELETE) |
| Refresh model | Materialization policies on invalidated buckets | Scheduled or on-demand delta refresh over change buffers |
| Pipeline support | Cagg-on-cagg (hierarchical rollups) | Stream-table-on-stream-table DAG with demand-driven scheduling |
| Storage | Compressed columnar chunks for old data | Standard PostgreSQL heap tables |
| License | Apache-2.0 (community) / Timescale License (enterprise features) | Apache-2.0 |
3. Synergy Scenarios
3.1. Rich SQL over Hypertables
Continuous Aggregates are limited to `GROUP BY time_bucket(...)` on a single
hypertable with scalar aggregates. Users cannot:
- JOIN a hypertable with a dimension table (e.g., `sensor_readings JOIN devices`)
- Use window functions (`ROW_NUMBER`, `LAG`, `LEAD`)
- Use subqueries or CTEs in the aggregate definition
- Combine data from multiple hypertables
- Use `DISTINCT ON`, `UNION`, `INTERSECT`, `EXCEPT`
pg_trickle stream tables support all of these. A user could define:
```sql
SELECT pgtrickle.create_stream_table(
  'enriched_hourly_readings',
  $$
  SELECT
    time_bucket('1 hour', r.ts) AS bucket,
    d.facility,
    d.region,
    avg(r.temperature) AS avg_temp,
    max(r.temperature) - min(r.temperature) AS temp_spread,
    count(*) AS reading_count
  FROM sensor_readings r
  JOIN devices d ON d.device_id = r.device_id
  WHERE d.is_active
  GROUP BY 1, 2, 3
  $$,
  schedule := '5 minutes'
);
```
This is impossible with a Continuous Aggregate alone.
3.2. Cross-Hypertable Aggregation
Organizations often have multiple hypertables (e.g., cpu_metrics,
memory_metrics, disk_metrics) that need to be combined for dashboards.
Continuous Aggregates can’t span tables. A pg_trickle stream table can JOIN
and aggregate across them.
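A hedged sketch of what this could look like, reusing the `create_stream_table` API shown in 3.1; the metrics hypertables and their columns are illustrative, not taken from a real schema:

```sql
SELECT pgtrickle.create_stream_table(
  'host_dashboard_hourly',
  $$
  SELECT
    time_bucket('1 hour', c.ts) AS bucket,
    c.host,
    avg(c.usage_percent)   AS avg_cpu,
    avg(m.used_bytes)      AS avg_mem_used,
    avg(d.read_latency_ms) AS avg_disk_latency
  FROM cpu_metrics c
  JOIN memory_metrics m ON m.host = c.host AND m.ts = c.ts
  JOIN disk_metrics   d ON d.host = c.host AND d.ts = c.ts
  GROUP BY 1, 2
  $$,
  schedule := '5 minutes'
);
```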
3.3. Hierarchical Pipelines Mixing Both
A Continuous Aggregate could provide a pre-materialized time-bucketed rollup that feeds into a pg_trickle stream table for enrichment:
```
sensor_readings (hypertable)
 └─▶ hourly_readings (Continuous Aggregate — fast time-bucket rollup)
      └─▶ enriched_hourly_readings (pg_trickle stream table — JOIN with devices)
           └─▶ regional_summary (pg_trickle stream table — further aggregation)
```
3.4. Real-Time Alerting on Time-Series Data
pg_trickle’s NOTIFY-based alerting combined with stream tables over
hypertables could enable threshold-based alerts that fire when an aggregated
metric crosses a boundary — without polling.
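One plain-PostgreSQL way to wire the pattern, sketched under two assumptions: the stream table's storage is an ordinary table named `enriched_hourly_readings` (from 3.1), and refreshes apply row-level DML so triggers fire. pg_trickle's built-in NOTIFY alerting may make this unnecessary; this only illustrates the mechanism.

```sql
-- Illustrative threshold alert on the stream table's storage relation
-- (table name and the 80.0 threshold are assumptions for the example).
CREATE OR REPLACE FUNCTION alert_on_hot_facility() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  IF NEW.avg_temp > 80.0 THEN
    PERFORM pg_notify('temperature_alerts',
                      json_build_object('facility', NEW.facility,
                                        'bucket',   NEW.bucket,
                                        'avg_temp', NEW.avg_temp)::text);
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER temp_alert AFTER INSERT OR UPDATE ON enriched_hourly_readings
  FOR EACH ROW EXECUTE FUNCTION alert_on_hot_facility();

-- Consumers subscribe with: LISTEN temperature_alerts;
```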
4. Engineering Challenges
4.1. Trigger-Based CDC on Hypertables — HIGH
A hypertable is only the parent table; the actual data lives in chunk child tables
(`_timescaledb_internal._hyper_<id>_<chunk_id>_chunk`). TimescaleDB redirects
INSERT operations to the appropriate chunk based on the time dimension.
Impact: Row-level AFTER INSERT triggers defined on the parent hypertable
do fire for inserts — TimescaleDB propagates triggers to chunks. However,
this behavior needs validation under:
- Chunk creation (new chunks appear as data arrives for new time ranges)
- Chunk compression (rows become read-only columnar data)
- Chunk decompression (for updates to compressed ranges)
- Chunk migration (move/copy between tablespaces)
Investigation needed:
- [ ] Verify trigger propagation works for newly created chunks (see the probe sketched below)
- [ ] Test that pg_current_wal_lsn() inside a trigger on a chunk returns a
sensible value
- [ ] Benchmark trigger overhead on high-throughput hypertable inserts (time-
series workloads often do 100K+ inserts/sec)
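A minimal probe for the first two items, assuming only the timescaledb extension and the `sensor_readings` schema from 3.1 (all other names are illustrative):

```sql
-- Probe: does a row trigger defined on the parent hypertable fire for rows
-- routed to chunks created after the trigger was defined, and does
-- pg_current_wal_lsn() return a sensible value inside that trigger?
CREATE TABLE sensor_readings (ts timestamptz NOT NULL, device_id int, temperature double precision);
SELECT create_hypertable('sensor_readings', 'ts', chunk_time_interval => INTERVAL '1 day');

CREATE TABLE trigger_audit (chunk regclass, lsn pg_lsn, fired_at timestamptz DEFAULT now());

CREATE FUNCTION note_insert() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  -- TG_RELID should be the chunk's OID if TimescaleDB propagates the trigger
  INSERT INTO trigger_audit (chunk, lsn) VALUES (TG_RELID::regclass, pg_current_wal_lsn());
  RETURN NEW;
END;
$$;

CREATE TRIGGER pgt_cdc_probe AFTER INSERT ON sensor_readings
  FOR EACH ROW EXECUTE FUNCTION note_insert();

-- Rows 30 days apart force two chunks to be created after the trigger exists
INSERT INTO sensor_readings VALUES (now(), 1, 20.5), (now() + INTERVAL '30 days', 1, 21.0);

SELECT chunk, lsn FROM trigger_audit;  -- expect two rows, two distinct chunks
```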
4.2. WAL-Based CDC on Hypertables — MEDIUM
Logical replication decoding sees DML on chunk tables, not the parent hypertable. pg_trickle’s WAL decoder would need to:
- Map chunk OIDs back to the parent hypertable OID
- Handle DDL from chunk management (CREATE TABLE for new chunks, ALTER for compression transitions)
- Ignore or correctly handle the internal metadata changes that TimescaleDB writes alongside user data
The mapping can be done via _timescaledb_catalog.chunk →
_timescaledb_catalog.hypertable lookups, but this adds a dependency on
TimescaleDB’s internal schema.
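The lookup itself is a two-table join on that internal catalog; a hedged sketch (column names as in current releases, and subject to change, which is exactly the dependency risk noted above):

```sql
-- Map chunk relations back to their parent hypertables via TimescaleDB's
-- internal catalog (the public timescaledb_information.chunks view is a
-- stabler, if less direct, alternative).
SELECT format('%I.%I', c.schema_name, c.table_name)::regclass AS chunk,
       format('%I.%I', h.schema_name, h.table_name)::regclass AS hypertable
FROM _timescaledb_catalog.chunk c
JOIN _timescaledb_catalog.hypertable h ON h.id = c.hypertable_id
WHERE NOT c.dropped;
```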
4.3. Compression Interactions — MEDIUM
When TimescaleDB compresses a chunk:
1. Data is READ from the uncompressed chunk
2. Compressed data is WRITTEN to a compressed chunk (internal format)
3. The uncompressed chunk is TRUNCATED
Steps 1–3 appear in WAL as normal DML. The DELETE + INSERT pattern would be misinterpreted by pg_trickle’s CDC as real data changes, triggering a spurious full delta computation.
Mitigation options:
- Detect compression operations via _timescaledb_catalog metadata and skip
the corresponding WAL events
- Use TimescaleDB’s compression hooks (if exposed) to suppress CDC during
compression
- Only track uncompressed (recent) chunks — this naturally aligns with the
  time-series pattern where only recent data changes (see the query sketch below)
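For the third option, the public `timescaledb_information.chunks` view is enough to scope CDC to chunks that can still change; a sketch:

```sql
-- Enumerate only the uncompressed chunks of a hypertable so CDC registration
-- can be limited to data that can still be modified.
SELECT format('%I.%I', chunk_schema, chunk_name)::regclass AS chunk
FROM timescaledb_information.chunks
WHERE hypertable_schema = 'public'
  AND hypertable_name   = 'sensor_readings'
  AND NOT is_compressed;
```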
4.4. time_bucket() Function in Delta Queries — LOW
time_bucket() is a TimescaleDB function, not a core PostgreSQL function. When
pg_trickle parses a defining query containing time_bucket(), it needs to:
- Recognize it as a deterministic, immutable function (safe for IVM)
- Not attempt to “differentiate” through it — treat it as a scalar transform
pg_trickle already handles arbitrary immutable functions as pass-through in delta queries, so this should work without changes. Needs validation.
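Validation can start with a plain catalog check of how each `time_bucket` overload is classified (no pg_trickle assumptions involved):

```sql
-- 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
SELECT oid::regprocedure AS signature, provolatile
FROM pg_proc
WHERE proname = 'time_bucket'
ORDER BY 1;
```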
4.5. Continuous Aggregate Internals as Source — LOW
If a user defines a stream table over a Continuous Aggregate’s materialized view, pg_trickle needs to handle the fact that the underlying object is a materialized hypertable (with its own chunk structure), not a plain table. Triggers on the cagg’s materialized hypertable would capture the deltas that TimescaleDB’s materializer produces.
5. Design Ideas Worth Borrowing
5.1. Time-Bucketed Invalidation Tracking
What TimescaleDB does: Instead of tracking every changed row, the invalidation log records which time-bucket ranges were modified:
invalidation_log: (hypertable_id, lowest_modified_value, greatest_modified_value)
On refresh, only the affected time buckets are recomputed — not the entire table, and not individual row deltas.
Applicability to pg_trickle: For source tables with a time/sequence column used in GROUP BY, pg_trickle could maintain a coarser invalidation log alongside (or instead of) per-row change buffers. This would dramatically reduce change buffer size for time-series-style workloads.
Complexity: HIGH — requires detecting temporal grouping patterns in the defining query and implementing range-based invalidation in the refresh engine.
Priority: Phase 3+ (optimization, not correctness)
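A purely hypothetical shape for such a coarse log (all names are illustrative, not existing pg_trickle objects), mirroring TimescaleDB's (hypertable_id, lowest, greatest) triple:

```sql
-- Hypothetical coarse invalidation log: one row per source table, widened as
-- changes arrive instead of buffering individual rows.  Bounds are typed for a
-- timestamptz grouping column in this sketch.
CREATE TABLE pgtrickle.pgt_range_invalidation (
    source_relid   regclass    PRIMARY KEY,
    bucket_column  name        NOT NULL,   -- the temporal GROUP BY column
    lowest_value   timestamptz NOT NULL,
    greatest_value timestamptz NOT NULL
);

-- The CDC path would widen the range on each change, e.g.:
-- INSERT ... ON CONFLICT (source_relid) DO UPDATE
--   SET lowest_value   = LEAST(pgt_range_invalidation.lowest_value,   EXCLUDED.lowest_value),
--       greatest_value = GREATEST(pgt_range_invalidation.greatest_value, EXCLUDED.greatest_value);
```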
5.2. Real-Time Aggregation Mode
What TimescaleDB does: Continuous Aggregates support a “real-time” mode that combines, at query time:
- The materialized (potentially stale) aggregate data
- A live query over unmaterialized recent data (since the last refresh)
The result is always-fresh reads without requiring a refresh.
Applicability to pg_trickle: A stream table could expose a view that UNIONs:
1. The materialized storage table
2. A delta query over the unflushed change buffer
```sql
-- Conceptual: auto-generated view
CREATE VIEW st_realtime AS
SELECT * FROM st_materialized
UNION ALL
SELECT * FROM (/* delta query over change buffer */);
```
Complexity: MEDIUM-HIGH — the delta query derivation already exists (it’s the refresh path). The challenge is making it performant at query time without a full refresh cycle, and handling deduplication correctly for non-append-only workloads (UPDATEs/DELETEs require MERGE-like semantics, not UNION ALL).
Priority: Phase 2 — high user value; aligns with the “freshness without latency” goal.
See also: PLAN_TRANSACTIONAL_IVM.md for transactionally updated views (immediate IVM), which overlaps with this idea.
5.3. Append-Only Fast Path
What TimescaleDB does: Time-series workloads are insert-only. TimescaleDB optimizes for this by not tracking deletes/updates in the invalidation log.
Applicability to pg_trickle: When pg_trickle detects that a source table is append-only (configurable hint or detected via trigger — no UPDATE/DELETE events observed), the delta query can be simplified (a sketch follows the list):
- JOINs: Skip the bilinear expansion. For `A JOIN B` where A is append-only and B is static, the delta is just `ΔA JOIN B` (one term instead of three).
- Aggregates: Skip maintaining DELETE counters in auxiliary state.
- Change buffer: Skip the `op` column (always ‘I’); use a simpler schema.
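A sketch of the JOIN specialization under those assumptions; relation and change-buffer names are hypothetical:

```sql
-- With A append-only and B static, the incremental refresh reduces to
-- inserting the single term ΔA JOIN B, instead of the three-term expansion
-- over ΔA and ΔB, and every delta row is an insertion, so no retraction or
-- op-column handling is needed.
INSERT INTO st_storage (reading_id, device_id, facility)
SELECT da.reading_id, da.device_id, b.facility
FROM pgt_changes_a AS da          -- ΔA: only freshly inserted rows
JOIN devices       AS b ON b.device_id = da.device_id;
```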
Complexity: LOW-MEDIUM — this is a specialization of existing paths.
Priority: Phase 2 — significant performance win for time-series use cases.
5.4. Coarse-Grained Change Tracking for Partitioned Tables
What TimescaleDB does: Invalidation is tracked per-chunk (partition), not per-row. Chunks map to time ranges, so “which chunks changed?” is an O(1) check per chunk.
Applicability to pg_trickle: For PostgreSQL native partitioned tables (not just hypertables), pg_trickle could maintain a partition-level change flag:
- On first change to a partition since last refresh, set a dirty flag
- On refresh, only scan change buffers for dirty partitions
- Clear dirty flags after refresh
This doesn’t change the delta computation — it’s a filter that avoids scanning empty change buffers.
Complexity: LOW — partition OIDs are available from pg_inherits.
Priority: Phase 2 — especially valuable when combined with TimescaleDB.
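The partition enumeration needed for such a dirty map is the `pg_inherits` lookup mentioned above; for a single-level partitioned (or hypertable) parent it is one query:

```sql
-- Partition (or chunk) OIDs of a parent table; each could carry one dirty
-- flag per refresh cycle.
SELECT inhrelid::regclass AS partition
FROM pg_inherits
WHERE inhparent = 'sensor_readings'::regclass;
```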
5.5. Declarative Policy Framework
What TimescaleDB does: Policies are first-class objects:
```sql
SELECT add_continuous_aggregate_policy('hourly_readings',
  start_offset      => INTERVAL '3 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour'
);
```
Policies have built-in monitoring (timescaledb_information.job_stats), retry
logic, exponential backoff, and can be paused/resumed.
Applicability to pg_trickle: pg_trickle already has schedule and status
management via alter_stream_table. The lesson is more about UX:
- Expose refresh policies as queryable objects (already done via `pgtrickle.pgt_stream_tables`)
- Add retry/backoff configuration (partially done via `consecutive_errors`)
- Consider a `pgtrickle.add_refresh_policy()` wrapper that mirrors the TimescaleDB ergonomics for users coming from that ecosystem
Priority: Phase 3 — ergonomic polish.
5.6. Chunk-Aware Retention
What TimescaleDB does: add_retention_policy() automatically drops chunks
older than a threshold.
Applicability to pg_trickle: Stream table change buffers grow over time. A built-in retention policy for change buffers (drop change buffer entries older than N refreshes or N hours) would prevent unbounded growth. This is partially handled by the post-refresh cleanup, but a dedicated policy would be more robust for error scenarios where refreshes fail repeatedly.
Priority: Phase 3.
6. Action Plan
Phase 1 — Validation (1–2 weeks)
Goal: Confirm that pg_trickle works at all with hypertable source tables.
| # | Task | Effort |
|---|---|---|
| 1.1 | Set up a test environment with both extensions on PG 18 | 2h |
| 1.2 | Create a hypertable and define a pg_trickle stream table over it (simple GROUP BY) | 2h |
| 1.3 | Verify trigger-based CDC fires correctly on hypertable inserts | 2h |
| 1.4 | Test with chunk creation (insert data spanning multiple time ranges) | 2h |
| 1.5 | Test with time_bucket() in defining query | 1h |
| 1.6 | Measure trigger overhead on high-throughput inserts (10K–100K rows/sec) | 4h |
| 1.7 | Document findings and any blockers | 2h |
Deliverable: Test report confirming basic compatibility or listing blockers.
Phase 2 — Core Optimizations (2–4 weeks)
Goal: Implement the highest-value patterns learned from TimescaleDB.
| # | Task | Effort | Depends on |
|---|---|---|---|
| 2.1 | Append-only source hint (source_mode := 'append_only') | 3d | — |
| 2.2 | Simplified delta queries for append-only JOINs | 3d | 2.1 |
| 2.3 | Partition-level dirty tracking for partitioned/hyper tables | 2d | Phase 1 |
| 2.4 | Investigate real-time aggregation mode (design doc) | 3d | — |
| 2.5 | E2E test suite: pg_trickle + TimescaleDB scenarios | 3d | Phase 1 |
Phase 3 — Advanced Integration (4–8 weeks)
Goal: Handle edge cases and polish the combined experience.
| # | Task | Effort | Depends on |
|---|---|---|---|
| 3.1 | WAL decoder: map chunk OIDs → parent hypertable OID | 3d | Phase 1 |
| 3.2 | WAL decoder: filter compression-related DML | 2d | 3.1 |
| 3.3 | Stream table over Continuous Aggregate source | 3d | Phase 1 |
| 3.4 | Time-bucketed invalidation tracking (design + prototype) | 2w | 2.4 |
| 3.5 | Documentation: “Using pg_trickle with TimescaleDB” tutorial | 2d | Phase 2 |
| 3.6 | Refresh policy ergonomics (add_refresh_policy wrapper) | 2d | — |
| 3.7 | Change buffer retention policy | 2d | — |
7. Non-Goals
- Replacing Continuous Aggregates. For simple `time_bucket` GROUP BY on a single hypertable, Continuous Aggregates are faster and simpler. pg_trickle targets the cases caggs can’t handle.
- Depending on TimescaleDB. All TimescaleDB integration must be optional. pg_trickle detects TimescaleDB at runtime (`SELECT 1 FROM pg_extension WHERE extname = 'timescaledb'`) and enables optimizations conditionally.
- Modifying TimescaleDB internals. All integration works through public APIs and catalog tables.
8. Open Questions
- Trigger propagation on new chunks — Does TimescaleDB propagate existing parent-table triggers to chunks created after the trigger was defined? (Likely yes, but needs verification.)
- Compression hook — Is there a way to detect that a chunk is being compressed (hook, event trigger, catalog flag) so CDC can ignore the resulting DML?
- Cagg materialized hypertable OID stability — When a Continuous Aggregate is refreshed, does the materialized hypertable’s OID remain stable? (Almost certainly yes, but worth confirming for CDC registration.)
- `time_bucket` immutability — Is `time_bucket()` marked `IMMUTABLE` in the catalog? pg_trickle requires source functions to be immutable for correct IVM. (Likely yes for fixed-interval overloads; the timezone-aware overloads may be `STABLE`.)
- Licensing — Some TimescaleDB features (e.g., compression, continuous aggregates) are under the Timescale License, not Apache-2.0. Does depending on their catalog tables or behavior create any licensing concerns for pg_trickle? (Likely no — pg_trickle would only query public catalog views, not link to TSL code.)
9. References
- TimescaleDB documentation — Continuous Aggregates
- TimescaleDB documentation — Hypertables
- TimescaleDB documentation — Compression
- TimescaleDB source — Invalidation log
- DBSP paper (Budiu et al., 2022)
- pg_trickle ARCHITECTURE.md
- pg_trickle PLAN_TRANSACTIONAL_IVM.md
- pg_trickle PLAN_CITUS.md — similar extension-compat plan