Contents
Gap Analysis: pg_trickle vs. Feldera — Core SQL IVM Engine (PostgreSQL Features Only)
Date: 2026-02-28 pg_trickle version: 0.1.2 (PostgreSQL 18 extension, Rust/pgrx) Feldera version: v0.255.0 (standalone incremental computation engine, Rust/Java) Scope: Core SQL incremental view maintenance engine, limited to SQL features available in PostgreSQL. Excludes connectors, deployment, operational tooling, APIs, time-series extensions, Feldera-specific SQL constructs (ASOF JOIN, PIVOT, QUALIFY, WITHIN DISTINCT, ARG_MIN/ARG_MAX, COUNTIF, lambda expressions, DECLARE RECURSIVE VIEW, CROSS/OUTER APPLY), and non-IVM features.
Executive Summary
Both pg_trickle and Feldera implement incremental view maintenance (IVM) with theoretical roots in DBSP (Budiu et al., VLDB 2023). Feldera is the commercial DBSP implementation by its original authors; pg_trickle applies DBSP’s differentiation rules inside PostgreSQL’s executor.
This analysis compares only the core IVM engine for SQL constructs that exist in PostgreSQL’s SQL dialect: which constructs each system can incrementally maintain, how efficiently, and with what correctness guarantees.
Key findings:
- Aggregate functions: pg_trickle supports 39+ vs Feldera’s ~20. pg_trickle covers all PG statistical, JSON, and ordered-set aggregates that Feldera lacks.
- Window functions: pg_trickle has full support; Feldera restricts ROW_NUMBER/RANK/DENSE_RANK to TopK patterns only.
- Recursion: Feldera has truly incremental fixed-point (DBSP native); pg_trickle uses recomputation-diff (correct but less efficient).
- Set operations: pg_trickle supports all 6 variants; Feldera lacks EXCEPT ALL and INTERSECT ALL.
- Incremental efficiency: Feldera maintains true Z-set weights and in-memory operator state. pg_trickle generates delta SQL executed by PG’s planner — no persistent operator state.
- Both systems handle joins, subqueries, and CTEs comparably.
Summary Table
| SQL Feature | Feldera | pg_trickle | Advantage |
|---|---|---|---|
| Aggregate functions | ~20 | 39+ | pg_trickle |
| Window functions | Partial (TopK only for ranking) | Full | pg_trickle |
| Inner / outer joins | ✅ | ✅ | Tied |
| Semi-join / anti-join | ✅ | ✅ (dedicated operators) | Tied |
| Correlated subqueries | ✅ | ✅ | Tied |
| LATERAL | ✅ | ✅ | Tied |
| Scalar subqueries | ✅ | ✅ | Tied |
| Set operations (UNION, EXCEPT, INTERSECT) | 4 of 6 | All 6 | pg_trickle |
| Non-recursive CTEs | ✅ | ✅ | Tied |
| Recursive queries (WITH RECURSIVE) | ✅ (incremental fixed-point) | ✅ (recomputation-diff) | Feldera (efficiency) |
| GROUPING SETS / CUBE / ROLLUP | ✅ | ✅ (auto-rewritten) | Tied |
| DISTINCT / DISTINCT ON | ✅ / ❌ | ✅ / ✅ (auto-rewritten) | pg_trickle |
| Z-set model (true weights) | ✅ (integer weights, abelian groups) | ❌ (binary I/D + auxiliary counters) | Feldera |
| Persistent operator state | ✅ (in-memory, NVMe spill) | ❌ (stateless delta SQL per refresh) | Feldera |
| Formal correctness proofs | ✅ (Lean) | ❌ (empirical: property tests, TPC-H) | Feldera |
| SQL dialect | Calcite-based | Native PostgreSQL | pg_trickle (PG compat) |
Detailed Comparison
1. Incremental Computation Model
| Aspect | Feldera (DBSP) | pg_trickle (DVM) |
|---|---|---|
| Theoretical basis | DBSP: Z-sets over abelian groups, lifting transform, integration/differentiation operators | DBSP-inspired: per-operator differentiation rules, binary delta model |
| Z-set weights | True integer weights in ℤ (bag semantics, composable) | Binary actions ('I'/'D'), with __pgt_count auxiliary for aggregates |
| Operator state | Persistent in-memory state per operator; integration operator (I) maintains running sums | No persistent state; “current state” = stream table contents; delta SQL reads snapshots |
| Execution | Compile SQL → DBSP dataflow → continuous Rust circuit | Parse SQL → DVM operator tree → generate delta SQL CTEs → PG executor |
| Processing model | Continuous micro-batch (each input change triggers incremental output) | Periodic refresh (scheduler triggers delta SQL execution) |
| Optimizer | Calcite (rule-based → DBSP circuit) | PostgreSQL planner (cost-based, mature) |
Gap for pg_trickle: No true Z-set weight propagation through the operator tree. The binary I/D model works correctly (verified empirically) but is theoretically less general than DBSP’s full abelian group model. No persistent operator state — each refresh re-reads current table contents rather than maintaining incremental running sums.
Gap for Feldera: Relies on Calcite’s optimizer, which lacks PostgreSQL’s cost-based join ordering, adaptive execution, and decades of optimization heuristics. SQL dialect is Calcite-based, not PostgreSQL — queries may need modification.
2. Aggregate Functions
| Function | Feldera | pg_trickle | Incremental Strategy |
|---|---|---|---|
| COUNT(*) / COUNT(expr) | ✅ | ✅ | Both: algebraic / linear |
| SUM | ✅ | ✅ | Both: algebraic / linear |
| AVG | ✅ | ✅ | Both: via SUM/COUNT decomposition |
| MIN / MAX | ✅ | ✅ | Feldera: O(D log M); pg_trickle: rescan on extremum delete |
| STDDEV / STDDEV_POP / STDDEV_SAMP | ✅ | ✅ | Feldera: linear (int/decimal); pg_trickle: group-rescan |
| BIT_AND / BIT_OR / BIT_XOR | ✅ | ✅ | Feldera: non-linear O(N); pg_trickle: group-rescan |
| BOOL_AND / BOOL_OR / EVERY / SOME | ✅ | ✅ | Feldera: O(D log M); pg_trickle: group-rescan |
| ARRAY_AGG | ✅ | ✅ | Both: expensive O(M) |
| STRING_AGG | ❌ | ✅ | pg_trickle: group-rescan |
| JSON_AGG / JSONB_AGG | ❌ | ✅ | pg_trickle: group-rescan |
| JSON_OBJECT_AGG / JSONB_OBJECT_AGG | ❌ | ✅ | pg_trickle: group-rescan |
| MODE | ❌ | ✅ | pg_trickle: group-rescan (ordered-set) |
| PERCENTILE_CONT / PERCENTILE_DISC | ❌ | ✅ | pg_trickle: group-rescan (ordered-set) |
| CORR / COVAR_POP / COVAR_SAMP | ❌ | ✅ | pg_trickle: group-rescan |
| REGR_* (11 functions) | ❌ | ✅ | pg_trickle: group-rescan |
| ANY_VALUE (PG 16+) | ❌ | ✅ | pg_trickle: group-rescan |
| JSON_ARRAYAGG / JSON_OBJECTAGG (PG 16+) | ❌ | ✅ | pg_trickle: group-rescan |
| FILTER (WHERE) clause | ✅ | ✅ | Feldera: makes agg non-linear |
| WITHIN GROUP (ORDER BY) | ❌ | ✅ | pg_trickle: ordered-set aggregates |
| Total built-in aggregates | ~20 | 39+ |
Efficiency comparison: Feldera classifies aggregates into linear (O(D)), efficient (O(D log M)), and non-linear/expensive (O(M) or O(N)). pg_trickle classifies into algebraic (fully differential, O(D)), semi-algebraic (rescan on extremum delete), and group-rescan (re-aggregate affected groups). For the 5 shared algebraic aggregates (COUNT, SUM, AVG, MIN, MAX), both systems are comparably efficient. For the remaining aggregates, Feldera’s approach is generally more efficient because it maintains per-group state in memory, while pg_trickle re-aggregates affected groups via SQL.
Gap for pg_trickle: No significant gap for PostgreSQL-native aggregates — pg_trickle covers all built-in PG aggregate functions.
Gap for Feldera: Missing 20+ aggregate functions available in PostgreSQL (STRING_AGG, all JSON aggregates, statistical/regression aggregates, ordered-set aggregates). No WITHIN GROUP.
3. Window Functions
| Feature | Feldera | pg_trickle |
|---|---|---|
| ROW_NUMBER | ✅ (TopK pattern only) | ✅ (full) |
| RANK / DENSE_RANK | ✅ (TopK pattern only) | ✅ (full) |
| NTILE | ❌ | ✅ |
| FIRST_VALUE / LAST_VALUE | ✅ (UNLIMITED RANGE only) | ✅ |
| LAG / LEAD | ✅ | ✅ |
| SUM / AVG / COUNT / MIN / MAX OVER | ✅ | ✅ |
| Frame clauses (ROWS/RANGE/GROUPS) | ✅ (constant bounds required) | ✅ (full) |
| PARTITION BY recomputation | ✅ | ✅ |
| Named WINDOW clauses | ? | ✅ |
| Arbitrary ranking functions | ❌ (must be in TopK subquery) | ✅ |
| Window in recursive queries | ❌ | ✅ |
TopK restriction explained: Feldera’s ROW_NUMBER, RANK, and DENSE_RANK only
work when the compiler detects a TopK pattern: a subquery with the ranking
function, filtered by rn < K in the outer query. General use (e.g., numbering
all rows, pagination, gap-and-island detection) is not supported.
Gap for pg_trickle: None for PostgreSQL-native window function features.
Gap for Feldera: Ranking functions (ROW_NUMBER, RANK, DENSE_RANK) are restricted to TopK patterns. This is a significant limitation for analytical queries that use ranking for pagination, deduplication, or gap-and-island analysis. No NTILE. FIRST_VALUE/LAST_VALUE limited to UNLIMITED RANGE. No window functions in recursive query bodies.
4. Joins
| Feature | Feldera | pg_trickle |
|---|---|---|
| Inner join | ✅ | ✅ |
| LEFT / RIGHT / FULL OUTER | ✅ | ✅ |
| CROSS JOIN | ✅ | ✅ |
| NATURAL JOIN | ✅ | ✅ |
| Self-join | ✅ | ✅ |
| Non-equi join (theta) | ✅ | ✅ |
| Multi-condition outer join | ✅ | ✅ |
Delta rule comparison: Both use the bilinear join decomposition from DBSP (Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB). Feldera maintains join state in memory; pg_trickle generates SQL that reads current table snapshots.
No significant gap for either system on PostgreSQL-native join types.
5. Subqueries
| Feature | Feldera | pg_trickle |
|---|---|---|
| Correlated subqueries | ✅ | ✅ |
| EXISTS / NOT EXISTS | ✅ | ✅ |
| IN / NOT IN (subquery) | ✅ | ✅ (semi-join / anti-join operators) |
| Scalar subquery in SELECT | ✅ | ✅ |
| Scalar subquery in WHERE | ✅ | ✅ (auto-rewritten to CROSS JOIN) |
| LATERAL subquery | ✅ | ✅ |
| LATERAL SRF (UNNEST, etc.) | ✅ | ✅ |
| ALL (subquery) | ? | ✅ (anti-join rewrite) |
| Subqueries in OR | ? | ✅ (auto-rewritten to UNION) |
No significant gap for either system on subqueries.
6. CTEs & Recursion
| Feature | Feldera | pg_trickle |
|---|---|---|
| Simple CTE (WITH) | ✅ | ✅ |
| Multi-reference CTE | ✅ | ✅ (shared delta) |
| Chained CTEs | ✅ | ✅ |
| Recursive queries (WITH RECURSIVE) | ✅ (incremental fixed-point, O(ΔR) per step) | ✅ (recomputation-diff) |
| Operators in recursive body | Most (no window functions) | All |
| Recursion syntax | Non-standard (DECLARE RECURSIVE VIEW) | Standard SQL (WITH RECURSIVE) |
This is the largest theoretical gap. Feldera implements DBSP’s native fixed-point iteration with the z⁻¹ delay operator — when input changes, only the affected portion of the recursive computation is re-evaluated (e.g., for transitive closure, only paths involving changed edges are recomputed). pg_trickle uses recomputation-diff: it re-executes the full recursive query and anti-joins the result against the current stream table to produce deltas. This is correct but scales as O(|result|) rather than O(|Δ|).
Gap for pg_trickle: Recursion is not truly incremental — recomputation-diff re-executes the full recursive query each refresh. For large recursive structures (transitive closure of million-edge graphs), this is much slower than Feldera’s approach.
Gap for Feldera: Non-standard recursion syntax (DECLARE RECURSIVE VIEW instead of WITH RECURSIVE). Window functions not supported in recursive bodies.
7. Set Operations
| Operation | Feldera | pg_trickle |
|---|---|---|
| UNION ALL | ✅ | ✅ |
| UNION (DISTINCT) | ✅ | ✅ |
| EXCEPT (DISTINCT) | ✅ | ✅ |
| EXCEPT ALL | ❌ | ✅ |
| INTERSECT (DISTINCT) | ✅ | ✅ |
| INTERSECT ALL | ❌ | ✅ |
Gap for Feldera: No EXCEPT ALL or INTERSECT ALL. These are less common but needed for correct multiset semantics in some analytical queries.
8. DISTINCT & Grouping
| Feature | Feldera | pg_trickle |
|---|---|---|
| SELECT DISTINCT | ✅ | ✅ |
| DISTINCT ON (expr, …) | ❌ | ✅ (auto-rewritten to ROW_NUMBER) |
| GROUP BY | ✅ | ✅ |
| GROUPING SETS | ✅ | ✅ (auto-rewritten to UNION ALL) |
| CUBE | ✅ | ✅ (auto-rewritten via GROUPING SETS) |
| ROLLUP | ✅ | ✅ (auto-rewritten via GROUPING SETS) |
| GROUPING() function | ✅ | ✅ |
| GROUP BY DISTINCT | ✅ | ? |
| HAVING | ✅ | ✅ |
Gap for Feldera: No DISTINCT ON (PostgreSQL extension, commonly used for “latest row per group” patterns).
9. Incremental Efficiency by Operator
| Operator | Feldera | pg_trickle |
|---|---|---|
| Filter (WHERE) | O(D) — linear, self-incremental | O(D) — delta passthrough |
| Project (SELECT) | O(D) — linear, self-incremental | O(D) — delta passthrough |
| Inner Join | O(D × state) — maintains join index | O(D × snapshot) — reads current tables via SQL |
| Outer Join | O(D × state) — maintains null-padding state | O(D × snapshot) — 8-part delta for FULL OUTER |
| Aggregate (algebraic) | O(D) — linear, in-memory counters | O(D) — algebraic rewrite with __pgt_count |
| Aggregate (group-rescan) | O(M) — re-aggregate modified groups in memory | O(M) — re-aggregate via SQL LEFT JOIN back |
| DISTINCT | O(D log N) — maintains count per distinct value | O(D) — handled via GROUP BY + HAVING count |
| UNION ALL | O(D) — passthrough | O(D) — passthrough |
| Window function | O(D × partition) — recompute affected partitions | O(D × partition) — recompute via SQL |
| Recursive CTE | O(Δ) — incremental fixed-point | O(result) — recomputation-diff |
Key difference: Feldera maintains per-operator state in memory (or spilled to NVMe), allowing O(Δ) incremental updates. pg_trickle generates SQL that reads current table snapshots — the PostgreSQL planner optimizes this, but there’s no persistent operator state between refreshes. Feldera’s approach is theoretically more efficient for joins and recursion; pg_trickle’s approach benefits from PG’s mature cost-based optimizer and avoids memory management.
10. Correctness & Verification
| Aspect | Feldera | pg_trickle |
|---|---|---|
| Formal proof | ✅ DBSP theorems proven in Lean | ❌ |
| Property-based testing | ✅ | ✅ (assert: Contents(ST) = Q(DB) after each mutation) |
| TPC-H validation | ? | ✅ (22-query suite, 20/22 create, 15/22 deterministic) |
| Consistency guarantee | Strongly consistent (provable) | Empirically verified (1,300+ tests) |
| Z-set correctness | Machine-checked chain rule, cycle rule, bilinear decomposition | Manual translation of DBSP rules, verified by tests |
| Recursive correctness | Fixed-point convergence proven | Recomputation-diff trivially correct (full recompute) |
Gap for pg_trickle: No formal proof. The per-operator differentiation rules are direct translations of DBSP but verified empirically rather than formally.
Gap for Feldera: The formal proofs cover the DBSP theory, but the SQL-to-DBSP compilation (via Calcite) is not formally verified — bugs in the SQL compiler could violate the theoretical guarantees.
Features Unique to Each System (IVM Engine, PostgreSQL Features Only)
Feldera-only
| # | Feature | Impact |
|---|---|---|
| 1 | Incremental recursive fixed-point | O(Δ) recursion vs O(result) recomputation |
| 2 | True Z-set weights (integer, abelian groups) | Theoretically general multiset semantics |
| 3 | Persistent operator state (in-memory + NVMe spill) | Avoids re-reading snapshots on each refresh |
| 4 | Formal DBSP proofs (Lean) | Machine-checked correctness |
pg_trickle-only
| # | Feature | Impact |
|---|---|---|
| 1 | 20+ additional aggregate functions | STRING_AGG, JSON aggs, statistical, ordered-set, regression |
| 2 | Full window function support (no TopK restriction) | Arbitrary ROW_NUMBER, RANK, DENSE_RANK usage |
| 3 | DISTINCT ON (auto-rewritten) | “Latest per group” pattern |
| 4 | EXCEPT ALL / INTERSECT ALL | Complete multiset set operations |
| 5 | WITHIN GROUP (ORDER BY) | Ordered-set aggregate support |
| 6 | Native PG parser | Exact PostgreSQL SQL compatibility |
| 7 | PG cost-based optimizer for delta SQL | Mature planner optimizes incremental queries |
| 8 | Full PG type system in IVM queries | PostGIS, ranges, domains, custom operators |
| 9 | Views as sources (auto-inlined) | Transparent view expansion in IVM |
| 10 | Partitioned table support | IVM over partitioned sources |
| 11 | Window functions in recursive queries | Full SQL in WITH RECURSIVE bodies |
| 12 | Auto-rewrite pipeline (6 transparent rewrites) | DISTINCT ON, GROUPING SETS, view inlining, etc. |
Recommendations for pg_trickle
Worth considering (IVM engine improvements)
| Priority | Feature | Description | Effort | Rationale |
|---|---|---|---|---|
| Medium | Recursive efficiency | Semi-naive evaluation for WITH RECURSIVE (avoid full recomputation) | 40+h | Key theoretical gap vs Feldera; high effort but high payoff for recursive workloads |
Not worth pursuing
| Feature | Reason |
|---|---|
| True Z-set weight propagation | Would require rearchitecting the delta model. Binary I/D + auxiliary counters is correct and verified. |
| Persistent operator state | Would duplicate PG’s storage engine. The stateless delta SQL model benefits from PG’s planner and MVCC. |
| Formal proofs (Lean) | High effort, low ROI. Property-based tests + TPC-H provide practical confidence. |
Conclusion
As pure IVM engines operating on PostgreSQL SQL features, Feldera and pg_trickle make fundamentally different trade-offs:
Feldera has stronger theoretical foundations (formal DBSP proofs, true Z-set weights, incremental fixed-point recursion, persistent operator state). Its IVM engine is more efficient for joins over large state and recursive computations. Its weaknesses are restricted window functions (TopK only for ranking), missing set operation variants, and a smaller built-in aggregate library.
pg_trickle has broader SQL coverage (39+ aggregates, full window functions, all set operations, DISTINCT ON, PG type system) and benefits from PostgreSQL’s mature cost-based optimizer for delta queries. Its main weakness is the recomputation-diff approach to recursion.
The two systems are comparable for the most common IVM workloads (joins, filters, algebraic aggregates, subqueries, CTEs). The gap becomes significant for recursive queries, where Feldera’s IVM engine is categorically more capable rather than merely differently optimized.
For pg_trickle, the highest-impact improvement would be semi-naive recursive evaluation to close the recursion efficiency gap — the only area where Feldera has a clear advantage when limited to PostgreSQL SQL features. For all other PostgreSQL SQL constructs, pg_trickle already matches or exceeds Feldera’s coverage.