Contents
v0.9.0 — Algebraic Aggregate Maintenance
Full technical details: v0.9.0.md-full.md
Status: ✅ Released | Scope: Medium (~3–4 weeks)
Fast incremental maintenance of AVG, STDDEV, VARIANCE, and COUNT(DISTINCT) using algebraic shortcuts — no rescanning required for most aggregate updates.
What problem does this solve?
Computing aggregates like averages and standard deviations from a differential update requires knowing both the new rows being added and the old rows being removed. A naive approach would require rescanning large portions of the source table. Algebraic maintenance avoids this by maintaining running totals that can be updated with just the delta rows.
Incremental AVG Maintenance
For a stream table computing the average salary per department, the naive
approach when a salary changes: recompute SUM(salary) / COUNT(*) over all
department rows. The algebraic approach: maintain a running sum and count,
apply +new_salary and -old_salary, divide.
pg_trickle now uses this algebraic approach for AVG in stream tables,
making average computations O(Δ) — proportional to the number of changed
rows — rather than O(n).
Incremental STDDEV and VARIANCE
Standard deviation and variance are more complex than average — they involve squared terms. pg_trickle uses the Welford online algorithm to maintain these algebraically with just the delta rows, using the running sum of squares alongside the running sum.
In plain terms: “average order value with standard deviation by product category” can now be maintained as efficiently as a simple count — the complex math is handled internally.
COUNT(DISTINCT) Fast Path
Counting distinct values (COUNT(DISTINCT customer_id)) is one of the harder
aggregates to maintain incrementally, because you need to know whether a
removed value was the last occurrence. pg_trickle now maintains a reference
count per distinct value, enabling efficient O(Δ) updates without rescanning.
SUM, MIN, MAX Improvements
SUM already used an algebraic approach, but edge cases with negative values
and NULL handling were corrected. MIN and MAX received an optimised
fallback strategy: when the minimum or maximum value is updated, and the
engine cannot determine the new extremum from the delta alone, it triggers
a targeted partial recomputation rather than a full refresh.
Scope
v0.9.0 makes aggregate-heavy stream tables significantly faster by applying algebraic maintenance shortcuts for AVG, STDDEV, VARIANCE, and COUNT(DISTINCT). These improvements are especially impactful for stream tables with many groups (e.g. aggregates by user ID or product ID) where rescanning would be expensive.