v0.9.0 — Algebraic Aggregate Maintenance

Full technical details: v0.9.0.md-full.md

Status: ✅ Released | Scope: Medium (~3–4 weeks)

Fast incremental maintenance of AVG, STDDEV, VARIANCE, and COUNT(DISTINCT) using algebraic shortcuts — no rescanning required for most aggregate updates.


What problem does this solve?

Computing aggregates like averages and standard deviations from a differential update requires knowing both the new rows being added and the old rows being removed. A naive approach would require rescanning large portions of the source table. Algebraic maintenance avoids this by maintaining running totals that can be updated with just the delta rows.


Incremental AVG Maintenance

For a stream table computing the average salary per department, the naive approach when a salary changes: recompute SUM(salary) / COUNT(*) over all department rows. The algebraic approach: maintain a running sum and count, apply +new_salary and -old_salary, divide.

pg_trickle now uses this algebraic approach for AVG in stream tables, making average computations O(Δ) — proportional to the number of changed rows — rather than O(n).


Incremental STDDEV and VARIANCE

Standard deviation and variance are more complex than average — they involve squared terms. pg_trickle uses the Welford online algorithm to maintain these algebraically with just the delta rows, using the running sum of squares alongside the running sum.

In plain terms: “average order value with standard deviation by product category” can now be maintained as efficiently as a simple count — the complex math is handled internally.


COUNT(DISTINCT) Fast Path

Counting distinct values (COUNT(DISTINCT customer_id)) is one of the harder aggregates to maintain incrementally, because you need to know whether a removed value was the last occurrence. pg_trickle now maintains a reference count per distinct value, enabling efficient O(Δ) updates without rescanning.


SUM, MIN, MAX Improvements

SUM already used an algebraic approach, but edge cases with negative values and NULL handling were corrected. MIN and MAX received an optimised fallback strategy: when the minimum or maximum value is updated, and the engine cannot determine the new extremum from the delta alone, it triggers a targeted partial recomputation rather than a full refresh.


Scope

v0.9.0 makes aggregate-heavy stream tables significantly faster by applying algebraic maintenance shortcuts for AVG, STDDEV, VARIANCE, and COUNT(DISTINCT). These improvements are especially impactful for stream tables with many groups (e.g. aggregates by user ID or product ID) where rescanning would be expensive.