trimmed_aggregates 1.0.0

This Release
trimmed_aggregates 1.0.0
Date
Status
Testing
Latest Stable
trimmed_aggregates 1.3.2 —
Latest Testing
trimmed_aggregates 2.0.0-dev —
Other Releases
Abstract
Aggregate for computing trimmed avg/stddev/var (with outliers removed).
Description
An extension written in C that allows you to remove outliers before computing the basic statistical aggregates (AVG/STDDEV/VARIANCE).
Released By
tomasv
License
GPL 3
Resources
Special Files
Tags

Extensions

trimmed_aggregates 1.0.0

README

Contents

Trimmed aggregates
==================
This PostgreSQL extension provides several aggregate functions that
trim the input data set before applying the function, i.e. remove
lowest/highest values. The number of values to be removed is configured
using the parameters.

WARNING: Those aggregates require the whole set, as they need to collect
and sort the whole data set ((to trim low/high values). This may be
a time consuming process and require a lot of memory. Keep this in mind
when using those functions.


Available aggregates
--------------------
The extension implements aggregates that resemble those described here:
http://www.postgresql.org/docs/9.1/static/functions-aggregate.html, i.e.
AVG, VARIANCE, VAR_POP, VAR_SAMP, STDDEV, STDDEV_POP and STDDEV_SAMP

  == AVG ==

  avg_trimmed(value, low_cut, high_cut)

  == VARIANCE ==

  var_trimmed(value, low_cut, high_cut);
  var_pop_trimmed(value, low_cut, high_cut)
  var_samp_trimmed(value, low_cut, high_cut)

  == STDDEV (standard deviation) ==

  stddev_trimmed(value, low_cut, high_cut)
  stddev_pop_trimmed(value, low_cut, high_cut)
  stddev_samp_trimmed(value, low_cut, high_cut)

All those functions are overloaded for double precision, int32 and
int64 data types.

Using the aggregates
--------------------
All the aggregates are used the same way so let's see how to use the
avg_trimmed aggregate. For example this

  SELECT avg_trimmed(i, 0.1, 0.1) FROM generate_series(1,1000) s(i);

means 10% of the values will be removed on both ends, and the average
will be computed using the middle 80%. On the other hand this

  SELECT avg_trimmed(i, 0.2, 0.1) FROM generate_series(1,1000) s(i);

means 20% of the lowest and 10% of the highest values will be removed,
so the average will be computed using the remaining 70% of values.


Installation
------------
Installing this is very simple - if you're on 9.1 you can install
it like any other extension, i.e.

  $ make install
  $ psql dbname -c "CREATE EXTENSION trimmed_averages"

and if you're on an older version, you have to run the SQL script
manually

  $ psql dbname < trimmed_averages--1.0.sql

That's all.