Performance Benchmarks

Comparison benchmarks for the weighted_statistics extension, focusing on C vs PL/pgSQL performance and quantile method differences.

Quick Start

# From repository root
./benchmark/run_benchmark.sh

# Custom database
PGDATABASE=mydb PGUSER=myuser ./benchmark/run_benchmark.sh

What Gets Tested

Group 1: C vs PL/pgSQL Comparison

  • Functions tested: weighted_mean, weighted_quantile
  • Implementation comparison: Optimized C code vs optimized PL/pgSQL
  • Array sizes: 1K, 10K, and 100K elements
  • Methodology: 5 iterations per test, reporting the average and standard deviation of execution times
  • Purpose: Measure the performance advantage of the C implementation as array size grows (see the example calls after this list)
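
For reference, a single comparison boils down to timing the same call against both implementations. The sketch below assumes the C functions take a float8 value array and a weight array, and that the PL/pgSQL counterparts loaded from benchmark/plpgsql_functions.sql use a _plpgsql suffix; both assumptions are for illustration only.

-- Enable per-statement timing in psql
\timing on

-- Build a 10K-element value/weight pair once (weights sum to roughly 1.0)
CREATE TEMP TABLE bench AS
SELECT array_agg(random())          AS vals,
       array_agg(random() / 5000.0) AS wts
FROM generate_series(1, 10000);

-- Time each implementation separately on the same input
SELECT weighted_mean(vals, wts)         FROM bench;  -- C implementation
SELECT weighted_mean_plpgsql(vals, wts) FROM bench;  -- hypothetical PL/pgSQL name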

Group 2: Quantile Methods Comparison

  • Methods compared:
    • weighted_quantile - Empirical CDF (baseline)
    • wquantile - Type 7 / Hyndman-Fan (R/NumPy default)
    • whdquantile - Harrell-Davis (smooth estimator)
  • Array sizes: 1K and 10K elements
  • Methodology: 5 iterations per test, reporting the average execution time
  • Purpose: Compare the computational cost of the different quantile algorithms (example calls follow this list)
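
The method comparison is the same pattern applied to each estimator. The sketch below reuses the bench table from the Group 1 example and assumes all three methods take a value array, a weight array, and a quantile probability; the exact signatures may differ.

-- Median via each estimator on the same 10K-element input
SELECT weighted_quantile(vals, wts, 0.5) FROM bench;  -- Empirical CDF (baseline)
SELECT wquantile(vals, wts, 0.5)         FROM bench;  -- Type 7 / Hyndman-Fan
SELECT whdquantile(vals, wts, 0.5)       FROM bench;  -- Harrell-Davis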

Additional Tests

  • Single vs Multiple Quantiles: Efficiency of computing multiple quantiles in one call (a sketch follows this list)
  • Sparse Data: All tests use sparse weight arrays whose weights sum to ≈ 1.0, reflecting real-world scenarios
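
The single-vs-multiple test compares one call that returns several quantiles against repeated single-quantile calls. The array-of-probabilities form shown below is an assumption about the function's overloads, and the example again reuses the bench table from above.

-- One call returning three quantiles (assumed array overload)
SELECT weighted_quantile(vals, wts, ARRAY[0.25, 0.5, 0.75]) FROM bench;

-- Versus three separate single-quantile calls
SELECT weighted_quantile(vals, wts, 0.25),
       weighted_quantile(vals, wts, 0.50),
       weighted_quantile(vals, wts, 0.75)
FROM bench;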

Environment Variables

Uses standard PostgreSQL environment variables (aligned with test suite):

Primary:
  • PGDATABASE - Database name (default: postgres)
  • PGUSER - Username (default: postgres)
  • PGHOST - Host (default: localhost)
  • PGPORT - Port (default: 5432)

Alternatives:
  • TEST_DATABASE, TEST_USER, TEST_HOST, TEST_PORT
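
For example, either family of variables can be set inline when invoking the script (whether the TEST_* names take precedence over the PG* names is determined by the script itself; the values below are placeholders):

# Same run using the TEST_* alternatives
TEST_DATABASE=mydb TEST_USER=myuser TEST_HOST=db.example.com TEST_PORT=5433 ./benchmark/run_benchmark.sh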

Prerequisites

  • PostgreSQL with weighted_statistics extension installed and enabled
  • psql client available in PATH
  • Database connection permissions
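
If the extension files are installed but the extension has not yet been enabled in the benchmark database, the standard command below should do it (assuming the extension is packaged under the name weighted_statistics):

-- Enable the extension in the target database
CREATE EXTENSION IF NOT EXISTS weighted_statistics;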

Output Interpretation

The benchmark shows timing results for each test. Look for:

  • Time values - Execution time for each function
  • C vs PL/pgSQL ratios - Performance improvement of the C implementation over PL/pgSQL
  • Quantile method differences - Relative cost of different algorithms
  • Scaling behavior - How performance changes with array size

Manual Execution

# Load PL/pgSQL functions first
psql -f benchmark/plpgsql_functions.sql

# Run performance tests
psql -f benchmark/performance_test.sql
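
If the PG* environment variables are not set, the same files can be run against an explicit connection; the host, port, user, and database below are placeholders:

# Explicit connection parameters instead of environment variables
psql -h localhost -p 5432 -U postgres -d postgres -f benchmark/plpgsql_functions.sql
psql -h localhost -p 5432 -U postgres -d postgres -f benchmark/performance_test.sql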

Performance Results

Based on benchmarks run on the target system:

C vs PL/pgSQL Performance

Function           Array Size   C Time (±stddev)   PL/pgSQL Time (±stddev)   Speedup
weighted_mean      1K           0.20ms (±0.39)     0.19ms (±0.04)            1.0x (equal)
weighted_mean      10K          0.58ms (±0.94)     2.08ms (±0.51)            3.6x faster
weighted_mean      100K         4.79ms (±0.78)     18.77ms (±1.42)           3.9x faster
weighted_quantile  1K           0.06ms (±0.04)     0.75ms (±0.07)            11.9x faster
weighted_quantile  10K          0.51ms (±0.05)     6.91ms (±0.14)            13.5x faster
weighted_quantile  100K         10.02ms (±0.83)    136.71ms (±22.06)         13.6x faster

Quantile Methods Comparison

Method                             Array Size   Time (±stddev)    vs Empirical
weighted_quantile (Empirical CDF)  1K           0.07ms (±0.05)    baseline
wquantile (Type 7)                 1K           0.10ms (±0.02)    1.5x slower
whdquantile (Harrell-Davis)        1K           1.75ms (±0.04)    25.7x slower
weighted_quantile (Empirical CDF)  10K          0.52ms (±0.07)    baseline
wquantile (Type 7)                 10K          0.92ms (±0.08)    1.8x slower
whdquantile (Harrell-Davis)        10K          17.33ms (±0.21)   33.4x slower

Key Insights

  • C advantage scales with complexity: Mean functions are roughly equal at 1K but about 4x faster in C at 100K; quantile functions are consistently 12-14x faster in C
  • Statistical reliability: Standard deviations show consistent performance across iterations
  • Empirical CDF quantiles are fastest for general use
  • Type 7 quantiles have minimal overhead (1.5-1.8x slower than empirical)
  • Harrell-Davis method is very expensive (25-33x slower than empirical) but provides the smoothest estimates
  • Near-linear scaling: Performance grows predictably with array size for all methods