Performance Benchmarks

Comparison benchmarks for the weighted_statistics extension, focusing on C vs PL/pgSQL performance and quantile method differences.

Quick Start

# From repository root
./benchmark/run_benchmark.sh

# Custom database
PGDATABASE=mydb PGUSER=myuser ./benchmark/run_benchmark.sh

What Gets Tested

Group 1: C vs PL/pgSQL Comparison

  • Functions tested: weighted_mean, weighted_quantile
  • Implementation comparison: Optimized C code vs optimized PL/pgSQL
  • Array sizes: 1K, 10K, and 100K elements
  • Methodology: 5 iterations per test, reporting the average and standard deviation of execution times
  • Purpose: Measure the performance advantage of the C implementation as array size grows (see the example calls after this list)
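
For reference, a single comparison boils down to timing the same call against both implementations. The sketch below assumes the C functions take a float8 value array and a weight array, and that the PL/pgSQL counterparts loaded from benchmark/plpgsql_functions.sql use a _plpgsql suffix; both assumptions are for illustration only.

-- Enable per-statement timing in psql
\timing on

-- Build a 10K-element value/weight pair once (weights sum to roughly 1.0)
CREATE TEMP TABLE bench AS
SELECT array_agg(random())          AS vals,
       array_agg(random() / 5000.0) AS wts
FROM generate_series(1, 10000);

-- Time each implementation separately on the same input
SELECT weighted_mean(vals, wts)         FROM bench;  -- C implementation
SELECT weighted_mean_plpgsql(vals, wts) FROM bench;  -- hypothetical PL/pgSQL name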

Group 2: Quantile Methods Comparison

  • Methods compared:
    • weighted_quantile - Empirical CDF (baseline)
    • wquantile - Type 7 / Hyndman-Fan (R/NumPy default)
    • whdquantile - Harrell-Davis (smooth estimator)
  • Array sizes: 1K and 10K elements
  • Methodology: 5 iterations per test, reporting the average execution time
  • Purpose: Compare the computational cost of the different quantile algorithms (example calls follow this list)
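
The method comparison is the same pattern applied to each estimator. The sketch below reuses the bench table from the Group 1 example and assumes all three methods take a value array, a weight array, and a quantile probability; the exact signatures may differ.

-- Median via each estimator on the same 10K-element input
SELECT weighted_quantile(vals, wts, 0.5) FROM bench;  -- Empirical CDF (baseline)
SELECT wquantile(vals, wts, 0.5)         FROM bench;  -- Type 7 / Hyndman-Fan
SELECT whdquantile(vals, wts, 0.5)       FROM bench;  -- Harrell-Davis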

Additional Tests

  • Single vs Multiple Quantiles: Efficiency of computing multiple quantiles in one call (a sketch follows this list)
  • Sparse Data: All tests use sparse weight arrays whose weights sum to ≈ 1.0, reflecting real-world scenarios
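
The single-vs-multiple test compares one call that returns several quantiles against repeated single-quantile calls. The array-of-probabilities form shown below is an assumption about the function's overloads, and the example again reuses the bench table from above.

-- One call returning three quantiles (assumed array overload)
SELECT weighted_quantile(vals, wts, ARRAY[0.25, 0.5, 0.75]) FROM bench;

-- Versus three separate single-quantile calls
SELECT weighted_quantile(vals, wts, 0.25),
       weighted_quantile(vals, wts, 0.50),
       weighted_quantile(vals, wts, 0.75)
FROM bench;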

Environment Variables

Uses standard PostgreSQL environment variables (aligned with test suite):

Primary:
  • PGDATABASE - Database name (default: postgres)
  • PGUSER - Username (default: postgres)
  • PGHOST - Host (default: localhost)
  • PGPORT - Port (default: 5432)

Alternatives:
  • TEST_DATABASE, TEST_USER, TEST_HOST, TEST_PORT
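
For example, either family of variables can be set inline when invoking the script (whether the TEST_* names take precedence over the PG* names is determined by the script itself; the values below are placeholders):

# Same run using the TEST_* alternatives
TEST_DATABASE=mydb TEST_USER=myuser TEST_HOST=db.example.com TEST_PORT=5433 ./benchmark/run_benchmark.sh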

Prerequisites

  • PostgreSQL with weighted_statistics extension installed and enabled
  • psql client available in PATH
  • Database connection permissions
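
If the extension files are installed but the extension has not yet been enabled in the benchmark database, the standard command below should do it (assuming the extension is packaged under the name weighted_statistics):

-- Enable the extension in the target database
CREATE EXTENSION IF NOT EXISTS weighted_statistics;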

Output Interpretation

The benchmark shows timing results for each test. Look for:

  • Time values - Execution time for each function
  • C vs PL/pgSQL ratios - Performance improvement of the C implementation over PL/pgSQL
  • Quantile method differences - Relative cost of different algorithms
  • Scaling behavior - How performance changes with array size

Manual Execution

# Load PL/pgSQL functions first
psql -f benchmark/plpgsql_functions.sql

# Run performance tests
psql -f benchmark/performance_test.sql
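
If the PG* environment variables are not set, the same files can be run against an explicit connection; the host, port, user, and database below are placeholders:

# Explicit connection parameters instead of environment variables
psql -h localhost -p 5432 -U postgres -d postgres -f benchmark/plpgsql_functions.sql
psql -h localhost -p 5432 -U postgres -d postgres -f benchmark/performance_test.sql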

Performance Results

Based on benchmarks run on the target system:

C vs PL/pgSQL Performance

Function           Array Size   C Time (±stddev)   PL/pgSQL Time (±stddev)   Speedup
weighted_mean      1K           0.20ms (±0.39)     0.19ms (±0.04)            1.0x (equal)
weighted_mean      10K          0.58ms (±0.94)     2.08ms (±0.51)            3.6x faster
weighted_mean      100K         4.79ms (±0.78)     18.77ms (±1.42)           3.9x faster
weighted_quantile  1K           0.06ms (±0.04)     0.75ms (±0.07)            11.9x faster
weighted_quantile  10K          0.51ms (±0.05)     6.91ms (±0.14)            13.5x faster
weighted_quantile  100K         10.02ms (±0.83)    136.71ms (±22.06)         13.6x faster

Quantile Methods Comparison

Method                             Array Size   Time (±stddev)    vs Empirical
weighted_quantile (Empirical CDF)  1K           0.07ms (±0.05)    baseline
wquantile (Type 7)                 1K           0.10ms (±0.02)    1.5x slower
whdquantile (Harrell-Davis)        1K           1.75ms (±0.04)    25.7x slower
weighted_quantile (Empirical CDF)  10K          0.52ms (±0.07)    baseline
wquantile (Type 7)                 10K          0.92ms (±0.08)    1.8x slower
whdquantile (Harrell-Davis)        10K          17.33ms (±0.21)   33.4x slower

Key Insights

  • C advantage scales with complexity: Mean functions are roughly equal at 1K but about 4x faster in C at 100K; quantile functions are consistently 12-14x faster in C
  • Statistical reliability: Standard deviations show consistent performance across iterations
  • Empirical CDF quantiles are fastest for general use
  • Type 7 quantiles have minimal overhead (1.5-1.8x slower than empirical)
  • Harrell-Davis method is very expensive (25-33x slower than empirical) but provides the smoothest estimates
  • Near-linear scaling: Performance grows predictably with array size for all methods