SubPlan Detection and Cost Tracking

Overview

The pg_track_optimizer extension now detects and tracks expensive SubPlans (correlated subqueries) by calculating a “worst SubPlan factor” metric. This helps identify queries where correlated subqueries execute many times, causing performance issues.

How It Works

SubPlans in PostgreSQL

SubPlans are correlated subqueries that appear in expression contexts (WHERE clauses, SELECT lists, etc.). Unlike regular plan nodes, SubPlans:

  • Are not direct children in the plan tree
  • Execute once per outer row (stored in pstate->subPlan list)
  • Have their effective cost multiplied by the number of executions

Detection Algorithm

When walking the plan tree in prediction_walker() (plan_error.c:35-77):

  1. After recursively walking child nodes, check if the current node has SubPlans: if (pstate->subPlan != NIL)

  2. For each SubPlan attached to this node:

    • Get instrumentation data: instr = sps->planstate->instrument
    • Get execution loops: nloops = instr->nloops
    • Get planned cost: cost = sps->planstate->plan->total_cost
    • Calculate effective cost: nloops × cost
  3. Track the worst (maximum) SubPlan factor across all SubPlans in the query

Cost Factor Calculation

f_worst_splan = max(nloops × total_cost)

This metric represents: - nloops: How many times the SubPlan executed (once per outer row) - total_cost: PostgreSQL’s estimated cost for the SubPlan - Product: The actual cost burden of the SubPlan on query performance

Database Schema

The new field appears in three places:

1. C Structure (pg_track_optimizer.c:110)

RStats f_worst_splan; /* Worst SubPlan cost factor (nloops × cost) */

2. SQL Function (pg_track_optimizer–0.1.sql:173)

OUT f_worst_splan rstats

3. View Columns (pg_track_optimizer–0.1.sql:234-237)

t.f_worst_splan -> 'min' AS sp_min,
t.f_worst_splan -> 'max' AS sp_max,
t.f_worst_splan -> 'count' AS sp_cnt,
t.f_worst_splan -> 'mean' AS sp_avg,
t.f_worst_splan -> 'stddev' AS sp_dev

Usage Examples

Example 1: Find Queries with Expensive SubPlans

SELECT
  queryid,
  LEFT(query, 50) AS query_preview,
  ROUND((f_worst_splan -> 'mean')::numeric, 2) AS avg_sp_factor,
  ROUND((f_worst_splan -> 'max')::numeric, 2) AS max_sp_factor,
  nexecs
FROM pg_track_optimizer()
WHERE (f_worst_splan -> 'mean')::numeric > 1000
ORDER BY (f_worst_splan -> 'mean')::numeric DESC
LIMIT 10;

This finds queries where SubPlans have an average cost factor over 1000, indicating they execute many times with significant cost.

Example 2: Compare SubPlan Cost to Total Execution Time

SELECT
  queryid,
  ROUND((f_worst_splan -> 'mean')::numeric, 2) AS sp_factor,
  ROUND((exec_time -> 'mean')::numeric, 2) AS avg_time_ms,
  nexecs
FROM pg_track_optimizer()
WHERE (f_worst_splan -> 'mean')::numeric > 0
ORDER BY (f_worst_splan -> 'mean')::numeric DESC;

Queries with high sp_factor relative to exec_time suggest the SubPlan dominates execution time.

Example 3: Detect SubPlan Loops from Test

From sql/subplan.sql regression test:

SELECT
  ROUND((avg_error -> 'mean')::numeric, 2) AS error,
  ROUND((f_join_filter -> 'mean')::numeric, 2) AS jf,
  ROUND((f_scan_filter -> 'mean')::numeric, 2) AS lf,
  ROUND((f_worst_splan -> 'mean')::numeric, 2) AS sp_factor,
  evaluated_nodes,
  plan_nodes,
  nexecs
FROM pg_track_optimizer()
WHERE query LIKE '%FROM outer_table%';

Expected output shows sp_factor of ~1900 for a SubPlan that executed 380 loops.

Interpreting Results

Low Values (< 100)

SubPlans are not a significant performance concern. The correlated subquery either: - Executes few times (low nloops) - Has low individual cost

Medium Values (100 - 1000)

SubPlans contribute measurably to query cost. Consider: - Can the subquery be rewritten as a JOIN? - Is the SubPlan indexed properly? - Could LATERAL join help?

High Values (> 1000)

SubPlans are likely a major performance bottleneck. Optimization strategies:

  1. Rewrite as JOIN: ```sql – Before (SubPlan) SELECT * FROM orders o WHERE total > (SELECT AVG(total) FROM orders WHERE customer_id = o.customer_id);

    – After (JOIN) SELECT o.* FROM orders o JOIN (SELECT customer_id, AVG(total) as avg_total FROM orders GROUP BY customer_id) avg ON o.customer_id = avg.customer_id WHERE o.total > avg.avg_total; ```

  2. Use LATERAL join (PostgreSQL 9.3+): sql SELECT * FROM outer_table o JOIN LATERAL ( SELECT threshold FROM reference_table r WHERE r.category = o.category ) sub ON o.val > sub.threshold;

  3. Add indexes to SubPlan’s inner table scan

  4. Materialize subquery if results are reused: sql WITH thresholds AS MATERIALIZED ( SELECT category, threshold FROM reference_table ) SELECT * FROM outer_table o JOIN thresholds t ON o.category = t.category WHERE o.val > t.threshold;

Implementation Details

Key Code Locations

  • Detection: plan_error.c:35-77 - Iterates through pstate->subPlan list
  • Context: plan_error.h:59 - Field in PlanEstimatorContext
  • Storage: pg_track_optimizer.c:110 - Field in DSMOptimizerTrackerEntry
  • Initialization: plan_error.c:312 - Set to 0.0 before tree walk
  • Accumulation: pg_track_optimizer.c:447-448 - Add value to running stats

Special Cases

  1. InitPlans: SubPlans with setParam != NIL execute once before the main query. These are not currently tracked separately but would show nloops = 1.

  2. Parallel Workers: Current implementation asserts sps->worker_instrument == NULL, meaning SubPlans don’t parallelize (as expected - they’re parallel-restricted).

  3. No SubPlans: Queries without SubPlans have f_worst_splan = 0.0.

Testing

The sql/subplan.sql regression test demonstrates SubPlan detection:

  • Creates tables with correlated subquery in JOIN clause
  • SubPlan executes 380 times (shown in EXPLAIN)
  • f_worst_splan captures the cost burden
  • Test validates the metric appears in output

Run the test: bash make installcheck REGRESS=subplan

Future Enhancements

Potential improvements to SubPlan tracking:

  1. Track SubPlan depth: Nested SubPlans (SubPlans within SubPlans)
  2. Separate InitPlan stats: Different metrics for once-executed InitPlans
  3. Per-SubPlan breakdown: Track multiple SubPlans individually, not just worst
  4. Correlation type: Distinguish EXISTS, ANY, ALL, scalar subqueries
  5. Execution time: Track actual SubPlan execution time, not just cost estimate

References

  • PostgreSQL SubPlan documentation: https://www.postgresql.org/docs/current/using-explain.html
  • Correlated subquery optimization: https://wiki.postgresql.org/wiki/Subquery_scan_removal
  • Parallel query limitations: https://www.postgresql.org/docs/current/parallel-safety.html