Wiki JOB Report Structure Documentation

This document describes the standardized structure for Join Order Benchmark (JOB) test reports published to the pg_track_optimizer wiki. The jo-bench.yml workflow must generate reports consistent with this document.

Report File Naming Convention

Pattern: report-YYYY-MM-DD-HHMMSS.md

Example: report-2026-01-05-084656.md

The timestamp corresponds to the test execution date/time in UTC.

Artifact Directory Structure

All test artifacts and generated scripts are organized in subdirectories within the wiki repository:

wiki/
└── job-pass/
    └── YYYY-MM-DD-HHMMSS/
        ├── pg_track_optimizer_jobench_results_pass1.zip
        ├── pg_track_optimizer_jobench_results_pass2.zip
        ├── postgresql_log.zip
        └── schema.sql

Example: wiki/job-pass/2026-01-05-084656/

Artifact Files

  1. pg_track_optimizer_jobench_results_pass1.zip

    • A dump of the extension's statistical data for later analysis. It is copied to the wiki repository right after the JOB pass has finished.
    • Contains: CSV file with Pass 1 results (without extra indexes)
    • Generated from: /tmp/pg_track_optimizer_results_pass1.csv
  2. pg_track_optimizer_jobench_results_pass2.zip

    • A dump of the extension's statistical data for later analysis. It is copied to the wiki repository right after the JOB pass has finished.
    • Contains: CSV file with Pass 2 results (with extra indexes)
    • Generated from: /tmp/pg_track_optimizer_results_pass2.csv
  3. postgresql_log.zip

    • EXPLAINs logged by the extension, saved so execution results can be matched to tracker records. It is copied to the wiki repository right after the JOB pass has finished.
    • Contains: PostgreSQL server log with EXPLAIN ANALYZE dumps
    • Source: ~/pgdata/logfile.log
    • Each query execution includes a queryId for matching to tracker records (the extension's print format provides this; no extra workflow steps are needed)
  4. schema.sql

    • Auto-generated CREATE TABLE statement
    • Includes an example COPY command for restoring data from a state dump (see the sketch after this list).
    • Schema matches current pg_track_optimizer view definition
    • Used for importing CSV artifacts into analysis database
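For reference, a minimal sketch of what schema.sql might look like, assuming the view exposes only the columns referenced in this document; the real file is auto-generated from the current view definition and may contain more columns:

```sql
-- Hypothetical sketch of schema.sql; the actual file is auto-generated
-- from the current pg_track_optimizer view definition.
CREATE TABLE job_tracking_data (
    queryid bigint,
    query   text,
    avg_avg double precision,
    rms_avg double precision,
    twa_avg double precision,
    wca_avg double precision,
    nj_min  integer,
    nj_max  integer
);

-- Example COPY command restoring data from a state dump
-- (CSV filename as found inside the artifact archive):
\copy job_tracking_data FROM 'pg_track_optimizer_results_pass1.csv' CSV HEADER
```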

Report Markdown Structure

1. Title

# Join Order Benchmark Test Results

Note: Use “Join Order Benchmark” (full name), not “JO-Bench”

2. Test Environment Section

## Test Environment

- **Test Date**: YYYY-MM-DD HH:MM:SS UTC
- **PostgreSQL Version**: [full version string from SELECT version()]
- **PostgreSQL Commit**: [`<short-hash>`](https://github.com/postgres/postgres/commit/<full-hash>)
- **Extension Commit**: [`<short-hash>`](https://github.com/danolivo/pg_track_optimizer/commit/<full-hash>)
- **Test Configuration**: Two-pass testing (Pass 1: no extra indexes, Pass 2: with [extra indexes](extra_indexing))
- **Benchmark Source**: [`<short-hash>`](https://github.com/danolivo/jo-bench/commit/<full-hash>)

Fields:

- Test Date: UTC timestamp of test execution
- PostgreSQL Version: Full output from SELECT version()
- PostgreSQL Commit: Link to postgres/postgres commit (7-char short hash shown, full hash in URL)
- Extension Commit: Link to pg_track_optimizer commit
- Test Configuration: Standard two-pass description with link to the extra_indexing wiki page
- Benchmark Source: Link to the jo-bench repository commit used for test data/queries
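A sketch of how the workflow might collect these fields; the checkout paths are assumptions, not part of this specification:

```bash
# Hypothetical collection of the metadata fields; adjust the paths to
# the checkouts actually used by the workflow.
TEST_DATE=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
PG_VERSION=$(psql -Atc 'SELECT version()')
PG_COMMIT=$(git -C ~/postgres rev-parse HEAD)
EXT_COMMIT=$(git -C ~/pg_track_optimizer rev-parse HEAD)
```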

3. Results Section - Pass 1

## Top Queries by Error Metrics (Pass 1 - basic Join-Order-Benchmark)

This table shows queries that appear in the top 10 for **all** error metrics (avg_avg, rms_avg, twa_avg, and wca_avg). These are the queries with consistently poor estimation across all criteria.

**Note**: These results are from **Pass 1** (without extra indexes).

Example:

```
       queryid        |        query        | error  | nj_min | nj_max
----------------------+---------------------+--------+--------+--------
  8437282107574130241 | /* 30c.sql */       |   4.35 |      3 |      3
 -8317260399803614106 | /* 28c.sql */       |   4.07 |      4 |      4
 -1799126654562256384 | /* 29c.sql */       |   3.74 |      2 |      5
  3901602676749870732 | /* 7c.sql */        |   3.63 |      3 |      3
(4 rows)
```

4. Results Section - Pass 2

## Top Queries by Error Metrics (Pass 2 - with extra indexes)

This table shows queries that appear in the top 10 for **all** error metrics (avg_avg, rms_avg, twa_avg, and wca_avg). These are the queries with consistently poor estimation across all criteria.

**Note**: These results are from **Pass 2** (with extra indexes). Pass 1 results (without extra indexes) are available as a separate artifact for comparison.

Example:

```
       queryid        |        query        | error  | nj_min | nj_max
----------------------+---------------------+--------+--------+--------
  8437282107574130241 | /* 30c.sql */       |   4.35 |      3 |      3
 -8317260399803614106 | /* 28c.sql */       |   4.07 |      4 |      4
 -1799126654562256384 | /* 29c.sql */       |   3.74 |      2 |      5
  3901602676749870732 | /* 7c.sql */        |   3.63 |      3 |      3
(4 rows)
```

Table Format:

- 5 columns: queryid, query, error, nj_min, nj_max
- Query column: shows the first 32 characters of the SQL text (e.g., /* 30c.sql */).
- Error column: a single aggregated error metric (the mean of the simple-average RStats error column).
- nj_min/nj_max columns: minimum and maximum number of JOIN nodes across executions. If they differ, the same queryId produced different plans.
- Use PostgreSQL psql output format (not a markdown table).
- Include the row count footer: (N rows).

SQL query to generate this table (approximate):

```sql
WITH top_avg AS (SELECT queryid FROM pg_track_optimizer ORDER BY avg_avg DESC LIMIT 10),
     top_rms AS (SELECT queryid FROM pg_track_optimizer ORDER BY rms_avg DESC LIMIT 10),
     top_twa AS (SELECT queryid FROM pg_track_optimizer ORDER BY twa_avg DESC LIMIT 10),
     top_wca AS (SELECT queryid FROM pg_track_optimizer ORDER BY wca_avg DESC LIMIT 10),
     intersection AS (
         SELECT queryid FROM top_avg
         INTERSECT
         SELECT queryid FROM top_rms
         INTERSECT
         SELECT queryid FROM top_twa
         INTERSECT
         SELECT queryid FROM top_wca
     )
SELECT v.queryid,
       LEFT(v.query, 32),
       ROUND(v.avg_avg::numeric, 2) AS error,
       v.nj_min::int,
       v.nj_max::int
FROM pg_track_optimizer v
WHERE v.queryid IN (SELECT queryid FROM intersection)
ORDER BY error DESC;
```

Note: ROUND() requires its first argument to be of type numeric, so always cast that argument explicitly.
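For example (assuming the error columns are double precision, which is why the cast is needed):

```sql
-- ROUND(double precision, integer) does not exist in PostgreSQL;
-- cast to numeric before rounding:
SELECT ROUND(avg_avg::numeric, 2) FROM pg_track_optimizer LIMIT 1;
```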

5. Column Description Section

## Column Description

The results above include the following metrics for each query:
- **queryid**: Internal PostgreSQL query identifier
- **query**: The SQL query text (normalised, with literals replaced by placeholders; truncated to first 32 characters)
- **error**: Simple average error across plan nodes
- **nj_min**: Minimum number of JOIN nodes across executions
- **nj_max**: Maximum number of JOIN nodes across executions (a value differing from nj_min indicates plan variability)

Only queries appearing in the top 10 of **every** error metric are shown, representing the most consistently problematic queries.

Note: This is simplified from the full pg_track_optimizer view schema; it documents only the 5 columns shown in the results table.

6. Workflow Artifacts Section

## How to Use Workflow Artifacts

The workflow performs **two passes** and produces two CSV artifacts for comparison:

- **Pass 1** (`pg_track_optimizer_jobench_results_pass1`): [results](job-pass/YYYY-MM-DD-HHMMSS/pg_track_optimizer_jobench_results_pass1.zip) **without** extra indexes.
- **Pass 2** (`pg_track_optimizer_jobench_results_pass2`): [results](job-pass/YYYY-MM-DD-HHMMSS/pg_track_optimizer_jobench_results_pass2.zip) **with** extra indexes from [Extra-Indexing](extra_indexing) wiki page.

Also, explore the [log file](job-pass/YYYY-MM-DD-HHMMSS/postgresql_log.zip), which contains an EXPLAIN ANALYZE dump for each executed query. You can match each dump to a tracker record by the queryId attached to it.
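For example, a hypothetical way to locate the EXPLAIN dump for one of the queryIds shown above, assuming the archive contains logfile.log as described in the artifact list:

```bash
# Unpack the server log and search for a specific queryId; -A prints
# the following lines, which include the EXPLAIN ANALYZE dump.
unzip postgresql_log.zip
grep -n -A 40 '8437282107574130241' logfile.log
```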

Artifacts can also be downloaded from the workflow run and imported into your own PostgreSQL database for comparative analysis.

### 1. Create the tracking table

The schema is auto-generated to match the current `pg_track_optimizer` view definition. Download the schema file: [schema](job-pass/YYYY-MM-DD-HHMMSS/schema.sql)

### 2. Import the CSV artifact

```sql
\copy job_tracking_data FROM 'pg_track_optimizer_results.csv' CSV HEADER
```
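A quick sanity check after the import:

```sql
-- The row count should match the number of tracked queries in the dump.
SELECT count(*) FROM job_tracking_data;
```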

### 3. Example analysis queries

Find queries with the highest average error:

```sql
SELECT queryid, LEFT(query, 80) AS query_preview,
       avg_avg, rms_avg, twa_avg, wca_avg
FROM job_tracking_data
ORDER BY avg_avg DESC
LIMIT 10;
```

Find queries appearing in the top 10 of all error metrics (most consistently problematic):

```sql
WITH top_avg AS (SELECT queryid FROM job_tracking_data ORDER BY avg_avg DESC LIMIT 10),
     top_rms AS (SELECT queryid FROM job_tracking_data ORDER BY rms_avg DESC LIMIT 10),
     top_twa AS (SELECT queryid FROM job_tracking_data ORDER BY twa_avg DESC LIMIT 10),
     top_wca AS (SELECT queryid FROM job_tracking_data ORDER BY wca_avg DESC LIMIT 10),
     intersection AS (
         SELECT queryid FROM top_avg
         INTERSECT
         SELECT queryid FROM top_rms
         INTERSECT
         SELECT queryid FROM top_twa
         INTERSECT
         SELECT queryid FROM top_wca
     )
SELECT t.*
FROM job_tracking_data t
WHERE t.queryid IN (SELECT queryid FROM intersection)
ORDER BY t.avg_avg DESC;
```

Artifact Links Format:

- Use relative paths from the wiki root.
- Pattern: job-pass/YYYY-MM-DD-HHMMSS/filename.ext
- All links must use markdown link format: [text](path)

7. Footer

---

_Automatically generated by [JO-Bench workflow](https://github.com/danolivo/pg_track_optimizer/actions/workflows/jo-bench.yml)_

Workflow Implementation Requirements

The jo-bench.yml workflow and generate-wiki-page.sh script must:

A. Generate Timestamp Variable

TIMESTAMP=$(date -u '+%Y-%m-%d-%H%M%S')  # e.g., 2026-01-05-084656

B. Create Artifact Directory

ARTIFACT_DIR="${WIKI_DIR}/job-pass/${TIMESTAMP}"
mkdir -p "${ARTIFACT_DIR}"

C. Copy and Compress Artifacts

# Copy CSV results
zip "${ARTIFACT_DIR}/pg_track_optimizer_jobench_results_pass1.zip" \
    /tmp/pg_track_optimizer_results_pass1.csv

zip "${ARTIFACT_DIR}/pg_track_optimizer_jobench_results_pass2.zip" \
    /tmp/pg_track_optimizer_results_pass2.csv

# Copy PostgreSQL log
zip "${ARTIFACT_DIR}/postgresql_log.zip" ~/pgdata/logfile.log

# Copy schema SQL
cp "${SCHEMA_FILE}" "${ARTIFACT_DIR}/schema.sql"

All artifact links in the markdown must use the format:

```markdown
[text](job-pass/YYYY-MM-DD-HHMMSS/filename.ext)
```
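For example, a sketch of emitting such a link from the workflow script (filenames as defined above):

```bash
# Append an artifact link to the report page.
echo "- [Pass 1 results](job-pass/${TIMESTAMP}/pg_track_optimizer_jobench_results_pass1.zip)" \
    >> "report-${TIMESTAMP}.md"
```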

D. Extract Benchmark Commit

# Get jo-bench repository commit
cd ~/jo-bench
BENCH_COMMIT=$(git rev-parse HEAD)

Include in the report metadata:

```markdown
- **Benchmark Source**: [`${BENCH_COMMIT:0:7}`](https://github.com/danolivo/jo-bench/commit/${BENCH_COMMIT})
```

E. Simplified Results Query

The workflow should execute a simplified query that produces exactly the 5 columns described above:

- queryid
- query (truncated/formatted with the query filename comment)
- error (simple average error)
- nj_min
- nj_max

Format as psql output (not markdown table) and wrap in triple backticks.
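A sketch of how the workflow might produce this block; the database name and query file are illustrative assumptions:

````bash
# Run the results query with psql's default aligned output and wrap it
# in a fenced block for the report page.
{
  echo '```'
  psql -d jobench -f results_query.sql
  echo '```'
} >> "report-${TIMESTAMP}.md"
````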

F. Wiki Page Naming

Create the page with the name pattern:

```bash
WIKI_PAGE="report-${TIMESTAMP}.md"
```

Add a link to Home.md in the JO-Bench results section:

```bash
sed -i "/## JO-Bench Performance Test Results/a\\- [${TEST_DATE}](report-${TIMESTAMP})" Home.md
```

G. Git Operations

cd "${WIKI_DIR}"
git add "report-${TIMESTAMP}.md" \
        "job-pass/${TIMESTAMP}/" \
        "Home.md"
git commit -m "Add JO-Bench results for ${TEST_DATE}"
git push origin master

Validation Checklist

Before publishing a report, verify:

  • [ ] Report filename matches report-YYYY-MM-DD-HHMMSS.md pattern
  • [ ] Artifact directory exists: job-pass/YYYY-MM-DD-HHMMSS/
  • [ ] All 4 artifact files present and properly compressed/formatted
  • [ ] Report title is “Join Order Benchmark Test Results” (not “JO-Bench”)
  • [ ] Test Environment has all 6 metadata fields
  • [ ] PostgreSQL Version shows full version string
  • [ ] All commit links use 7-char short hash display, full hash in URL
  • [ ] Both Pass 1 and Pass 2 result sections present
  • [ ] Results table uses PostgreSQL psql format (not markdown table)
  • [ ] Results table has exactly 5 columns (queryid, query, error, nj_min, nj_max)
  • [ ] Column Description documents only the 5 shown columns
  • [ ] Artifact links use relative paths: job-pass/YYYY-MM-DD-HHMMSS/...
  • [ ] All artifact links enclosed in markdown format: [text](path)
  • [ ] Footer contains workflow attribution link
  • [ ] Home.md updated with link to new report
  • [ ] Benchmark Source commit included in metadata

Future Improvements

  • [ ] Implement Pass 1 results generation (currently shows “TODO”)
  • [ ] Add comparative analysis between Pass 1 and Pass 2
  • [ ] Consider adding summary statistics section
  • [ ] Explore visualization options for error metrics comparison

Extra Requirements for Claude

  • Follow this document as strictly as possible.
  • Always ask questions if you’re not sure about your decision.
  • Keep the report page as concise as possible; fold wordy material into an attachment.
  • Check the workflows for consistency with this document.
  • Re-check consistency of this document before using it.
  • Never change this document automatically.