ParadeDB Benchmarks

Benchmarking suite for ParadeDB. It runs a series of common full-text and faceted queries over a generated table with text, numeric, timestamp, and JSON columns.

Prerequisites

The benchmarking scripts require a Postgres database with pg_search installed. If you are building pg_search with cargo pgrx, make sure to build in --release mode, since an unoptimized debug build will skew the results.

Usage

The following command generates a test table, builds a BM25 index, runs benchmarking queries, and outputs the results to a Markdown file.

cargo run -- --url POSTGRES_URL

For more options:

cargo run -- --help

Datasets

Each benchmark run uses a single dataset located under datasets/$name. The data for that dataset is generated by its datasets/$name/generate.sql file.
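
As a rough illustration of the layout described above, the path to a dataset's generation script can be derived from the dataset name. This is only a sketch: the function names, the `root` parameter, and the dataset name used in the usage note are hypothetical and are not taken from the benchmark tool itself.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: directory of the dataset called `name`,
/// i.e. `<root>/datasets/<name>`.
fn dataset_dir(root: &Path, name: &str) -> PathBuf {
    root.join("datasets").join(name)
}

/// Hypothetical helper: the SQL script that generates the dataset's data,
/// i.e. `<root>/datasets/<name>/generate.sql`.
fn generate_script(root: &Path, name: &str) -> PathBuf {
    dataset_dir(root, name).join("generate.sql")
}
```

For a made-up dataset named `docs` and a root of `benchmarks`, `generate_script` would resolve to `benchmarks/datasets/docs/generate.sql`.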

The queries benchmarked for a dataset are located at datasets/$name/queries/$type/*.sql (where $type is usually “pg_search”). Each query file represents a single logical query. When a file contains multiple queries, the first one is considered the canonical, idiomatic way to write the query, and any additional queries in the file are considered alternative ways to write it. The canonical query may not always be the fastest (yet!), but we strive to make it perform as well as a non-idiomatic, slightly contorted alternative might.
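
The "first query is canonical, the rest are alternatives" convention can be sketched as follows. This is a minimal illustration, not the tool's actual parser: `split_query_file` is a hypothetical name, and the sketch assumes queries are separated by `;` with no semicolons inside string literals or comments.

```rust
/// Hypothetical helper: splits the contents of one query file into the
/// canonical query (the first one) and any alternative formulations (the rest).
///
/// Assumption: queries are separated by `;` and never contain a semicolon
/// inside a string literal or comment. A real parser would need to handle that.
///
/// Returns `None` when the file contains no query at all.
fn split_query_file(contents: &str) -> Option<(String, Vec<String>)> {
    let mut queries = contents
        .split(';')
        .map(str::trim)
        // Drop the empty fragments left by trailing semicolons or blank lines.
        .filter(|q| !q.is_empty())
        .map(String::from);

    // The first query in the file is the canonical one.
    let canonical = queries.next()?;
    // Everything after it is an alternative way to write the same query.
    let alternatives = queries.collect();
    Some((canonical, alternatives))
}
```

Given a file containing two queries, the function returns the first as the canonical query and a one-element list of alternatives. A file with only whitespace yields `None`.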