Contents
Compares re2 throughput against PostgreSQL's builtin POSIX regex engine (ARE).

| Category | re2 | builtin |
|---|---|---|
| match | re2match | regexp_like |
| extract | re2extract | regexp_substr |
| extract all | re2extractall | regexp_matches(…, 'g') |
| replace one | re2replaceregexpone | regexp_replace |
| replace all | re2replaceregexpall | regexp_replace(…, 'g') |
| count matches | re2countmatches | regexp_count |

Patterns span literal, character class, alternation, nested quantifier, IP /
email validation, deep alternation, and a ReDoS-shaped (e?){10}e{10} case.
Both RE2 (automaton) and PG (ARE) handle the last one without catastrophic
backtracking.
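
To make the "ReDoS-shaped" case concrete, here is a minimal sketch that builds the pattern for a given repeat count and runs it against a string of that many `e` characters. It uses Python's `re` module purely to illustrate the pattern shape; it is not part of the benchmark and says nothing about RE2 or PostgreSQL timings. The helper name `redos_shaped` is made up for this example.

```python
import re


def redos_shaped(n):
    """Build the (e?){n}e{n} pattern for a given repeat count n."""
    return f"(e?){{{n}}}e{{{n}}}"


pattern = redos_shaped(10)   # "(e?){10}e{10}"
subject = "e" * 10           # exactly n characters, so every optional e? must end up empty

# A backtracking matcher first lets each e? consume a character, then has to
# undo those choices one by one before the trailing e{10} can match.
match = re.fullmatch(pattern, subject)
print(pattern, "matches" if match else "does not match")
```

At n = 10 the search space is small enough for any engine; the shape becomes a problem for backtracking matchers as n grows, which is why it is a useful probe.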
Data is 10000 rows of:
- email: ~40 chars
- logline: ~200 chars
- longtext: ~2000 chars (400 words)

Index scans
re2 also speeds up re2match through two index mechanisms (see Index
Support). These queries compare each against the equivalent PostgreSQL index
scan over a separate 100000-row table.
| Mechanism | re2 | postgres |
|---|---|---|
| b-tree prefix range | re2match(col, '^lit') | col ~ '^lit' |
| GIN trigram | col @~ pat (gin_re2_ops) | col ~ pat (gin_trgm_ops) |
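
A common way to serve an anchored-prefix pattern from a b-tree is to turn the literal prefix into a half-open range and recheck the full pattern on the candidates. The sketch below illustrates that idea only; it is not taken from re2's or PostgreSQL's source. It assumes byte-wise ordering (as with text_pattern_ops), an ASCII prefix, and a hypothetical column holding `user0` … `user99999`. That layout is a guess, chosen because it reproduces the 11111 and 1110 row counts reported for ^user5 and ^user12[0-9].

```python
import re


def prefix_range(prefix):
    """Return (lower, upper) such that lower <= s < upper holds exactly for
    strings starting with prefix. Simplified: non-empty ASCII prefix only."""
    if not prefix or ord(prefix[-1]) >= 0x7F:
        raise ValueError("sketch only handles a non-empty ASCII prefix")
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


# Hypothetical data layout (an assumption, not confirmed by the benchmark).
values = [f"user{n}" for n in range(100_000)]

# ^user5: the range alone is exact, no recheck needed.
lower, upper = prefix_range("user5")
in_range = [v for v in values if lower <= v < upper]

# ^user12[0-9]: the range covers the literal prefix "user12"; the character
# class still has to be rechecked against each candidate.
lower12, upper12 = prefix_range("user12")
candidates = [v for v in values if lower12 <= v < upper12]
rechecked = [v for v in candidates if re.match(r"user12[0-9]", v)]

print(len(in_range), len(candidates), len(rechecked))
```

The recheck drops only `user12` itself, which has nothing after the literal prefix for `[0-9]` to match.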

| Category | Pattern | rows | re2 | postgres | re2 vs postgres |
|---|---|---|---|---|---|
| btree | ^user5 | 11111 | 1.8 ms | 3.5 ms | 1.9x faster |
| btree | ^user12[0-9] | 1110 | 0.21 ms | 0.43 ms | 2.0x faster |
| gin | error_code=123 | 100 | 3.3 ms | 3.6 ms | 1.1x faster |
| gin | error_code=(100\|200\|300) | 301 | 3.5 ms | 4.9 ms | 1.4x faster |

The two GIN opclasses extract keys differently. pg_trgm builds trigrams from
alphanumeric words only (never spanning _, =, …) and prunes extracted
trigrams under a fixed penalty budget tuned for natural-language text;
gin_re2_ops keeps every byte trigram of each literal atom RE2’s FilteredRE2
requires. On punctuated machine-text patterns (e.g. error_code=42[0-9] over
loglines where error_code= appears in every row) pruning can leave pg_trgm
with only ubiquitous trigrams, degenerating to a full-index scan while
gin_re2_ops stays selective, an order of magnitude faster. On plain-word
patterns both extract similar keys and pg_trgm’s cheaper consistent check
can win (see error_code=123 above).
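
The difference in key extraction can be sketched on a single literal. This is a deliberately simplified illustration: it ignores pg_trgm's blank padding and penalty-based pruning, and it does not model how FilteredRE2 chooses its atoms. It only contrasts "trigrams inside alphanumeric words" with "every trigram of the literal" for the literal part `error_code=42` of the example pattern. Both function names are made up for this example.

```python
import re


def word_trigrams(literal):
    """Trigrams taken inside alphanumeric words only (no padding, no pruning)."""
    grams = set()
    for word in re.findall(r"[A-Za-z0-9]+", literal):
        grams.update(word[i:i + 3] for i in range(len(word) - 2))
    return grams


def all_trigrams(literal):
    """Every 3-character window of the literal (ASCII, so characters == bytes)."""
    return {literal[i:i + 3] for i in range(len(literal) - 2)}


literal = "error_code=42"
word_keys = word_trigrams(literal)
all_keys = all_trigrams(literal)

print(sorted(word_keys))             # keys from "error" and "code" only
print(sorted(all_keys - word_keys))  # keys that span '_' or '=', including "=42"
```

In this simplified model the word-only keys come from `error` and `code`, which occur in every logline, while the keys that carry the `42` all span punctuation and exist only in the second set.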
Methodology
- JIT and query parallelism disabled to compare single-thread engine throughput reliably
- gen_graph.py takes the median time per (pattern, engine) across all iterations
- Index scans use a text_pattern_ops b-tree and two GIN indexes on one table; enable_seqscan is off there so both engines are measured on their index

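
The median aggregation mentioned above can be sketched as follows. This is not the code in gen_graph.py; the input shape (tuples of pattern, engine, elapsed milliseconds) and the function name `median_times` are assumptions for illustration, and the sample numbers are invented placeholders rather than measurements.

```python
from collections import defaultdict
from statistics import median


def median_times(samples):
    """Group (pattern, engine, elapsed_ms) samples and take the median per group."""
    grouped = defaultdict(list)
    for pattern, engine, elapsed_ms in samples:
        grouped[(pattern, engine)].append(elapsed_ms)
    return {key: median(values) for key, values in grouped.items()}


# Invented placeholder timings: three iterations per (pattern, engine) pair.
samples = [
    ("pattern_a", "engine_x", 10.0),
    ("pattern_a", "engine_x", 12.0),
    ("pattern_a", "engine_x", 50.0),   # one slow outlier iteration
    ("pattern_a", "engine_y", 20.0),
    ("pattern_a", "engine_y", 22.0),
    ("pattern_a", "engine_y", 21.0),
]

result = median_times(samples)
print(result)
```

Using the median rather than the mean keeps a single slow iteration (a cold cache, a background checkpoint) from skewing the reported time.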
Running
Requires re2 (see README) and PostgreSQL 15+ for builtin comparisons.
The index-scan section additionally needs the pg_trgm contrib extension;
setup.sql creates it.
Connection uses libpq environment variables; override the psql binary with
PSQL:

    PGDATABASE=mydb ./run_bench.sh      # 5 iterations (default)
    PGDATABASE=mydb ./run_bench.sh 10   # 10 iterations
    ./gen_graph.py                      # regenerate graph.png & graph_index.png