Extensions
- pg_table_range 0.1.0
- Planning-time partition pruning by per-partition data ranges, including non-key columns
README
Contents
- Experimental - very AI driven at this point in time.
- pg_table_range: PostgreSQL data-range partition pruning
Experimental - very AI driven at this point in time.
pg_table_range: PostgreSQL data-range partition pruning
A PostgreSQL 16+ extension that prunes partitions at planning time from a compact per-partition summary of each column’s actual data — its min/max range for scalar columns, or its covering extent for range types and PostGIS geometry. This works on columns that are not the partition key, which native PostgreSQL partition pruning cannot eliminate. Pruning is conservative: a partition is removed only when its summary provably cannot contain a matching row, so results are always identical to running without it.
⚠️ Many partitions? Raise
max_locks_per_transactionfirstPruning (and the index build) on a non-key column requires PostgreSQL to lock every partition in one transaction. On the default
max_locks_per_transaction = 64this exhausts the lock table at roughly a few thousand partitions, with:ERROR: out of shared memory HINT: You might need to increase "max_locks_per_transaction".If you have thousands of partitions, raise
max_locks_per_transaction(it requires a restart) before creating the index or querying — see Scaling and partition count for sizing. This is a PostgreSQL limit on wide non-key access, not specific to this extension.
Installation
This is a pgrx (Rust) extension. Supported:
PostgreSQL 16, 17, 18. Pick whichever route fits; after installing, run
CREATE EXTENSION pg_table_range; in each database that should use it.
Docker (PostgreSQL 18 + PostGIS, batteries included)
docker run -e POSTGRES_PASSWORD=secret ghcr.io/bitner/pg_table_range:18
The image extends the official postgres:18 with PostGIS and pg_table_range, and enables
both extensions in the default database on first start. Tags: :18, :vX.Y.Z, :latest.
Debian / Ubuntu package (no Rust toolchain needed)
Prebuilt .debs for PostgreSQL 16/17/18 (Debian bookworm, for the
apt.postgresql.org / PGDG packages) are attached to
each GitHub release:
sudo apt install ./postgresql-18-pg-table-range_<version>-1_amd64.deb
From source with cargo-pgrx
Requires a Rust toolchain and cargo-pgrx matching the pinned pgrx version (0.18.1).
The bundled Makefile wraps cargo-pgrx (it writes a minimal pgrx config.toml for you, so
no cargo pgrx init/initdb is needed):
cargo install cargo-pgrx --version 0.18.1 --locked
make install PG_CONFIG=/usr/lib/postgresql/18/bin/pg_config # major auto-detected from pg_config
Or drive cargo-pgrx directly — register your PostgreSQL once, then install (use
--no-default-features --features pg16|pg17 for those majors):
cargo pgrx init --pg18 /usr/lib/postgresql/18/bin/pg_config
cargo pgrx install --release
The distribution is also on PGXN: pgxn download
pg_table_range, unzip, then make install (or the cargo-pgrx commands above). pgxn
install runs the bundled Makefile, so it works too — but still needs cargo-pgrx present,
since this is a Rust extension rather than a C/PGXS one.
Quick Start
Summaries are built and maintained through a custom index access method, so pruning
follows the normal index lifecycle (pg_dump/restore, REINDEX, DROP INDEX).
CREATE EXTENSION pg_table_range;
-- Summarize one or more columns of a partitioned (or plain) table.
CREATE INDEX events_tr ON events USING table_range (val, created_at);
-- Queries now prune partitions whose summary cannot match the predicate.
-- Verify with EXPLAIN: non-matching partitions disappear from the plan.
EXPLAIN (COSTS OFF) SELECT * FROM events WHERE val >= 250;
-- Inserts maintain the summary automatically; REINDEX only re-tightens after many
-- deletes. DROP INDEX removes the summary with the index.
REINDEX INDEX events_tr;
DROP INDEX events_tr;
The index is never used for scans — it exists only to build and own the summaries — so it adds no scan-time overhead and is never chosen by the planner for data access.
Supported column types (no setup, including PostGIS)
CREATE INDEX … USING table_range works on any btree-comparable type and any
range type out of the box. The required operator classes are provisioned
automatically by mirroring the types that already have a btree/range operator class — and
that mirror re-runs whenever an extension is installed, so PostGIS geometry works the
moment you CREATE EXTENSION postgis, with no extra step:
CREATE EXTENSION postgis; -- geometry opclass auto-registers
CREATE INDEX places_tr ON places USING table_range (geom);
EXPLAIN (COSTS OFF) SELECT * FROM places WHERE geom && ST_MakeEnvelope(0,0,10,10);
How it works
- Summaries live in the index. Like BRIN, each leaf partition’s summary is stored in
that partition’s index — one record per indexed column on the index’s metapage, not
in any side table. It holds the
has_nulls/all_nullsflags plus either the column’s btreemin/max(scalar columns) or a single covering extent — a covering range for range types (range_merge(range_agg(col))) or the bounding box for PostGIS geometry (ST_Extent(col)). - Planning. For each partition the planner builds, a
set_rel_pathlist_hookreads the summary from that partition’s index and evaluates the partition’s restriction clauses against it, callingmark_dummy_relon any partition that provably cannot match — eliminating it before child paths are generated. Deserialized summaries are cached for the life of the backend (kept coherent by a relcache-invalidation callback), so warm plans skip the per-partition page read; see Performance. - Typed comparisons. Min/max vs. constant comparisons use each column type’s own
btree compare function, so any btree-comparable type works:
bigint/int/smallint,numeric,real/double precision,text/varchar,date,time,timestamp,timestamptz,uuid,boolean,oid, etc. Any conversion problem degrades safely to “keep”. - Overlap (
&&). For range types and PostGIS geometry, an&&(overlaps) predicate is pruned by testing the constant against the partition’s stored extent with PostgreSQL’s own&&operator — so a partition is eliminated when its extent cannot overlap the query. - Incremental maintenance (no REINDEX).
aminsertwidens the summary in place as rows are inserted — the same way BRIN maintains its ranges. Because the summary only ever needs to be over-inclusive, these updates need no MVCC: an insert within the existing range writes nothing; one that extends it grows the min/max/extent. Pruning therefore stays correct and active across inserts without any rebuild. Deletes only shrink a partition’s true range, leaving the summary conservatively wide (still safe);VACUUM/REINDEXcan re-tighten it for selectivity.DROP INDEXremoves the summary with the index’s storage — there is no side table to clean up.
Performance
table_range trades a small amount of planning time for a large execution win. A selective predicate on a non-key column scans only the matching partition instead of every partition. The planner pays a little to evaluate each partition’s summary, but that cost is small and — warm — close to free (see the cache note below).
The numbers below are reproducible with bench/benchmark.sql (cargo pgrx run pg18, then
\i bench/benchmark.sql); they report EXPLAIN (ANALYZE) planning and execution time
separately, warm, on PostgreSQL 18.
Faster execution. 300 partitions × 8,000 rows (2.4M rows), WHERE nk = <value in one
partition>:
| Planning | Execution | Total | |
|---|---|---|---|
| pruning off (scans all 300 partitions) | ~4 ms | ~110 ms | ~114 ms |
| pruning on (scans 1 partition) | ~4 ms | ~0.4 ms | ~4 ms |
Execution is ~250× faster, total time drops ~25×, and warm the planning overhead is in the
noise. The win grows with how much data the eliminated partitions hold; measure your
workload with table_range.enable_pruning.
Honest comparison to native pruning. When a predicate is on the partition key,
PostgreSQL prunes natively — and that path is in a different league, because it eliminates
partitions from a sorted bound array before they are ever locked or opened. The table
below uses two identical columns on the same table: pk (the range partition key, pruned
natively) and nk (the same values, not the key, pruned by table_range):
Same = predicate, 2,000 partitions |
Planning | Execution |
|---|---|---|
| native pruning — column is the partition key | ~0.15 ms | ~0.05 ms |
| table_range — column is not the partition key | ~34 ms | ~0.06 ms |
| no pruning — scans all 2,000 partitions | ~28 ms | ~27 ms |
Native pruning is hundreds of times cheaper to plan and is effectively constant in the partition count. table_range cannot match that (see Scaling): its job is the case native pruning can’t do — eliminating partitions by a non-key column. Note that table_range’s overhead over the no-pruning baseline (~28 ms to expand 2,000 partitions) is now small (~6 ms, ~3 µs/part).
Comparison to CHECK constraint exclusion. The built-in way to prune on a non-key
column is to put a data-range CHECK (col BETWEEN lo AND hi) on each partition and let the
planner’s constraint exclusion refute it. That is the most direct apples-to-apples
baseline. Same table, 2,000 partitions, same nk = <value> predicate:
Same = predicate, 2,000 partitions |
Planning | Execution | Scans |
|---|---|---|---|
CHECK constraint exclusion (constraint_exclusion=on) |
~37 ms | ~0.08 ms | 1 partition |
| table_range pruning | ~34 ms | ~0.08 ms | 1 partition |
| no pruning | ~26 ms | ~25 ms | all 2,000 |
Both are O(partitions) and give the identical execution win, and table_range now
plans on par with — and warm, slightly faster than — constraint exclusion. (Constraint
exclusion re-parses each partition’s CHECK expression on every plan; table_range serves
warm plans from a cached summary, see below.) On top of matching the speed, table_range
avoids everything CHECK constraints make you give up:
- No manual management —
CREATE INDEXbuilds and owns the ranges; you don’t compute and attach a constraint per partition and keep it correct. - No enforcement / no blocked inserts — a real
CHECKrejects out-of-range rows; table_range’s summary simply widens to cover new data, so inserts never fail. - Incremental maintenance — changing a
CHECKmeansDROP/ADD CONSTRAINTwith a full-partition revalidation scan; table_range widens in place inaminsert, no rescan.
How the per-partition cost got small. Two optimizations took the per-partition planning cost from ~31 µs to ~3–4 µs:
- Per-plan compilation. The compare function, type-input function, and operator strategy
are identical across a column’s partitions, so they are resolved once per plan
(cached
FmgrInfos) instead of re-looked-up for each partition. - Backend summary cache. Each index’s deserialized summary is cached for the life of the
backend, so warm/repeated plans skip the per-partition index open and metapage
read+deserialize entirely. The cache is kept coherent by a relcache invalidation
callback:
aminsertonly ever widens a summary, and when it does it invalidates the cached copy everywhere — so a cached summary is never narrower than reality (a wider one prunes correctly). A cold first plan still reads each page; every plan after is cached.
Scaling and partition count
table_range’s planning cost is O(number of partitions): PostgreSQL builds a planner node for every partition of the table for a non-key predicate, and table_range evaluates each one’s summary. This is fundamentally different from native partition pruning, which is ~O(log n) because it prunes on the partition key before expansion. There is no public planner hook that prunes a non-key column before expansion, so this O(n) cost is inherent.
Two practical consequences and how to handle them:
- Lock table exhaustion (the hard wall). Two operations lock every partition (and its
indexes) in a single transaction:
CREATE INDEX … USING table_range, which builds one child index per partition — so on too many partitions the index can’t even be built (it fails and rolls back, leaving no summary, so queries then scan everything);- any query on a non-key column, whose planning expands and locks all partitions.
On the default
max_locks_per_transaction = 64the lock table holds only ~6,400 locks (with defaultmax_connections = 100), so at roughly a few thousand partitions — where2 × partitionsexceeds that — you get:
ERROR: out of shared memory
HINT: You might need to increase "max_locks_per_transaction".
This is a PostgreSQL limit on wide non-key access, not specific to this extension — native key pruning avoids it by never locking pruned partitions.
How to fix it. Raise max_locks_per_transaction. It is a postmaster-level setting, so
it requires a restart:
ALTER SYSTEM SET max_locks_per_transaction = 4096; -- then restart PostgreSQL
Sizing: the lock table holds about max_locks_per_transaction × (max_connections +
max_prepared_transactions) locks, and one statement over N partitions needs roughly
2 × N of them (a heap + an index lock per partition). Pick a value so that product
comfortably exceeds 2 × N for your largest partitioned table, with headroom for
concurrency. With default max_connections, a few thousand (e.g. 4096) covers tens of
thousands of partitions; each lock slot costs only a few hundred bytes of shared memory.
- Planning time grows with partition count. Even below the lock wall, planning scales
linearly — though the per-partition constant is now small (~3–4 µs warm, on par with
CHECK constraint exclusion) thanks to the per-plan compilation and backend summary
cache described above. Mitigations: prefer fewer, larger partitions (table_range’s
sweet spot — the execution win is biggest there anyway); use prepared statements so a
plan is reused across executions; and where you can, align the hot filter column with
the partition key so native pruning handles it.
In short, table_range targets hundreds to a few thousand sizeable partitions with a selective non-key predicate. For tens of thousands of partitions, non-key pruning is not something an extension can make sub-linear today; that would require pre-expansion pruning support in PostgreSQL core.
Supported predicates
Everything not listed is conservatively kept (never mispruned):
- Comparisons
col < c,<=,=,>=,>(either operand order), andBETWEEN(the planner expands it into two comparisons). col IS NULL/col IS NOT NULL.col IN (c1, c2, …)/col = ANY(<const array>)— pruned when no listed value falls in the partition’s range.col && const(overlaps) for range types and PostGISgeometry— pruned when the partition’s extent cannot overlap the constant.- Boolean structure composes:
ANDprunes if any child proves non-overlap,ORprunes only if every branch does (nested arbitrarily, across any columns). - Kept (correct, not yet pruned):
NOT IN/<> ALL,NOT (...), function-wrapped columns, and parameters in prepared statements until the plan inlines constants.
Configuration
table_range.enable_pruning(defaulton) — master switch.table_range.log_pruning_debug(defaultoff) — log each prune decision.
Storage
There is no catalog table — each partition’s summary lives on its table_range index’s
metapage (block 0), written by ambuild and updated in place by aminsert, like BRIN.
Project layout
| File | Responsibility |
|---|---|
src/lib.rs |
GUCs, _PG_init, test wiring |
src/index_storage.rs |
per-index summary on the metapage: page I/O (Generic WAL) + (de)serialization |
src/summary_build.rs |
build a leaf’s summary by scanning its data (used by ambuild) |
src/prune_hook.rs |
planner + pathlist hooks, per-plan compilation cache, typed in-memory evaluation |
src/summary_cache.rs |
backend-lifetime per-index summary cache + relcache-invalidation coherence |
src/index_am.rs |
table_range index AM: build, incremental aminsert widening, opclass provisioning |
src/e2e_tests.rs, src/index_am_tests.rs |
end-to-end tests |
Building and testing
cargo pgrx test pg18 # run the end-to-end test suite (PostgreSQL 18)
cargo pgrx run pg18 # open psql with the extension installed
Supported targets: PostgreSQL 16, 17, 18. The test suite is entirely end-to-end —
it builds real partitioned tables, asserts EXPLAIN shows the expected partition
elimination, and verifies results are identical with pruning on and off (the
no-false-negative guarantee), including insert/delete/drop correctness paths. The
PostGIS geometry test skips automatically where PostGIS is not installed; CI installs
PostGIS so it runs there, and overlap pruning is also covered on every target by the
range-type tests, which exercise the same code path.
Releasing (maintainers)
.github/workflows/release.yml runs on any v* tag and produces everything:
| Job | Output |
|---|---|
debian |
.deb packages for PostgreSQL 16/17/18 (Debian bookworm / PGDG), built in a debian:bookworm container with packaging/build-deb.sh |
docker |
ghcr.io/<owner>/pg_table_range:18, :vX.Y.Z, :latest (PostgreSQL 18 + PostGIS + the extension) |
github-release |
attaches the .debs to the GitHub Release |
pgxn |
validates META.json, bundles the source, and uploads it to PGXN |
workflow_dispatch runs debian + docker as a dry run (no pushes), to validate
packaging without cutting a release.
One-time setup: add two repository secrets, PGXN_USERNAME and PGXN_PASSWORD, from a
PGXN Manager account. (The Docker push uses the built-in
GITHUB_TOKEN; ensure the repo allows GitHub Actions to publish packages.)
To cut a release:
- Bump the version in
Cargo.tomlandMETA.json(keep them identical; the.controlfile’sdefault_versionis filled fromCargo.tomlat build time). - Commit, then tag and push — the tag without its leading
vmust equal theMETA.jsonversion:
git tag v0.1.0 && git push origin v0.1.0
META.json describes the PGXN distribution; PLAN.md and .github/ are excluded from
the published tarball via .gitattributes.
Limitations
- Lock-table wall on many partitions. Both
CREATE INDEX … USING table_rangeand queries on a non-key column lock every partition at once, so on the defaultmax_locks_per_transaction = 64they fail (out of shared memory) at roughly a few thousand partitions. Raisemax_locks_per_transaction(needs a restart) — see Scaling for sizing. Native partition-key pruning does not have this limit; table_range is for the cases native pruning cannot handle. - Pruning is a planning-time cost / execution-time win tradeoff: on small partitions
the per-plan overhead can exceed the scan it saves. Measure with
table_range.enable_pruning. NOT IN/<> ALL,NOT (...), expression predicates, and parameterized prepared-statement plans are kept rather than pruned.- Inserts keep summaries current incrementally, but deletes only relax them (the summary
can stay wider than the live data until a
VACUUM/REINDEXre-tightens it) — always correct, just potentially less selective. - Pruning engages only while the index is valid (
indisvalid); the planner ignores invalid indexes, so anything that invalidates a table_range index silently disables its pruning until rebuilt.