Contents
JoinScan
JoinScan intercepts PostgreSQL join planning and replaces the standard executor with a DataFusion-based pipeline that operates entirely on Tantivy’s columnar fast fields. The core strategy is late materialization: execute the join using only index data, apply sorting and limits, then access the PostgreSQL heap only for the final K result rows.
Physical Plan
For a typical SELECT ... FROM files JOIN documents ... ORDER BY title LIMIT K:
ProjectionExec
TantivyLookupExec ← materializes deferred strings for final K rows only
SegmentedTopKExec ← global threshold pruning + final sort + LIMIT K
HashJoinExec ← join on fast fields
PgSearchScan (documents) ← BM25 search
PgSearchScan (files) ← lazy scan, deferred columns, receives dynamic filters
SegmentedTopKExec publishes dynamic filter thresholds that are pushed down through the join to the probe-side scan, pruning rows at the scanner level. It also performs the final materialized sort and LIMIT, so TantivyLookupExec only decodes K rows (not K×segments).
How It Works
1. Activation
JoinScan fires when all conditions are met: LIMIT present, equi-join keys exist, all columns are fast fields, all tables have BM25 indexes, and at least one @@@ predicate. See create_custom_path() for the full checklist.
2. Planning
The planner hook builds a JoinCSClause — a serializable IR capturing the RelNode join tree, predicates, ORDER BY, and LIMIT. This is stored in CustomScan.custom_private and deserialized at execution time.
build.rs—RelNode,JoinCSClause,JoinSourceplanning.rs— cost estimation, field validationpredicate.rs— Postgres expression translationprivdat.rs— serialization
3. Physical Plan Construction
scan_state.rs builds a DataFusion logical plan from the JoinCSClause, then runs physical optimization:
SortMergeJoinEnforcer— converts HashJoin to SortMergeJoin when inputs are pre-sorted- FilterPushdown (Post) — pushes dynamic filters through the join
LateMaterializationRule— injectsTantivyLookupExecto defer string materializationSegmentedTopKRule— injectsSegmentedTopKExecfor Top K on deferred columns, removes the now-redundantSortExec(TopK), wraps blocking nodes withFilterPassthroughExec- FilterPushdown (Post) — second pass — pushes
SegmentedTopKExec’sDynamicFilterPhysicalExprdown to the scan
4. Deferred Columns
String columns are emitted as a 3-way UnionArray (doc_address | term_ordinal | materialized) so intermediate nodes work with cheap integer ordinals instead of decoded strings. The decision to defer is made in configure_deferred_outputs().
5. Pruning Path
There are two primary pruning mechanisms for dynamic filters that are pushed down to the scan:
Query-Time Pushdown (Inverted Index): Filters that are static and known at the start of the scan (such as
InListpredicates generated from aHashJoinbuild side) are intercepted during the firstpoll_nextof the scan stream. They are converted into native Tantivy queries (e.g.,TermSetQuery) andANDed into the main search query viatry_dynamic_filter_pushdown. This allows Tantivy to use its inverted index to filter documents while executing the search, providing the highest possible pruning performance. The DataFusion expressions are then rewritten tolit(true)so they are not evaluated again.Pre-Filter Pushdown (Fast Fields): For evolving thresholds, such as the global threshold from
SegmentedTopKExec, the threshold is pushed down to the scan via filter pushdown. This works becauseSegmentedTopKExecandPgSearchScanshare anArc<DynamicFilterPhysicalExpr>. The scanner readscurrent()on every batch and applies the filter after the search but before Arrow column materialization. For strings, it translates literals to per-segment ordinal bounds viatry_rewrite_binaryand filters directly against the fetched term ordinals.
6. Execution Result
After all input is consumed, SegmentedTopKExec materializes sort column values, performs the final sort, and emits exactly K rows. TantivyLookupExec decodes deferred strings for those K rows only. JoinScanState extracts CTIDs and fetches heap tuples — the only point where the PostgreSQL heap is accessed.
Key Files
| File | Purpose |
|---|---|
mod.rs |
Lifecycle, activation checks, parallel support |
build.rs |
RelNode, JoinCSClause, JoinSource |
scan_state.rs |
DataFusion plan building, optimizer registration, result streaming |
planner.rs |
SortMergeJoinEnforcer, FilterPassthroughExec usage |
planning.rs |
Cost estimation, field validation, ORDER BY extraction |
predicate.rs |
Postgres expression → JoinLevelExpr |
translator.rs |
Postgres ↔ DataFusion expression mapping |
explain.rs |
EXPLAIN output formatting |
Execution-layer files under pg_search/src/scan/:
| File | Purpose |
|---|---|
segmented_topk_exec.rs |
SegmentedTopKExec — per-segment heaps, global heap, build_global_filter_expression |
segmented_topk_rule.rs |
Optimizer rule, wrap_blocking_nodes |
tantivy_lookup_exec.rs |
Dictionary decode + filter passthrough |
filter_passthrough_exec.rs |
Transparent wrapper enabling filter pushdown through blocking nodes |
batch_scanner.rs |
Scanner::next() — batch iteration, pre-filter, visibility |
execution_plan.rs |
PgSearchScanPlan — dynamic filter integration |
pre_filter.rs |
try_rewrite_binary, collect_filters |
deferred_encode.rs |
3-way UnionArray construction and unpacking |
GUCs
| GUC | Default | Effect |
|---|---|---|
paradedb.enable_join_custom_scan |
on |
Master switch |
paradedb.enable_segmented_topk |
true |
SegmentedTopKExec injection |
paradedb.enable_columnar_sort |
true |
Enables SortMergeJoin path |