JoinScan

JoinScan intercepts PostgreSQL join planning and replaces the standard executor with a DataFusion-based pipeline that operates entirely on Tantivy’s columnar fast fields. The core strategy is late materialization: execute the join using only index data, apply sorting and limits, then access the PostgreSQL heap only for the final K result rows.

Physical Plan

For a typical SELECT ... FROM files JOIN documents ... ORDER BY title LIMIT K:

ProjectionExec
  TantivyLookupExec                   ← materializes deferred strings for final K rows only
    SegmentedTopKExec                 ← per-segment pruning + global threshold + final sort + LIMIT K
      HashJoinExec                    ← join on fast fields
        PgSearchScan (documents)      ← BM25 search
        PgSearchScan (files)          ← lazy scan, deferred columns, receives dynamic filters

SegmentedTopKExec publishes dynamic filter thresholds that are pushed down through the join to the probe-side scan, pruning rows at the scanner level. It also performs the final materialized sort and LIMIT, so TantivyLookupExec only decodes K rows (not K×segments).

How It Works

1. Activation

JoinScan fires when all conditions are met: LIMIT present, equi-join keys exist, all columns are fast fields, all tables have BM25 indexes, and at least one @@@ predicate. See create_custom_path() for the full checklist.

2. Planning

The planner hook builds a JoinCSClause — a serializable IR capturing the RelNode join tree, predicates, ORDER BY, and LIMIT. This is stored in CustomScan.custom_private and deserialized at execution time.

3. Physical Plan Construction

scan_state.rs builds a DataFusion logical plan from the JoinCSClause, then runs physical optimization:

  1. SortMergeJoinEnforcer — converts HashJoin to SortMergeJoin when inputs are pre-sorted
  2. FilterPushdown (Post) — pushes dynamic filters through the join
  3. LateMaterializationRule — injects TantivyLookupExec to defer string materialization
  4. SegmentedTopKRule — injects SegmentedTopKExec for Top K on deferred columns, removes the now-redundant SortExec(TopK), wraps blocking nodes with FilterPassthroughExec
  5. FilterPushdown (Post) — second pass — pushes SegmentedTopKExec’s DynamicFilterPhysicalExpr down to the scan

4. Deferred Columns

String columns are emitted as a 3-way UnionArray (doc_address | term_ordinal | materialized) so intermediate nodes work with cheap integer ordinals instead of decoded strings. The decision to defer is made in try_enable_late_materialization().

5. Two Pruning Paths

Path Source Mechanism When Active
Global threshold SegmentedTopKExec DynamicFilterPhysicalExpr pushed to scan via filter pushdown After first K rows fill global heap (during collection)
Per-segment ordinals SegmentedTopKExec SegmentedThresholds side-channel After per-segment heap fills (intra-segment)

The global threshold is the primary pruning mechanism. It works during the collection phase because SegmentedTopKExec and PgSearchScan share an Arc<DynamicFilterPhysicalExpr> — no row flow required. The scanner reads current() on every batch and translates string literals to per-segment ordinal bounds via try_rewrite_binary.

6. Execution Result

After all input is consumed, SegmentedTopKExec materializes sort column values, performs the final sort, and emits exactly K rows. TantivyLookupExec decodes deferred strings for those K rows only. JoinScanState extracts CTIDs and fetches heap tuples — the only point where the PostgreSQL heap is accessed.

Key Files

File Purpose
mod.rs Lifecycle, activation checks, parallel support
build.rs RelNode, JoinCSClause, JoinSource
scan_state.rs DataFusion plan building, optimizer registration, result streaming
planner.rs SortMergeJoinEnforcer, FilterPassthroughExec usage
planning.rs Cost estimation, field validation, ORDER BY extraction
predicate.rs Postgres expression → JoinLevelExpr
translator.rs Postgres ↔ DataFusion expression mapping
explain.rs EXPLAIN output formatting

Execution-layer files under pg_search/src/scan/:

File Purpose
segmented_topk_exec.rs SegmentedTopKExec — per-segment heaps, global heap, build_global_filter_expression
segmented_topk_rule.rs Optimizer rule, wrap_blocking_nodes
tantivy_lookup_exec.rs Dictionary decode + filter passthrough
filter_passthrough_exec.rs Transparent wrapper enabling filter pushdown through blocking nodes
batch_scanner.rs Scanner::next() — batch iteration, pre-filter, visibility
execution_plan.rs PgSearchScanPlandynamic filter integration
pre_filter.rs try_rewrite_binary, collect_filters
deferred_encode.rs 3-way UnionArray construction and unpacking

GUCs

GUC Default Effect
paradedb.enable_join_custom_scan on Master switch
paradedb.enable_segmented_topk true SegmentedTopKExec injection
paradedb.enable_columnar_sort true Enables SortMergeJoin path