Changelog

0.13.0 (in development)

GraphRAG release status

  • The narrow fact-shaped GraphRAG contract is now the intended stable 0.13 release surface:
    • sorted_heap_graph_rag(...)
    • sorted_heap_graph_register(...)
    • sorted_heap_graph_config(...)
    • sorted_heap_graph_unregister(...)
    • sorted_heap_graph_rag_stats()
    • sorted_heap_graph_rag_reset_stats()
  • Lower-level helper/wrapper building blocks remain beta:
    • sorted_heap_expand_ids(...)
    • sorted_heap_expand_rerank(...)
    • sorted_heap_expand_twohop_rerank(...)
    • sorted_heap_expand_twohop_path_rerank(...)
    • sorted_heap_graph_rag_scan(...)
    • sorted_heap_graph_rag_twohop_scan(...)
    • sorted_heap_graph_rag_twohop_path_scan(...)
  • Code-corpus snippet/symbol/lexical retrieval contracts remain benchmark/reference logic, not the stable SQL surface.
  • Added make test-graphrag-release to run the full GraphRAG release-candidate bundle:
    • SQL regression
    • lifecycle
    • crash recovery
    • concurrent online-operation coverage
  • Added make test-release to run the broader 0.13 extension release bundle:
    • core regression smoke
    • policy/doc contract selftests
    • dump/restore, TOAST, DDL, crash recovery, concurrent online ops
    • pg_upgrade
    • the narrower make test-graphrag-release bundle
  • Clarified the GraphRAG docs around:
    • limit_rows as a work cap rather than a final result-count override
    • one-hop score_mode := 'path' being intentionally equivalent to endpoint

FlashHadamard experimental status

  • Added sql/flashhadamard_experimental.sql as the explicit experimental SQL surface for the FlashHadamard retrieval branch.
  • The current canonical experimental operating point (103K x 2880D) is the exhaustive parallel engine scan over an mmap-backed store, with a 5-8 ms local p50 and a documented benchmark-reference helper path at 8.7 ms.
  • Added make test-flashhadamard and make bench-flashhadamard as the canonical experiment validation/benchmark entrypoints.
  • Documented the current execution-model caveat explicitly: pthread inside a PostgreSQL backend remains experimental and is not part of the stable 0.13 release contract.
  • Documented FH_INT16=1 as an Apple/NEON-only experimental optimization with a partially validated local end-to-end win; it remains opt-in and is not the release default.
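
  As background for the branch name, here is a minimal fast Walsh-Hadamard transform sketch in Python, assuming FlashHadamard builds on this transform family; the function name and the unnormalized convention are illustrative and unrelated to the extension's C implementation:

  ```python
  def fwht(vec):
      """In-place fast Walsh-Hadamard transform, O(n log n) for power-of-two n."""
      v = list(vec)
      n = len(v)
      assert n and (n & (n - 1)) == 0, "length must be a power of two"
      h = 1
      while h < n:
          # butterfly pass: combine pairs h apart
          for i in range(0, n, h * 2):
              for j in range(i, i + h):
                  x, y = v[j], v[j + h]
                  v[j], v[j + h] = x + y, x - y
          h *= 2
      return v
  ```

  Applying the unnormalized transform twice scales the input by n, which makes round-trip checks cheap.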

Segmented GraphRAG scale verification

  • Consolidated the monolithic vs segmented 10M x 64D comparison in docs/benchmarks.md into a single side-by-side table.
  • On the constrained-memory AWS point (4 vCPU, 8 GiB RAM), segmented exact routing is 8.1x faster at depth 5 with better quality (100%/100% vs 75%/100%) compared to the monolith.
  • All-shard fanout offers no latency benefit — the win comes entirely from shard pruning.
  • Added bounded-fanout mode (--route bounded --fanout K) to the segmented benchmark harness. On the 1M x 64D local point:
    • bounded(2/8) is 3.4x faster than monolithic with 96.9% hit@1
    • bounded(4/8) is 1.6x faster with 93.8% hit@1
    • latency scales roughly linearly with shards hit
    • the win is not exact-or-nothing — imperfect routing still helps
  • Verified bounded fanout transfers to AWS 10M x 64D:
    • bounded(2/8) is 4.0x faster than monolithic at depth 5
    • bounded(4/8) is 2.0x faster
    • gradient is smooth and linear with shards hit
  • Added routing-miss tolerance mode (--route bounded_recall --recall-pct N). Quality tracks router recall linearly; no sharp cliff. A router with 90% recall keeps 87.5% hit@1 while remaining 2-3x faster than monolithic. Routing quality determines answer quality, not latency.
  • Verified routing-miss tolerance at AWS 10M x 64D: at 90% recall, bounded(2/8) matches monolithic hit@1 (75%) while staying 4x faster. Finer crossover resolution remains limited by the small 4-query point.
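
  The bounded-fanout idea above can be sketched in Python: the router ranks candidate shards, only the top K are actually queried, and shard-local top-k rows are merged into a global top-k. Shard names, the router-score shape, and search_shard are hypothetical stand-ins for the harness internals:

  ```python
  import heapq

  def bounded_fanout_query(query, shards, route_scores, fanout, top_k, search_shard):
      """Query only the `fanout` highest-scoring shards, then merge a global top-k.

      route_scores: shard -> router confidence (higher = more likely to own the answer)
      search_shard: callable(query, shard, top_k) -> list of (distance, row_id)
      """
      ranked = sorted(shards, key=lambda s: route_scores.get(s, 0.0), reverse=True)
      hit = ranked[:fanout]                      # latency scales with shards actually hit
      candidates = []
      for shard in hit:
          candidates.extend(search_shard(query, shard, top_k))
      return heapq.nsmallest(top_k, candidates)  # global merge by distance
  ```

  An imperfect router only loses the candidates from shards it ranks below the cutoff, which is why quality degrades linearly with router recall rather than falling off a cliff.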

Unified routed GraphRAG wrapper

  • Added sorted_heap_graph_route(...) as a thin operator-facing dispatcher over the existing exact/range + profile/policy/default routed GraphRAG wrappers.
  • Added sorted_heap_graph_route_plan(...) to explain which routing path, effective registry contract, and candidate shards the unified dispatcher would use.
  • Explicit call-site routing overrides now take precedence over route defaults; profile and policy paths remain mutually exclusive.
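
  A minimal Python sketch of those precedence rules, assuming a dict-shaped routing contract; all field names here are hypothetical illustrations, not the SQL surface:

  ```python
  def resolve_route_contract(route_defaults, profile=None, policy=None, **overrides):
      """Sketch of the dispatcher precedence: explicit call-site overrides beat
      route defaults, and the profile and policy paths are mutually exclusive."""
      if profile is not None and policy is not None:
          raise ValueError("profile and policy routing are mutually exclusive")
      contract = dict(route_defaults)            # start from the registered defaults
      if profile is not None:
          contract["profile"] = profile
      if policy is not None:
          contract["policy"] = policy
      # any non-None explicit override wins over the defaults
      contract.update({k: v for k, v in overrides.items() if v is not None})
      return contract
  ```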

sorted_hnsw shared cache fix

  • Fixed a multi-index shared-cache corruption bug where shnsw_shared_scan_cache_attach() held bare pointers into shared memory. A subsequent publish for a different index overwrote the shared region, silently corrupting the first index’s cached HNSW graph.
  • The attach path now deep-copies all bulk data (L0 neighbors, SQ8 vectors, upper-level neighbor slabs) into local palloc’d buffers.
  • Added a multi-index overwrite regression phase (B5) to scripts/test_hnsw_chunked_cache.sh.
  • Verified: shared_cache=on and off produce identical retrieval quality on the 5K x 384D and 10K x 384D multihop benchmarks. shared_cache=off is no longer needed as a correctness workaround.
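
  The bug class can be illustrated with a toy Python aliasing model; the real fix deep-copies L0 neighbors, SQ8 vectors, and upper-level neighbor slabs into palloc'd buffers, which this sketch only approximates:

  ```python
  class SharedScanCache:
      """Toy model of one shared region reused across indexes."""
      def __init__(self):
          self.region = []                 # stands in for the shared-memory slab

      def publish(self, graph_rows):
          self.region[:] = graph_rows      # a later publish overwrites in place

  def attach_bare(cache):
      # buggy pattern: keep a live reference into the shared region
      return cache.region

  def attach_deep_copy(cache):
      # fixed pattern: copy the bulk data into a backend-local buffer
      return list(cache.region)
  ```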

GraphRAG syntax unification

  • Added sorted_heap_graph_rag(...) as the new unified fact-shaped GraphRAG entry point.
  • The new syntax accepts:
    • relation_path := ARRAY[hop] for one-hop retrieval
    • relation_path := ARRAY[hop1, hop2] for two-hop retrieval
    • relation_path := ARRAY[hop1, hop2, ...] for explicit multi-hop retrieval
    • score_mode := 'endpoint' | 'path'
  • One-hop semantics are now aligned with the fact-graph contract: ANN seed selection is on entity_id, not target_id.
  • Added regression coverage for:
    • one-hop unified syntax
    • two-hop endpoint-scored unified syntax
    • two-hop path-aware unified syntax
    • generic path-aware multihop syntax
  • Added docs/graphrag-0.13-plan.md to separate the narrow stable 0.13 target from the broader experimental code-GraphRAG surface.
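
  An illustrative Python sketch of why one-hop score_mode := 'path' collapses to 'endpoint': with no earlier hops to aggregate, any path score reduces to the final hop's score. The aggregation formula below is an assumption for illustration, not the extension's actual scoring:

  ```python
  def score(hop_distances, score_mode, hop_weight=0.15):
      """'endpoint' scores by the final hop's distance alone; 'path' (here, an
      assumed example formula) mixes in earlier hops with a small weight."""
      if score_mode == "endpoint":
          return hop_distances[-1]
      if score_mode == "path":
          return hop_distances[-1] + hop_weight * sum(hop_distances[:-1])
      raise ValueError("score_mode must be 'endpoint' or 'path'")
  ```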

GraphRAG schema registration

  • Added:
    • sorted_heap_graph_register(...)
    • sorted_heap_graph_config(...)
    • sorted_heap_graph_unregister(...)
  • GraphRAG helpers and wrappers can now run against non-canonical fact-table schemas as long as the mapped columns still satisfy the fact contract: int4 / int2 / int4 / svec / text.
  • Added regression coverage for an alias schema using:
    • src_id
    • edge_type
    • dst_id
    • vec
    • body
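
  A Python sketch of the mapping check. Only the type tuple int4 / int2 / int4 / svec / text and the alias column names come from the contract above; the canonical role names and the function itself are illustrative:

  ```python
  FACT_CONTRACT = ("int4", "int2", "int4", "svec", "text")
  # assumed role names for illustration only
  FACT_ROLES = ("entity_id", "relation_id", "target_id", "embedding", "fact_text")

  def check_alias_mapping(column_types, mapping):
      """Verify that an alias schema's mapped columns satisfy the fact contract.

      column_types: alias column name -> SQL type name
      mapping: canonical role -> alias column name
      """
      for role, expected in zip(FACT_ROLES, FACT_CONTRACT):
          col = mapping.get(role)
          if col is None or column_types.get(col) != expected:
              return False
      return True
  ```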

GraphRAG lifecycle hardening

  • Added pg_extension_config_dump(...) coverage for sorted_heap_graph_registry, so registered GraphRAG mappings survive pg_dump / pg_restore.
  • Added scripts/test_graph_rag_lifecycle.sh and make test-graphrag-lifecycle to verify:
    • 0.12.0 -> 0.13.0 extension upgrade
    • alias-schema registration
    • registry persistence across dump/restore
    • persistence of segmented/routed GraphRAG registries across dump/restore:
      • shared shard metadata
      • shared segment_labels
      • range routing
      • exact-key routing
      • route policies
      • route profiles
      • route defaults
      • effective default segment_labels
    • post-restore GraphRAG query correctness on registered alias schemas
    • post-restore GraphRAG query correctness on routed/default-backed segmented GraphRAG queries
  • Added scripts/test_graph_rag_crash_recovery.sh and make test-graphrag-crash to verify crash recovery for:
    • committed registered GraphRAG tables
    • crash during insert into a registered/indexed graph table
    • crash during compact on a registered graph table

GraphRAG observability

  • Added:
    • sorted_heap_graph_rag_stats()
    • sorted_heap_graph_rag_reset_stats()
  • GraphRAG now exposes backend-local last-call stats for:
    • seed count
    • expanded row count
    • reranked row count
    • returned row count
    • ANN / expand / rerank / total timing
  • Added regression coverage for:
    • direct helper observability via sorted_heap_expand_rerank(...)
    • unified wrapper observability via sorted_heap_graph_rag(...)
  • The reported api field reflects the concrete top-level GraphRAG execution path, so unified wrapper calls report the underlying C path they dispatched to.
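
  A toy Python model of the backend-local last-call stats shape; the field names follow the bullet list above but are illustrative, not the function's real output columns:

  ```python
  import time

  class GraphRagStats:
      """Backend-local last-call stats: stage counts plus per-stage timings."""
      def __init__(self):
          self.reset()

      def reset(self):
          self.api = None
          self.counts = {"seed": 0, "expanded": 0, "reranked": 0, "returned": 0}
          self.timings_ms = {"ann": 0.0, "expand": 0.0, "rerank": 0.0, "total": 0.0}

      def record_stage(self, stage, started, n):
          """Record one stage's elapsed time and row count (stage: ann|expand|rerank)."""
          elapsed = (time.perf_counter() - started) * 1000.0
          self.timings_ms[stage] = elapsed
          self.timings_ms["total"] += elapsed
          key = {"ann": "seed", "expand": "expanded", "rerank": "reranked"}[stage]
          self.counts[key] = n
  ```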

GraphRAG scale harnesses

  • Added --hop-weight to scripts/bench_graph_rag_multidepth.py so large synthetic multihop runs can vary the relative hop contribution without changing the SQL/GraphRAG contract.
  • Added sorted_hnsw.build_sq8 for constrained-memory index builds.
    • builds the graph from SQ8-compressed build vectors instead of a full float32 build slab
    • costs an extra heap scan during CREATE INDEX
    • first bounded local result on 1M x 64D:
      • build_indexes: 48.606 s -> 46.541 s
      • depth-5 unified GraphRAG stayed 87.5% / 100.0%
  • Added scripts/bench_graph_rag_multidepth_aws.sh to run the synthetic multi-hop depth benchmark on a remote AWS host using the same sync/install pattern as the existing multihop AWS runners.
  • Added scripts/bench_graph_rag_multidepth_segmented.py to benchmark the first partitioning/segmentation path using multiple concrete sorted_heap shards plus harness-side fanout and global rerank.
  • Added scripts/bench_graph_rag_multidepth_segmented_aws.sh so the same segmented benchmark can be run on a constrained remote host without manual repo sync/install steps.
  • Added docs/graphrag-segmentation-plan.md to separate the post-0.13 large-scale architecture from the 0.13 release surface.
  • Added larger-scale benchmark notes for:
    • local 1M-row measured query latency on the synthetic multidepth graph
    • local and AWS 10M-row build-bound envelopes, where generation/load now survive but the first practical frontier remains sorted_hnsw build time
    • retained-temp query-only sweeps on the AWS 10M x 32D cheap-build point, which showed that raising ef_search/ann_k as high as 256/256 still does not recover depth-5 quality on the same weak graph
    • a follow-up 1M x 32D calibration showing that wider query budgets (ann_k=256, top_k=32) can recover 96.9% hit@k on smaller graphs even with a cheaper build
    • a stronger AWS 10M x 32D falsifier showing that even exact heap seeds still return 0.0% / 0.0% at ann_k=256, top_k=32, so the remaining problem there is the low-dimensional scale contract, not just HNSW build quality
    • a local 1M x 64D calibration showing the same widened contract reaches 65.6% hit@1 / 96.9% hit@k and ANN matches exact seeds there
    • a follow-up 10M x 64D allocator diagnosis showing the failure came from the old contiguous local L0 scan-cache slabs, not from HNSW graph build itself
    • a chunked local scan-cache fix that replaces giant local l0_neighbors and sq8_data allocations with page-backed storage for build seeding and shnsw_load_cache()
    • a constrained-memory AWS 10M x 64D monolithic rerun on the same 4 vCPU / 8 GiB host with sorted_hnsw.build_sq8 = on and hop_weight = 0.05 that now completes:
      • load_data: 787.809 s
      • build_indexes: 846.795 s
    • the first retained query-only pass on that exact built graph showing that the monolithic path is now viable but still not the final speed story:
      • depth 1 unified GraphRAG: 840.607 ms, 100.0% / 100.0%
      • depth 5 unified GraphRAG: 2084.155 ms, 75.0% / 100.0%
      • depth-2+ quality stayed aligned with the SQL baseline, but latency remained about 2x slower than the SQL path baseline
    • so the current 10M x 64D frontier is no longer build survival or quality drift; it is monolithic query cost, which pushes the next scale branch toward segmentation + pruning
    • a loader fast path for sorted_heap_only multidepth runs that copies directly into facts_sh before sorted_heap_compact(...) instead of staging through facts_heap; bounded local checks held the same compacted depth-5 quality/latency while improving ingest by about 10% (6.321 s -> 5.638 s at 200K rows, 31.392 s -> 28.231 s at 1M rows)
    • a new multidepth harness knob --post-load-op compact|merge|none to compare post-load maintenance strategies on the same synthetic graph
    • bounded local evidence that keeps compact as the default: none is much slower at query time, while merge is viable but does not materially beat compact on the larger 1M load point (28.142 s versus 28.108 s)
    • an opt-in stage-breakdown path --report-stage-stats for the multidepth harness, backed by sorted_heap_graph_rag_stats()
    • a local 1M x 64D lower-hop stage diagnosis showing that the widened multihop path is ANN-bound, not expansion-bound: at depth 5 the unified path took 110.507 ms end-to-end, of which about 109.178 ms was ANN, 0.691 ms expansion, and 0.011 ms rerank
    • the first local segmented 1M x 64D GraphRAG point (8 shards, build_sq8=on) showing:
      • all-shard fanout preserves quality but is slower than the monolith (87.677 ms vs 50.104 ms at depth 1, 142.472 ms vs 121.524 ms at depth 5)
      • exact routing to the owning shard is the real partitioning win (10.574 ms at depth 1, 16.822 ms at depth 5, stable 100.0% / 100.0%)
      • so the next scale contract must be “segmentation + pruning”, not just “more shards”
    • the first full AWS segmented 10M x 64D rerun on the same constrained 4 vCPU / 8 GiB host using streamed shard load and build_sq8=on:
      • generate_csv: 0.000 s
      • load_data: 500.474 s
      • build_indexes: 784.778 s
      • route=all matched the old monolithic query envelope almost exactly (898.440 ms at depth 1, 2093.652 ms at depth 5)
      • route=exact was the real scale win (126.057 ms at depth 1, 258.766 ms at depth 5, stable 100.0% / 100.0% at depth 5)
      • so the constrained-memory large-scale direction is now much clearer: productize segmented routing, not broad all-shard fanout
    • the first SQL-level segmented reference path:
      • added sorted_heap_graph_rag_segmented(regclass[], ...)
      • it executes sorted_heap_graph_rag(...) across a caller-supplied shard set and merges shard-local top-k rows in SQL
      • local segmented smoke confirmed the SQL merge path matches the older Python merge path on quality/row counts with similar latency
      • routing/pruning still stays outside the extension for now; this wrapper only productizes fanout/merge
    • the first metadata-driven routed GraphRAG reference path:
      • added sorted_heap_graph_segment_register(...), sorted_heap_graph_segment_config(...), sorted_heap_graph_segment_resolve(...), and sorted_heap_graph_segment_unregister(...)
      • added sorted_heap_graph_rag_routed(...) on top of the segmented wrapper
      • this beta surface lets callers register shard ranges once and then route by a supplied int8 key before segmented GraphRAG fanout/merge
      • local routed smoke showed the routed path matches exact-route segmented SQL quality/row counts with only small extra lookup overhead
    • the exact-key routed companion for tenant / KB style routing:
      • added sorted_heap_graph_exact_register(...), sorted_heap_graph_exact_config(...), sorted_heap_graph_exact_resolve(...), and sorted_heap_graph_exact_unregister(...)
      • added sorted_heap_graph_rag_routed_exact(...)
      • local exact-key smoke stayed aligned with the exact-route segmented SQL merge path (0.202 ms vs 0.183 ms at depth 5, both 100.0% / 100.0%)
    • the first richer shard-group filter on top of routed segmentation:
      • both range-routed and exact-key routed registries now accept an optional segment_group label
      • both config/resolve functions now accept optional segment_groups text[] filters
      • both routed wrappers now accept optional segment_groups := ARRAY[...] to narrow candidate shards before segmented GraphRAG fanout/merge
      • when segment_groups is present, its array order now becomes the shard preference order before bounded fanout is applied
      • this is the first beta surface for hot/sealed or relation-family shard pruning without changing the GraphRAG scoring contract
    • the first registry-backed policy layer for shard-group preference:
      • added sorted_heap_graph_route_policy_register(...), sorted_heap_graph_route_policy_config(...), sorted_heap_graph_route_policy_groups(...), and sorted_heap_graph_route_policy_unregister(...)
      • added sorted_heap_graph_rag_routed_policy(...) and sorted_heap_graph_rag_routed_exact_policy(...)
      • this keeps hot/sealed preference in route metadata instead of repeating raw segment_groups := ARRAY[...] literals in every query
    • the first second routing dimension on top of that policy layer:
      • both range-routed and exact-key shard registries now accept an optional relation_family label
      • both config/resolve functions now accept optional relation_family := ... filtering
      • both raw and policy-backed routed wrappers now accept optional relation_family := ... to narrow candidate shards after route resolution but before segmented GraphRAG fanout/merge
      • regression coverage now proves route+family and route+policy+family filtering for both range and exact-key routing
    • the first route-profile convenience layer on top of that:
      • added sorted_heap_graph_route_profile_register(...), sorted_heap_graph_route_profile_config(...), sorted_heap_graph_route_profile_resolve(...), and sorted_heap_graph_route_profile_unregister(...)
      • added sorted_heap_graph_rag_routed_profile(...) and sorted_heap_graph_rag_routed_exact_profile(...)
      • this now stores either:
        • policy_name + relation_family + fanout_limit, or
        • inline segment_groups + relation_family + fanout_limit
      • so one route profile can be self-contained without a separate sorted_heap_graph_route_policy_registry row
      • regression coverage now proves both profile-backed wrappers match the existing sealed/right routed baselines
    • the first default-profile operator layer on top of routed profiles:
      • added sorted_heap_graph_route_default_register(...), sorted_heap_graph_route_default_config(...), sorted_heap_graph_route_default_resolve(...), and sorted_heap_graph_route_default_unregister(...)
      • added sorted_heap_graph_rag_routed_default(...) and sorted_heap_graph_rag_routed_exact_default(...)
      • this binds one default profile per route so callers no longer need to pass profile_name on every query
      • regression coverage now proves both default-backed wrappers match the same sealed/right routed baselines as the explicit profile paths
    • a shared shard-metadata cleanup under the routed beta surface:
      • added sorted_heap_graph_segment_meta_register(...), sorted_heap_graph_segment_meta_config(...), and sorted_heap_graph_segment_meta_unregister(...)
      • range-routed and exact-key routed config/resolve paths now fall back to shared per-shard segment_group / relation_family metadata when the route row leaves those labels NULL
      • row-local routed metadata still overrides shared shard metadata when both are present
      • regression coverage now proves both routed wrappers work when those labels live only in the shared shard-metadata registry
    • the first multi-valued shard-label filter on top of that:
      • shared shard metadata now also accepts optional segment_labels text[]
      • range/exact config, catalog, and resolve functions now accept optional segment_labels := ARRAY[...] filters
      • raw, policy-backed, profile-backed, and default-backed routed wrappers now propagate that filter without changing GraphRAG scoring semantics
      • route profiles and route/default catalogs now expose profile-level and default effective segment_labels
      • regression coverage now proves label-based pruning through the shared shard-metadata path for both range and exact-key routing
    • a first operator-facing shard catalog on top of that:
      • added sorted_heap_graph_segment_catalog(...) and sorted_heap_graph_exact_catalog(...)
      • both show route-local metadata, shared shard metadata, effective resolved metadata, optional shared/effective segment_labels, and per-column source markers (route|shared|unset)
      • this is introspection-only and does not change the routed GraphRAG execution contract
    • a first operator-facing route-profile catalog on top of that:
      • added sorted_heap_graph_route_profile_catalog(...)
      • it shows profile-local policy_name, inline segment_groups, policy-backed segment_groups, effective group order, segment_groups_source (inline|policy|unset), relation_family, fanout_limit, optional profile-level segment_labels, and whether the profile is the current route default
      • this is introspection-only and does not change the routed GraphRAG execution contract
    • a first route-level operator summary on top of that:
      • added sorted_heap_graph_route_catalog(...)
      • it shows one row per route with range-shard count, exact-binding count, policy/profile counts, and the effective default-profile contract, including default segment_labels
      • this is introspection-only and does not change the routed GraphRAG execution contract
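
  The layered resolution described above (route-local labels override shared shard metadata, segment_groups array order becomes the shard preference order, bounded fanout is applied last) can be sketched in Python; the structures and field names are hypothetical stand-ins, not the catalog's real columns:

  ```python
  def resolve_shards(routes, shared_meta, segment_groups=None,
                     relation_family=None, fanout_limit=None):
      """Resolve candidate shards with route-local > shared metadata precedence."""
      def effective(route):
          shared = shared_meta.get(route["shard"], {})
          return {
              "shard": route["shard"],
              "segment_group": route.get("segment_group") or shared.get("segment_group"),
              "relation_family": route.get("relation_family") or shared.get("relation_family"),
          }
      rows = [effective(r) for r in routes]
      if relation_family is not None:
          rows = [r for r in rows if r["relation_family"] == relation_family]
      if segment_groups is not None:
          order = {g: i for i, g in enumerate(segment_groups)}
          rows = [r for r in rows if r["segment_group"] in order]
          rows.sort(key=lambda r: order[r["segment_group"]])  # array order = preference
      shards = [r["shard"] for r in rows]
      return shards[:fanout_limit] if fanout_limit else shards
  ```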

sorted_hnsw build optimization

  • Fixed a real build-time memory-safety bug in src/hnsw_build.c: reverse link insertion intentionally overflows a neighbor list by one entry before shrink_connections() prunes it, but the in-memory neighbor arrays were previously allocated to only max_nbrs slots.
  • The build now allocates max_nbrs + 1 slots for those transient reverse inserts, removing the out-of-bounds write.
  • Local reproducer after the fix:
    • 40K pairs / 200K rows / 64D / m=16 / ef_construction=64
    • default contract hop_weight=0.15: 5/5 build-only passes
    • lowered-hop contract hop_weight=0.05: 3/3 build-only passes
  • Removed the per-search visited[] allocation/zeroing from the hot HNSW build loop in src/hnsw_build.c and replaced it with a reusable visit-mark array.
  • On the local 500K x 32D diagnostic point (m=8, ef_construction=8), that reduced total CREATE INDEX time from about 18.1-18.7 s to 2.8-3.0 s, with the isolated graph-construction phase dropping from about 18.27 s to 2.42-2.59 s.
  • With that optimization in place, the AWS 10M x 32D cheap-build scale run progressed from “stuck in CREATE INDEX” to the first real 10M query pass.
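
  The reusable visit-mark technique can be sketched in Python; a real C implementation would also re-zero the array once when the epoch counter wraps:

  ```python
  class VisitMarks:
      """Reusable visit-mark array: instead of allocating and zeroing a fresh
      visited[] per search, bump an epoch counter and compare marks against it."""
      def __init__(self, n_nodes):
          self.marks = [0] * n_nodes   # allocated once, never re-zeroed per search
          self.epoch = 0

      def begin_search(self):
          self.epoch += 1              # O(1) "clear" of the whole array

      def visit(self, node):
          """Mark node visited; return True if it was unvisited in this search."""
          if self.marks[node] == self.epoch:
              return False
          self.marks[node] = self.epoch
          return True
  ```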

GraphRAG concurrent online-operation hardening

  • Added scripts/test_graph_rag_concurrent.sh and make test-graphrag-concurrent to verify registered alias-schema fact graphs under:
    • concurrent INSERT / UPDATE / DELETE
    • concurrent GraphRAG queries
    • concurrent sorted_hnsw KNN queries
    • sorted_heap_compact_online(...)
    • sorted_heap_merge_online(...)
  • The new harness verifies that:
    • GraphRAG alias mappings remain registered
    • the deterministic helper signature remains stable across online operations
    • the unified GraphRAG wrapper remains callable and non-empty
    • sorted_hnsw indexes stay valid and usable
    • backend-local GraphRAG observability still reports non-empty stage stats

GraphRAG larger real-corpus verification

  • Refreshed the 0.13 GraphRAG plan and benchmark docs with a larger in-repo cogniformerus transfer gate.
  • Verified that the old tiny-budget code-corpus point (top_k=4) drifts on the full 183-file cogniformerus repository to about 87% keyword coverage and 66.7% full hits.
  • Verified that increasing only the final result budget to top_k=8 restores repeated-build stable 100.0% / 100.0% on that larger in-repo Crystal corpus for both:
    • generic prompt_summary_snippet_py
    • code-aware prompt_symbol_summary_snippet_py
  • Added mixed-language code-corpus support to the benchmark harness via:
    • JSON question fixtures
    • configurable source extensions
    • quoted C/C++ include-edge extraction
  • Added the first real ~/Projects/C adversary gate on pycdc using scripts/fixtures/graph_rag_pycdc_questions.json.
  • Verified on pycdc that:
    • the fast generic point is repeated-build stable but partial (90.0% / 60.0%)
    • the code-aware helper-backed compact include rescue is repeated-build stable at 100.0% / 100.0%
  • Added the first archive-side adversary gate on ~/SrcArchives/apple/ninja/src using scripts/fixtures/graph_rag_ninja_questions.json.
  • Verified on ninja/src that:
    • the plain generic prompt_summary_snippet_py path is repeated-build stable at 100.0% / 100.0% once the final result budget is raised to top_k=12
    • the code-aware prompt_symbol_summary_snippet_py path remains partial (85.0% / 80.0%) on the same corpus
  • The scoped 0.13 larger real-corpus gate now spans:
    • ~/Projects/Crystal
    • ~/Projects/C
    • ~/SrcArchives

0.12.0 (2026-03-26)

Release documentation pass

  • Public docs now split the surface into:
    • stable: sorted_heap table AM and sorted_hnsw Index AM
    • beta: GraphRAG helper/wrapper API
    • legacy/manual: IVF-PQ and sidecar HNSW paths
  • README performance summary was narrowed to representative rows and had the stale narrow-range comparison removed.
  • README, docs/index.md, docs/vector-search.md, and docs/limitations.md now document the current sorted_hnsw ordered-scan contract: base-relation ORDER BY embedding <=> query LIMIT k, with explicit notes about LIMIT, ef_search, and filtered-query caveats.
  • docs/api.md now includes:
    • sorted_hnsw.ef_search
    • sorted_hnsw.sq8
    • stable sorted_hnsw usage examples
    • beta GraphRAG function reference and usage examples
  • docs/benchmarks.md now labels GraphRAG benchmark sections as beta-facing.

0.10.0 (2026-03-14)

Documentation release: comprehensive rewrite of README with use-case examples, updated benchmarks, and usage guides. No SQL or C changes from 0.9.15.

0.9.15 (2026-03-13)

Scan planner fixes

  • Prepared-mode OLTP cliff fix: Path B cost estimator now includes uncovered tail pages (pages beyond zone map entries created by UPDATEs). Previously the generic plan estimated 1 block but scanned 90+, keeping Custom Scan over Index Scan. Mixed OLTP: 56 → 28K tps.
  • DML planning skip: set_rel_pathlist hook bails out immediately for UPDATE/DELETE on the result relation, avoiding bounds extraction and range computation overhead. UPDATE +10%, DELETE+INSERT reaches heap parity.
  • SCAN-1 regression test: verifies prepared statement generic plan uses Index Scan (not Custom Scan) after UPDATEs create uncovered tail pages.

UPDATE path optimization

  • Remove slot_getallattrs from UPDATE/INSERT zone map path: zonemap_update_entry only reads PK columns via slot_getattr (lazy deform). The prior slot_getallattrs call unnecessarily materialized all columns including wide svec vectors on every UPDATE/INSERT. Removing it eliminates ~530 bytes of needless deform per tuple for svec(128) tables. UPDATE vec col: 74% → 102% (parity). Mixed OLTP: 42% → 83%.
  • Lazy update maintenance (sorted_heap.lazy_update = on): opt-in mode that skips per-UPDATE zone map maintenance. First UPDATE on a covered page clears SHM_FLAG_ZONEMAP_VALID on disk; planner falls back to Index Scan. INSERT keeps eager maintenance. Compact/merge restores zone map pruning. UPDATE non-vec: 46% → 100%. Mixed OLTP: 83% → 97%.
  • SCAN-2 regression test: verifies lazy mode invalidation → Index Scan fallback → correct data → compact restores Custom Scan pruning.

CRUD performance contract (500K rows, svec(128), prepared mode)

Operation          eager / heap   lazy / heap   Notes
SELECT PK          85%            85%           Index Scan via btree
SELECT range 1000  97%            n/a           Custom Scan pruning (eager only)
Bulk INSERT        100%           100%          Always eager
DELETE + INSERT    63%            63%           INSERT always eager
UPDATE non-vec     46%            100%          Lazy skips zone map flush
UPDATE vec col     102%           100%          Parity both modes
Mixed OLTP         83%            97%           Near-parity with lazy

Eager mode (default) maintains zone maps on every UPDATE for scan pruning. Lazy mode (sorted_heap.lazy_update = on) trades scan pruning for UPDATE parity with heap. Compact/merge restores pruning. Recommended for write-heavy workloads where point lookups use Index Scan anyway.

HNSW sidecar search (svec_hnsw_scan)

  • 7-arg interface: added rerank1_topk parameter for optional dense r1 pre-filter via {prefix}_r1 sidecar table (nid int4 PK, rerank_vec hsvec). The 6-arg form continues to work via PG_NARGS().
  • Session-local L0 cache (sorted_heap.hnsw_cache_l0 = on): seqscans L0 once per session (~95ms build, ~100MB for 103K nodes). Upper levels (L1–L4) cached separately (~6MB). OID-based invalidation on DDL.
  • ARM NEON SIMD for cosine_distance_f32_f16: vectorized mixed-precision distance using vld1q_f16, vcvt_f32_f16, and vfmaq_f32, processing 8 f16 elements per iteration. Precomputed query self-norm (cosine_distance_f32_f16_prenorm) eliminates redundant norm_a accumulation in beam search (4 FMAs/iter vs 6). Compile-time guard: __aarch64__ && __ARM_NEON && HSVEC_NATIVE_FP16. Scalar fallback on x86. 27–36% speedup across all operating points.
  • Beam search micro-optimizations (22–29% additional speedup on cache paths):
    • Visited bitset: replaced bool[] (1 byte/node, 103KB) with uint64[] bitset (1 bit/node, 12.9KB). Fits in L1 cache; faster membership tests.
    • Neighbor prefetch: __builtin_prefetch on next unvisited neighbor’s cache node before computing distance on current. Hides L2/L3 latency for the 908-byte cache nodes scattered across the 96MB cache array.
    • Cache-only upper search: hnsw_search_cached() bypasses hnsw_open_level/hnsw_close_level for warm upper-level caches. Eliminates table_open + index_open + index_beginscan overhead per level. Upper traversal: 0.17ms → 0.04ms (−74%).
  • Recommended operating points (103K x 2880-dim, hsvec(384) sketch):
    • Balanced: ef=96 rk=48 → 0.70ms p50, 96.8% recall@10
    • Quality: ef=96 rk=0 → 1.15ms p50, 98.4% recall@10
    • Latency: ef=64 rk=32 → 0.50ms p50, 92.8% recall@10
  • Measured with shared_buffers=512MB (pod 2Gi), isolated per-config protocol (warmup pass + measure pass, no cross-config TOAST sharing). pgvector HNSW under the same conditions: 1.70ms p50 (ef=64). Cold first-call latency is 2–3× higher due to TOAST page faults; shared_buffers must be sized to hold the rerank working set (~27MB for 50 queries at rk=48).
  • r1 verdict: marginal on warm pools. At ef>=96 the btree overhead exceeds TOAST savings. Useful only in cold-TOAST scenarios.
  • Tests: ANN-15 (basic HNSW), ANN-15b (bit-identical distances across cache states), ANN-16 (r1 sidecar), ANN-17 (r1 absent graceful skip), ANN-18 (relcache invalidation).
  • Adaptive ef (sorted_heap.hnsw_ef_patience): patience-based early termination for L0 beam search. When set to N > 0, the search stops after N consecutive node expansions that don’t improve the result set. ef becomes the maximum budget; easy queries converge sooner. Not beneficial for rk=0 (quality mode) where TOAST reads scale with ef.
  • Sketch dimension sweep (103K x 2880-dim Nomic, hsvec 384/512/768): recall@10 identical (98.2% at ef=96/rk=48, 93.6% at ef=64/rk=32) across all three dimensions; latency within noise (~1.2ms/q). First 384 dims via MRL prefix truncation already capture all discriminative power for this model. Recall ceiling is in navigation (graph quality / ef budget), not sketch fidelity. 384-dim is the correct sketch size.
  • Builder: build_hnsw_graph.py now accepts --sketch-dim (infers from data by default). Make target: make build-hnsw-bench-nomic.
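
  The visited-bitset layout can be sketched in Python, with Python ints standing in for the uint64 words:

  ```python
  class VisitedBitset:
      """uint64-word bitset: 1 bit per node instead of 1 byte, so the visited
      set for ~103K nodes fits in ~12.9KB and stays resident in L1 cache."""
      def __init__(self, n_nodes):
          self.words = [0] * ((n_nodes + 63) // 64)

      def test_and_set(self, node):
          """Return True if node was already visited, then mark it visited."""
          word, bit = node >> 6, 1 << (node & 63)
          seen = bool(self.words[word] & bit)
          self.words[word] |= bit
          return seen
  ```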

Storage: persisted sorted prefix (S2)

  • Meta page v7: shm_sorted_prefix_pages persisted in meta page header. Split from shm_padding (struct size unchanged, backward compatible: v7 reader treats v6 as prefix=0).
  • O(1) prefix detection: detect_sorted_prefix returns persisted value when available, falls back to O(n) zone map scan for pre-v7 tables.
  • Conservative shrink paths: tuple_update override shrinks prefix when update lands on a prefix page. zonemap_update_entry detects both min-decrease and max-increase-with-overlap on prefix pages.
  • Merge restore: rebuild_zonemap_internal and merge early-exit both persist the recomputed prefix.
  • Characterization (scripts/bench_s2_prefix.sql):
    • Append-only: prefix survives 100% at all fillfactors
    • ff=100 + updates: 98% survival (non-HOT goes to tail)
    • ff<100 + updates/recycle: prefix collapses (accepted tradeoff)
    • Merge always restores
  • Tests: SH22-1 through SH22-5 covering compact, append, recycle insert, merge restore, and UPDATE-induced prefix shrink.
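
  A Python sketch of the detection fallback, assuming a dict-shaped meta page and (min, max) zone-map entries; both structures are illustrative stand-ins for the on-disk format:

  ```python
  def detect_sorted_prefix(meta, zone_map):
      """Return the persisted prefix page count when the meta page carries it
      (v7 fast path), else fall back to an O(n) zone-map walk (pre-v7 tables)."""
      persisted = meta.get("shm_sorted_prefix_pages")
      if persisted is not None:
          return persisted                      # O(1): read from the v7 meta page
      # fallback: count leading pages whose key ranges do not overlap backward
      prefix = 0
      prev_max = None
      for page_min, page_max in zone_map:
          if prev_max is not None and page_min < prev_max:
              break
          prefix += 1
          prev_max = page_max
      return prefix
  ```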

0.9.14 (2026-03-12)

  • svec_hnsw_scan: 6-arg hierarchical HNSW search via PG sidecar tables. Top-down descent through upper levels, beam search at L0 with hsvec sketches, exact rerank via main table TOAST vectors.

0.9.13

  • svec_graph_scan: flat NSW graph search with btree-backed sidecar.
  • IVF-PQ three-stage rerank with sketch sidecar (svec_ann_scan).