Production Readiness — pgmnemo v0.2.1 Beta

Version: v0.2.1
Status: Public Beta
Last updated: 2026-05-09

This page answers four questions directly. No marketing language.

1. What does “beta” mean here?

Beta means the core retrieval API (recall_lessons, store_lesson, traverse_causal_chain) is stable enough for production evaluation. It also means two things are still missing: we have not yet run a sustained load campaign in a multi-tenant production environment, and we cannot guarantee forward API stability across minor versions.

Specifically:

- SQL function signatures may change between 0.x minor versions (breaking changes will appear in CHANGELOG with migration SQL).
- GUC names (pgmnemo.ef_search, pgmnemo.recency_weight, pgmnemo.tenant_id) are considered stable for 0.2.x but may be renamed in 0.3.x.
- The upgrade path (ALTER EXTENSION pgmnemo UPDATE TO '...') is tested for sequential upgrades only; skip-version upgrades are not validated.

Beta does not mean experimental or unreliable for single-tenant deployments on PG17.
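
The sequential-only upgrade constraint can be sketched in SQL. This is a minimal sketch, assuming the extension is currently installed at 0.1.4 and that the intermediate releases are the ones named in the changelog (0.2.0, 0.2.0.1, 0.2.1). The pg_extension catalog and pg_extension_update_paths() are standard PostgreSQL; run this in staging first.

```sql
-- Which version is installed right now?
SELECT extversion FROM pg_extension WHERE extname = 'pgmnemo';

-- Which update paths do the installed scripts provide towards 0.2.1?
-- A NULL path means PostgreSQL finds no script chain from source to target.
SELECT source, target, path
FROM pg_extension_update_paths('pgmnemo')
WHERE target = '0.2.1';

-- Step through each release in order and check the result between steps,
-- instead of jumping straight to the final version.
-- Assumption: the starting point is 0.1.4.
ALTER EXTENSION pgmnemo UPDATE TO '0.2.0';
ALTER EXTENSION pgmnemo UPDATE TO '0.2.0.1';
ALTER EXTENSION pgmnemo UPDATE TO '0.2.1';
```

Stepping explicitly keeps each migration observable: if one step fails, the installed version reported by pg_extension shows how far the upgrade got.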

2. What is tested?

Area	What we test	Evidence
Retrieval accuracy	LoCoMo (1 982 Q&A pairs, 10 conversations): recall@10 = 0.795, MRR = 0.548	`benchmarks/locomo/results/v0.2.1_session_20260509/report.md`
Retrieval accuracy	LongMemEval (500 questions, bge-m3 embedder): recall@10 = 0.933, MRR = 0.855	`benchmarks/longmemeval/results/v0.2.1_pgmnemo_20260509/report.md`
Schema correctness	`make installcheck` on vanilla PG17 Docker (amd64)	CI on every PR
Upgrade path	Sequential upgrade scripts from 0.1.4 → 0.2.0 → 0.2.0.1 → 0.2.1, idempotent DDL guards	CHANGELOG §Upgrade sections
RLS isolation	`pgmnemo.tenant_id` GUC policies: tenant A cannot read tenant B rows	Manual verification per INS-032
Bug regression	Named regressions for every INS-* fix: IN-param collision (INS-029), numeric cast (INS-030), idempotent DDL (INS-031)	CHANGELOG v0.2.0.1, v0.2.1
Cycle guard	`traverse_causal_chain` cycle detection via path array, all three direction modes	Unit test in `extension/sql/test_traverse.sql`
EF search GUC	`pgmnemo.ef_search` applied at `recall_lessons()` entry, clamped 10–500	CHANGELOG v0.2.1

Embedder note: All benchmark numbers use retrieval-only mode. No LLM-as-judge downstream evaluation has been run yet (see §3).
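
For readers unfamiliar with the cycle-guard technique named in the table, the following is an illustrative sketch of cycle detection via a path array in a recursive CTE. It is not pgmnemo's implementation: the demo_edge table and its columns are hypothetical, and only a single forward direction is shown.

```sql
-- Hypothetical edge table; pgmnemo's real schema may differ.
CREATE TEMP TABLE demo_edge (src int, dst int);
INSERT INTO demo_edge VALUES (1, 2), (2, 3), (3, 1);  -- 1 -> 2 -> 3 -> 1 is a cycle

WITH RECURSIVE walk AS (
    -- Start at node 1; the path array records every node visited so far.
    SELECT dst AS node, ARRAY[src, dst] AS path
    FROM demo_edge
    WHERE src = 1
  UNION ALL
    -- Follow outgoing edges, extending the path by one node per step.
    SELECT e.dst, w.path || e.dst
    FROM walk w
    JOIN demo_edge e ON e.src = w.node
    WHERE e.dst <> ALL (w.path)   -- cycle guard: never revisit a node already on the path
)
SELECT node, path
FROM walk
ORDER BY array_length(path, 1);
```

Without the guard in the WHERE clause, the recursion would follow the edge 3 -> 1 and loop indefinitely; with it, the walk stops once every reachable node has been visited once.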

3. What is not yet guaranteed?

Gap	Detail
LLM-as-judge / end-to-end QA accuracy	We report retrieval recall@K only. Downstream answer quality (the metric competitors report as “LLM-judge accuracy”) is not yet measured for pgmnemo.
PG14–16 compatibility	Install and upgrade scripts work on PG14–16 in informal testing; numeric cast fix (INS-030) was the only known PG14 regression. Formal `installcheck` CI does not run on PG14–16.
Sustained load / p99 latency at scale	No stress-test or sustained load campaign has been run. The `US-A2` acceptance criterion (≤40 ms p95 on 10K entries) is a design target, not a validated result.
`arm64` prebuilt binary	Source build works on arm64; prebuilt `.so` for arm64 is not yet distributed.
Skip-version upgrades	Upgrading from 0.1.x directly to 0.2.1 (skipping intermediate versions) is untested.
Multi-tenant RLS under adversarial load	RLS policies have been reviewed for correctness but not fuzz-tested or audited by a third party.
Recency weight calibration	`pgmnemo.recency_weight` default lowered from 0.20 → 0.08 in v0.2.1 pending REC-1 ablation study. The ablation has not been published; the current default is a provisional best estimate.
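
Because the p95 target above is unvalidated, adopters will need to compute the percentile from their own measurements. A minimal sketch of that arithmetic using the standard percentile_cont aggregate; the sample values are made up for illustration and are not pgmnemo results.

```sql
-- Hypothetical latency samples in milliseconds, e.g. from a client-side timer.
WITH samples(ms) AS (
    VALUES (12.0), (15.5), (18.2), (21.0), (38.9), (55.3)
)
SELECT percentile_cont(0.95) WITHIN GROUP (ORDER BY ms::float8) AS p95_ms,
       -- Compare against the 40 ms design target from US-A2.
       percentile_cont(0.95) WITHIN GROUP (ORDER BY ms::float8) <= 40 AS meets_target
FROM samples;
```

In practice the samples would come from timing real queries against a corpus of the size you expect in production.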

4. What must a production adopter verify on their side?

Before running pgmnemo in a workload that matters, verify the following:

- Run make installcheck against your target PG version.
  CI covers PG17 only, so on any other version run the test suite yourself. PG14–16 deviations will surface here.
- Smoke-test the upgrade path from your current version.
  Run each ALTER EXTENSION pgmnemo UPDATE TO '...' step sequentially in a staging environment before applying it to production.
- Validate RLS with your tenant ID scheme.
  Set pgmnemo.tenant_id and confirm cross-tenant queries return empty results. Do not rely on application-layer filtering alone.
- Measure your own p95 latency on your corpus size.
  Index your agent_lesson table with HNSW before load. Tune pgmnemo.ef_search (default 100) for your recall/latency tradeoff. The ≤40 ms p95 figure is a design target and has not been validated by a benchmark.
- Pin the extension version in your migration scripts.
  Use ALTER EXTENSION pgmnemo UPDATE TO '0.2.1' explicitly, not a bare ALTER EXTENSION pgmnemo UPDATE, which moves to whatever default version the installed control file names. Minor version API changes are documented but will not be held back for you.
- Do not rely on LLM-as-judge accuracy numbers from competitor papers.
  pgmnemo v0.2.1 publishes retrieval recall only. If your application needs QA accuracy guarantees, you must run your own end-to-end evaluation.
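
Several of the checks above can be sketched as a short pre-flight script. This is a sketch under stated assumptions, not an official procedure: the role app_reader and the tenant value 'tenant_b' are hypothetical, agent_lesson is assumed to be reachable through your search_path, and 200 is an arbitrary example value for pgmnemo.ef_search.

```sql
-- 1. Confirm the server and extension versions you are actually testing.
SHOW server_version;
SELECT extversion FROM pg_extension WHERE extname = 'pgmnemo';

-- 2. List the indexes on agent_lesson and confirm an HNSW index exists before load.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'agent_lesson';

-- 3. RLS check. Superusers and table owners bypass row-level security by default,
--    so run the check as an ordinary role (app_reader is a hypothetical name).
BEGIN;
SET LOCAL ROLE app_reader;
SET LOCAL pgmnemo.tenant_id = 'tenant_b';           -- hypothetical tenant with no data
SELECT count(*) AS visible_rows FROM agent_lesson;  -- should be 0 if isolation holds
ROLLBACK;

-- 4. Try a different ef_search for this session only, then re-run your latency test.
SET pgmnemo.ef_search = 200;
SHOW pgmnemo.ef_search;
```

Running the RLS check as a superuser is the most common way to get a misleading result, since the policies are silently skipped for that role.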

Honest assessment, not a sales page. If you find a gap not listed here, open an issue.
