Production Readiness — pgmnemo v0.2.1 Beta

Version: v0.2.1
Status: Public Beta
Last updated: 2026-05-09

This page answers four questions directly. No marketing language.


1. What does “beta” mean here?

Beta means the core retrieval API (recall_lessons, store_lesson, traverse_causal_chain) is stable enough for production evaluation. However, we have not yet run a sustained load campaign in a multi-tenant production environment, and we cannot guarantee forward API stability across minor versions.

Specifically:

- SQL function signatures may change between 0.x minor versions (breaking changes will appear in CHANGELOG with migration SQL).
- GUC names (pgmnemo.ef_search, pgmnemo.recency_weight, pgmnemo.tenant_id) are considered stable for 0.2.x but may be renamed in 0.3.x.
- The upgrade path (ALTER EXTENSION pgmnemo UPDATE TO '...') is tested for sequential upgrades only; skip-version upgrades are not validated.

Beta does not mean experimental or unreliable for single-tenant deployments on PG17.
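
Because only sequential upgrades are tested, a migration has to step through every intermediate version. The following is a minimal Python sketch that expands a version chain into one ALTER EXTENSION statement per step. The chain is copied from the upgrade path listed in section 2; the helper name upgrade_statements is hypothetical and not part of pgmnemo, and the sketch only builds SQL strings without executing them.

```python
# Versions in the order the upgrade scripts are tested (section 2, "Upgrade path").
TESTED_CHAIN = ["0.1.4", "0.2.0", "0.2.0.1", "0.2.1"]


def upgrade_statements(current, target, chain=TESTED_CHAIN):
    """Return one ALTER EXTENSION statement per version between current and target.

    Hypothetical helper for illustration: it never skips a version in the chain
    and only handles upgrades (target at or after current).
    """
    if current not in chain or target not in chain:
        raise ValueError("version is not in the tested upgrade chain")
    start, end = chain.index(current), chain.index(target)
    if end < start:
        raise ValueError("this sketch only handles upgrades")
    return [
        f"ALTER EXTENSION pgmnemo UPDATE TO '{version}';"
        for version in chain[start + 1 : end + 1]
    ]


if __name__ == "__main__":
    for statement in upgrade_statements("0.1.4", "0.2.1"):
        print(statement)
```

Run each emitted statement in staging first, in order, and stop at the first failure.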


2. What is tested?

| Area | What we test | Evidence |
| --- | --- | --- |
| Retrieval accuracy | LoCoMo (1,982 Q&A pairs, 10 conversations): recall@10 = 0.795, MRR = 0.548 | benchmarks/locomo/results/v0.2.1_session_20260509/report.md |
| Retrieval accuracy | LongMemEval (500 questions, bge-m3 embedder): recall@10 = 0.933, MRR = 0.855 | benchmarks/longmemeval/results/v0.2.1_pgmnemo_20260509/report.md |
| Schema correctness | make installcheck on vanilla PG17 Docker (amd64) | CI on every PR |
| Upgrade path | Sequential upgrade scripts from 0.1.4 → 0.2.0 → 0.2.0.1 → 0.2.1, idempotent DDL guards | CHANGELOG §Upgrade sections |
| RLS isolation | pgmnemo.tenant_id GUC policies: tenant A cannot read tenant B rows | Manual verification per INS-032 |
| Bug regression | Named regressions for every INS-* fix: IN-param collision (INS-029), numeric cast (INS-030), idempotent DDL (INS-031) | CHANGELOG v0.2.0.1, v0.2.1 |
| Cycle guard | traverse_causal_chain cycle detection via path array, all three direction modes | Unit test in extension/sql/test_traverse.sql |
| EF search GUC | pgmnemo.ef_search applied at recall_lessons() entry, clamped 10–500 | CHANGELOG v0.2.1 |

Embedder note: All benchmark numbers use retrieval-only mode. No LLM-as-judge downstream evaluation has been run yet (see §3).
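
For readers checking the numbers above against their own data, here is a minimal Python sketch of the two retrieval metrics. It assumes the common definitions: recall@k as the fraction of a query's relevant items found in the top k results, and MRR as the mean of 1/rank of the first relevant result. The benchmark reports may use a different variant (for example, hit rate at k), so treat this as an illustration and not as the benchmark harness.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(queries, k=10):
    """Average both metrics over a list of (ranked_ids, relevant_ids) pairs."""
    if not queries:
        raise ValueError("queries must not be empty")
    recalls = [recall_at_k(ranked, gold, k) for ranked, gold in queries]
    rranks = [reciprocal_rank(ranked, gold) for ranked, gold in queries]
    return sum(recalls) / len(queries), sum(rranks) / len(queries)


if __name__ == "__main__":
    # Toy data, not benchmark output.
    toy = [(["a", "b", "c"], ["b"]), (["x", "y", "z"], ["q", "x"])]
    print(evaluate(toy, k=3))
```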


3. What is not yet guaranteed?

| Gap | Detail |
| --- | --- |
| LLM-as-judge / end-to-end QA accuracy | We report retrieval recall@K only. Downstream answer quality (the metric competitors report as “LLM-judge accuracy”) is not yet measured for pgmnemo. |
| PG14–16 compatibility | Install and upgrade scripts work on PG14–16 in informal testing; numeric cast fix (INS-030) was the only known PG14 regression. Formal installcheck CI does not run on PG14–16. |
| Sustained load / p99 latency at scale | No stress-test or sustained load campaign has been run. The US-A2 acceptance criterion (≤40 ms p95 on 10K entries) is a design target, not a validated result. |
| arm64 prebuilt binary | Source build works on arm64; prebuilt .so for arm64 is not yet distributed. |
| Skip-version upgrades | Upgrading from 0.1.x directly to 0.2.1 (skipping intermediate versions) is untested. |
| Multi-tenant RLS under adversarial load | RLS policies have been reviewed for correctness but not fuzz-tested or audited by a third party. |
| Recency weight calibration | pgmnemo.recency_weight default lowered from 0.20 → 0.08 in v0.2.1 pending REC-1 ablation study. The ablation has not been published; the current default is a provisional best estimate. |
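
Since the ≤40 ms p95 figure is a design target, adopters need their own measurement. Below is a minimal Python sketch of one way to collect latencies and check them against a target. It assumes a caller-supplied zero-argument callable (run_query, hypothetical) that issues one retrieval call through whatever database driver you use, and it uses the nearest-rank percentile definition. It measures a single client in a loop, so it is not a substitute for a sustained load campaign.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(run_query, iterations=200, warmup=20):
    """Call run_query repeatedly and return per-call wall-clock latency in milliseconds."""
    for _ in range(warmup):  # warm caches before timing
        run_query()
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_target(latencies_ms, target_ms=40.0, pct=95):
    """True if the pct-th percentile latency is at or below target_ms."""
    return percentile(latencies_ms, pct) <= target_ms


if __name__ == "__main__":
    # Placeholder workload; replace the lambda with a real retrieval call.
    samples = measure_latencies_ms(lambda: sum(range(1000)))
    print(f"p95 = {percentile(samples, 95):.3f} ms, ok = {meets_target(samples)}")
```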

4. What must a production adopter verify on their side?

Before running pgmnemo in a workload that matters, verify the following:

  1. Run make installcheck against your target PG version.
    CI runs installcheck on PG17 only, so on any other version you must run the test suite yourself. PG14–16 deviations will surface here.

  2. Smoke-test the upgrade path from your current version.
    Run each ALTER EXTENSION pgmnemo UPDATE TO '...' step sequentially in a staging environment before applying to production.

  3. Validate RLS with your tenant ID scheme.
    Set pgmnemo.tenant_id and confirm cross-tenant queries return empty results. Do not rely on application-layer filtering alone.

  4. Measure your own p95 latency on your corpus size.
    Build an HNSW index on your agent_lesson table before applying load. Tune pgmnemo.ef_search (default 100) for your recall/latency tradeoff. The ≤40 ms p95 figure is a design target that has not been benchmarked on real hardware.

  5. Pin the extension version in your migration scripts.
    Use ALTER EXTENSION pgmnemo UPDATE TO '0.2.1' explicitly, not a bare ALTER EXTENSION pgmnemo UPDATE, which moves to whatever default version is installed. API changes in minor versions are documented in CHANGELOG but will not be held back for you.

  6. Do not assume LLM-as-judge accuracy numbers from competitor papers apply to pgmnemo.
    pgmnemo v0.2.1 publishes retrieval recall only. If your application needs QA accuracy guarantees, you must run your own end-to-end evaluation.
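
The version-pinning rule in step 5 can be checked mechanically. The following is a minimal Python sketch of a migration-script check that flags ALTER EXTENSION pgmnemo UPDATE statements with no TO clause. The function name unpinned_update_lines is hypothetical, and the regular expression is deliberately simple: it does not parse SQL, so it ignores comments, string literals, and quoted identifiers.

```python
import re

# Matches "ALTER EXTENSION pgmnemo UPDATE" followed directly by a semicolon,
# i.e. with no "TO '<version>'" clause.
UNPINNED_UPDATE = re.compile(
    r"ALTER\s+EXTENSION\s+pgmnemo\s+UPDATE\s*;", re.IGNORECASE
)


def unpinned_update_lines(migration_sql):
    """Return 1-based line numbers where an unpinned update statement starts."""
    return [
        migration_sql.count("\n", 0, match.start()) + 1
        for match in UNPINNED_UPDATE.finditer(migration_sql)
    ]


if __name__ == "__main__":
    script = (
        "ALTER EXTENSION pgmnemo UPDATE TO '0.2.1';\n"
        "ALTER EXTENSION pgmnemo UPDATE;\n"
    )
    print(unpinned_update_lines(script))
```

A check like this fits in a pre-merge hook so that an unpinned update never reaches a production migration.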


Honest assessment, not a sales page. If you find a gap not listed here, open an issue.