PLAN: Feature Cleanup — Remove Low-Value Surface Before Public Release

Status: In Progress Target milestone: v0.2.0 Last updated: 2026-02-27


Motivation

Before the first public release accepts external users, every GUC, SQL function, and status value becomes a compatibility commitment. This plan identifies surface to remove or consolidate now — while there are zero external users — to keep the API simple, honest, and maintainable.

Items are sourced from direct code audit of config.rs, api.rs, monitor.rs, refresh.rs, and scheduler.rs. This plan does not include features that were speculated about but verified not to exist (ON COMMIT REFRESH, named schedules, per-table delta_threshold, stream_table_definition, force_full_refresh — none of these were built).


Items


C1 — Remove pg_trickle.merge_strategy GUC entirely

Priority: P0 — remove before public release Effort: 1–2 hours Location: config.rs, refresh.rs

Background

merge_strategy accepts three values: auto, merge, delete_insert.

  • delete_insert is being removed (F1 — unsafe, superseded by auto).
  • After removing delete_insert, the only remaining values are auto and merge.
  • auto uses MERGE for all deltas except when the estimated delta exceeds ~25% of the stream table row count, at which point it switches to DELETE+INSERT. Since DELETE+INSERT is being removed, auto will always use MERGE — making it identical to merge.
  • There is no remaining scenario where a user would get a different behaviour from merge vs auto.

Options

A. Keep merge_strategy but reduce to a single auto value - ✅ Technically harmless; GUC still validates the value - ❌ A GUC that accepts one value is confusing and adds documentation debt - ❌ The name “merge_strategy” implies there are strategies to choose from

B. Remove merge_strategy GUC entirely - ✅ Eliminates a GUC that provides zero choice after delete_insert removal - ✅ Simplifies refresh.rs: removes the strategy-selection branch - ✅ Simplifies config.rs: removes one GUC registration - ✅ Users who set pg_trickle.merge_strategy = 'merge' or 'auto' get an unknown-GUC warning (harmless) rather than silent no-op behaviour - ❌ Breaking change for any existing configuration — acceptable at pre-1.0

Decision: Option B — remove merge_strategy GUC entirely.

Status: DONE ✅

Implementation

  1. Remove PGS_MERGE_STRATEGY from config.rs
  2. Remove pg_trickle_merge_strategy() accessor
  3. Remove use_delete_insert branch in refresh.rs
  4. Remove delete_insert_template and delete_insert_sql fields from the refresh cache entry struct
  5. Update docs and GUC documentation table ✅

C2 — Fix SUSPENDED status: implement missing resume_stream_table()

Priority: P0 — referenced in error message but function does not exist Effort: 1–2 hours Location: api.rs line 500, scheduler.rs

Background

When a stream table accumulates pg_trickle.max_consecutive_errors consecutive refresh failures, the scheduler sets its status to SUSPENDED and fires an auto_suspended NOTIFY alert. The error message in refresh_stream_table() tells the user:

"stream table {}.{} is suspended; use pgtrickle.resume_stream_table() first"

pgtrickle.resume_stream_table() does not exist as a pg_extern function. The user is told to call a function that doesn’t exist.

This is a bug, not a feature to remove. It reveals that the SUSPENDED lifecycle is half-implemented: the scheduler can set the state, but there is no user-facing way to clear it and re-enable refreshes other than directly mutating the catalog table.

Options

A. Remove SUSPENDED status and auto-suspension logic - ✅ Eliminates the inconsistency - ✅ Simplifies the scheduler — no error-count tracking or state transitions - ❌ Removes a safety mechanism: without auto-suspension, a table with a permanent error (e.g., dropped source column) will retry indefinitely, consuming background worker cycles and filling the log - ❌ Users lose NOTIFY alerting for persistent failures

B. Implement pgtrickle.resume_stream_table(name text) function - ✅ Makes the documented lifecycle complete and consistent - ✅ Fixes the dangling error message - ✅ Auto-suspension remains as a safety valve for persistent errors - ✅ Small, clean implementation: reset status = 'ACTIVE', consecutive_errors = 0 in catalog, emit a NOTIFY - ❌ Adds one more SQL function to the public API (minor)

Decision: Option B — implement resume_stream_table(). The function is already promised by the error message; this just fulfils that promise.

Status: DONE ✅

Implementation

#[pg_extern(schema = "pgtrickle")]
fn resume_stream_table(name: &str) {
    // Parse schema.name, look up in catalog, assert status == SUSPENDED,
    // reset status = ACTIVE + consecutive_errors = 0, emit NOTIFY.
}

C3 — Remove pg_trickle.max_concurrent_refreshes GUC (or mark as no-op)

Priority: P1 — documents a capability that does not exist Effort: 30 min (document) or 2 hours (remove) Location: config.rs, scheduler.rs

Background

pg_trickle.max_concurrent_refreshes (default: 4, max: 32) implies the scheduler runs multiple refreshes in parallel. It does not. The scheduler processes stream tables sequentially in topological order within a single background worker (G8.5 in GAP_SQL_PHASE_7.md). The GUC is read but has no effect on behaviour.

Options

A. Document as “reserved for future use” in GUC description - ✅ Zero code change - ❌ A GUC that does nothing is misleading; users will set it expecting an effect and see none - ❌ Will need to remain backward-compatible once documented

B. Remove the GUC entirely until parallel refresh is implemented (v0.3.0) - ✅ Eliminates false advertising - ✅ Pre-1.0: no compatibility cost - ✅ Re-add when the feature is real (P2/P3 in v0.3.0) - ❌ Minor churn: add back in v0.3.0

C. Hard-code to 1 and remove the GUC - Same as B, just more explicit about the current behaviour

Decision: Option A — document max_concurrent_refreshes as “reserved for future use” in the GUC description. No code change; update the description string in config.rs to make it clear the value is not yet honoured.

Status: DONE ✅


C4 — Consolidate pg_trickle.merge_work_mem_mb and pg_trickle.merge_planner_hints

Priority: P2 — low user value, high documentation burden Effort: 1–2 hours Location: config.rs, refresh.rs

Background

Two GUCs control planner hint injection before MERGE execution:

  • pg_trickle.merge_planner_hints (bool, default: true) — enables injection of SET LOCAL enable_nestloop = off and SET LOCAL work_mem = '<N>MB'
  • pg_trickle.merge_work_mem_mb (int, default: 64) — sets the work_mem injected when delta ≥ 10,000 rows

These exist because PostgreSQL’s planner sometimes chooses nested-loop plans for medium/large delta MERGEs, causing P95 latency spikes. The solution is correct but exposes two internal tuning knobs.

Options

A. Keep both GUCs as-is - ✅ Maximum operator control - ❌ Two GUCs that 99% of users will never adjust - ❌ merge_work_mem_mb is only meaningful when merge_planner_hints = true — the interdependency is confusing

B. Remove merge_work_mem_mb, hard-code a sensible value - ✅ Reduces to one GUC (merge_planner_hints on/off) - ✅ The 64 MB default is well-chosen; there is no evidence any user needs to tune this independently - ❌ Removes one escape hatch for edge cases with unusually large deltas - ✅ Users who genuinely need a different work_mem can set it at the session/role level

C. Remove both GUCs, always inject hints - ✅ Simplest — the hints are always beneficial; no reason to disable - ❌ Removes the ability to disable hints if they cause regressions on specific PostgreSQL minor versions or hardware configurations - ❌ More risk than benefit

Decision: Option A — keep both pg_trickle.merge_work_mem_mb and pg_trickle.merge_planner_hints as-is. Maximum operator control is preserved for users who need to tune MERGE performance on large deltas.

Status: No action required ✅


C5 — Simplify pg_trickle.user_triggers GUC values

Priority: P3 — minor API surface simplification Effort: 1 hour Location: config.rs, refresh.rs

Background

pg_trickle.user_triggers originally accepted: auto (default), on, off.

  • auto: detect user-defined row-level triggers on the stream table and use explicit DML (DELETE + UPDATE + INSERT) so they fire correctly
  • on: always use explicit DML, even if no user triggers exist
  • off: always use MERGE; user triggers will NOT fire correctly

The on value is only useful if trigger detection is unreliable. If detection is reliable (which it is — pg_trickle checks pg_trigger directly), on is redundant with auto.

The off value explicitly disables trigger compatibility, which means user triggers silently fire with wrong TG_OP / OLD / NEW — a footgun.

Options

A. Keep all three values - ✅ Maximum flexibility - ❌ on is redundant with auto; off is a footgun

B. Remove on (redundant with auto) and rename off to pg_trickle.user_triggers = false (boolean GUC) - ✅ Cleaner: a boolean “respect user triggers yes/no” is clearer than three string values - ✅ Auto-detection is reliable enough that on adds no value - ❌ Minor API change

C. Remove off entirely, keep auto and on as true/false - ✅ Eliminates the footgun (off silently breaks user triggers) - ❌ Removes an escape hatch for the rare case where explicit DML has unacceptable overhead and the user knowingly has no triggers

Decision: Implemented in v0.2.3 with backward compatibility. Runtime behavior is now canonicalized to auto and off; the legacy on value is still accepted as a deprecated alias for auto so existing deployments do not break.

Status: DONE ✅


C6 — Hide WAL-mode GUCs until WAL CDC is production-ready

Priority: P3 — reduces false impression of production readiness Effort: 30 min (rename/comment) Location: config.rs

Background

pg_trickle.wal_transition_timeout is visible to users, implying that WAL CDC is a normal operational mode to configure. It is not — WAL CDC has three P1 correctness issues (F2, F3, F4 in GAP_SQL_PHASE_7.md) and is not production-ready until v0.3.0.

Options

A. Prefix WAL-mode GUCs with _experimental_ or _unsafe_ - ✅ Signals that these are not stable - ❌ Ugly; PostgreSQL GUC convention doesn’t use such prefixes

B. Keep GUCs but add prominent warnings in documentation and GUC description strings - ✅ No API change - ✅ GUC descriptions are shown by SHOW pgtrickle.wal_transition_timeout - ✅ The existing GUC description can say explicitly: “WAL CDC is not production-ready in v0.2.0; see docs.”

C. Remove or hide WAL-mode GUCs until v0.3.0 - ✅ Cleanest signal of “not ready” - ❌ pg_trickle.cdc_mode = 'auto' (the default) will attempt WAL transition after a successful refresh — the GUC is needed to control the timeout even in pre-production

Decision: Option B — update GUC description strings to explicitly state WAL CDC is pre-production in v0.2.0. No code change.

Status: DONE ✅


Summary

Item Action Priority Effort Status
C1 Remove merge_strategy GUC (entire) P0 1–2h ✅ Done
C2 Implement missing resume_stream_table() P0 1–2h ✅ Done
C3 Document max_concurrent_refreshes as reserved P1 30 min ✅ Done
C4 Keep both merge GUCs as-is P2 ✅ No-op
C5 Simplify user_triggers GUC P3 Canonical auto / off, keep on as deprecated alias ✅ Done
C6 Document WAL GUCs as pre-production P3 30 min ✅ Done

Total for v0.2.0 (C1–C3 + C6): completed.


What Does Not Exist (Speculative List Corrections)

The following items were considered but verified to not exist in the codebase and therefore require no removal:

Feature Status
ON COMMIT REFRESH clause Never built
pgtrickle.force_full_refresh() separate function Never built — full refresh is refresh_stream_table(name, force := true)
Per-table delta_threshold override Never built
pgtrickle.stream_table_definition() DDL reconstruction Never built
Named/reusable refresh schedules Never built
Separate pgtrickle.pause_stream_table() function Never built — suspension is automatic only
Per-operator timing breakdowns Never built — explain_st() exposes the operator tree but not timings