PLAN_ERGONOMICS — Developer & User Ergonomics Improvements

Status

All five original tasks implemented (2026-03-05). Three v0.4.0 ergonomic-hardening items implemented 2026-03-11. See git history for details.

Task Status
Task 1 — 'calculated' keyword + default change ✅ Done
Task 2 — Lower min_schedule_seconds default to 1 ✅ Done
Task 3 — Add default_schedule_seconds GUC ✅ Done
Task 4 — Remove diamond GUCs ✅ Done
Task 5 — Add TOC to SQL_REFERENCE.md and CONFIGURATION.md ✅ Done
ERG-B — Warn at _PG_init when cdc_mode='auto' but wal_level != 'logical' ✅ Done
ERG-C — Warn at create_stream_table when source has no PRIMARY KEY ✅ Done (pre-existing)
ERG-F — Emit WARNING when alter_stream_table triggers implicit full refresh ✅ Done
ERG-D — Record manual refresh_stream_table() in pgt_refresh_history with initiated_by='MANUAL' ✅ Done
ERG-E — pgtrickle.quick_health view (single-row status summary) ✅ Done

Remaining follow-up ✅ (completed 2026-03-14)

  • ✅ E2E tests for 'calculated' as schedule input (verify NULL input is rejected) — ERG-T1
  • ✅ E2E test that SHOW pg_trickle.diamond_consistency returns an error — ERG-T2
  • ✅ Update CHANGELOG.md with breaking behavioural changes (default schedule, NULL rejection) — ERG-T5
  • ✅ E2E test for ERG-B warning (assert WARNING appears in server log when wal_level != logical) — ERG-T4
  • ✅ E2E test for ERG-F warning (assert WARNING in client output when alter_stream_table with query =>) — ERG-T3

Overview

Five focused changes to make pg_trickle friendlier for new users while keeping full flexibility for production use.

No catalog schema migration required. 'calculated' is accepted/displayed as a parse-time alias for NULL; internal storage stays NULL.


Decisions Made

Topic Decision
'calculated' storage Accepted/rejected at parse time; stored as NULL — no SQL migration needed
create_stream_table default Change from '1m' to 'calculated' (breaking behavioral change; document in CHANGELOG)
default_schedule_seconds default 1 (matches new min_schedule_seconds default)
Diamond GUC fallbacks Hardcoded to 'atomic' / 'fastest' in Rust (EC-13 set 'atomic' as correct default); per-table params in create/alter_stream_table are kept

Task 1 — Replace NULL with 'calculated' as the schedule keyword

Files: src/api.rs, docs/SQL_REFERENCE.md

Background

Currently, passing schedule => NULL to create_stream_table activates CALCULATED mode (schedule derived from downstream dependents). NULL is unintuitive — 'calculated' is explicit and self-documenting.

NULL is kept in catalog storage (no migration), but is no longer accepted as SQL input. The alter sentinel (no-change) stays as SQL NULL internally.

Steps

  1. Input pre-processing — In create_stream_table and alter_stream_table, before calling parse_schedule / validate_schedule, add a check:

    • If schedule input is Some("calculated") → convert to None (CALCULATED mode).
    • If schedule input is None (SQL NULL) on a create call → return error: “use ‘calculated’ instead of NULL to set CALCULATED schedule”.
    • alter_stream_table’s own sentinel for “no change” remains NULL internally and is handled before the pre-processing step.
  2. Change create_stream_table SQL default — Change the schedule parameter default from "'1m'" to "'calculated'" (around line 35 of src/api.rs).

  3. Update docs — In docs/SQL_REFERENCE.md, replace all NULL schedule examples with 'calculated' and add a note that NULL is no longer valid as schedule input.


Task 2 — Lower pg_trickle.min_schedule_seconds default to 1

Files: src/config.rs, docs/CONFIGURATION.md

Background

The current default of 60 is appropriate for production but makes local development and testing slow and awkward. New users hitting “minimum 60 seconds” immediately is a bad first experience. Production operators know what they are doing and can raise this explicitly.

Steps

  1. In src/config.rs, change the default value of PGS_MIN_SCHEDULE_SECONDS from 60 to 1.

  2. Update docs/CONFIGURATION.md: change the documented default from 60 to 1.


Task 3 — Introduce pg_trickle.default_schedule_seconds GUC

Files: src/config.rs, src/api.rs, src/scheduler.rs, docs/CONFIGURATION.md

Background

min_schedule_seconds currently plays two distinct roles:

  1. Floor (validate_schedule): rejects schedules shorter than this value.
  2. Default (resolve_calculated_schedule): isolated CALCULATED stream tables (no downstream dependents) fall back to this as their effective refresh interval.

These should be separate GUCs so each can be tuned independently.

Current caller map

Location Current role
src/api.rs ~L1786 validate_schedule() Floor — keep using min_schedule_seconds
src/api.rs ~L1321 build_from_catalog() Default — switch to default_schedule_seconds
src/api.rs ~L2247 build_from_catalog() Default — switch to default_schedule_seconds
src/scheduler.rs ~L395 build_from_catalog() Default — switch to default_schedule_seconds

Steps

  1. In src/config.rs, add a new GUC static: rust static PGS_DEFAULT_SCHEDULE_SECONDS: GucSetting<i32> = GucSetting::<i32>::new(1);

    • Default: 1
    • Range: 186400
    • Context: SUSET
    • Description: “Default effective schedule (in seconds) for isolated CALCULATED stream tables that have no downstream dependents.” Register it alongside the existing GUCs.
  2. Add accessor: rust pub fn pg_trickle_default_schedule_seconds() -> i32 { PGS_DEFAULT_SCHEDULE_SECONDS.get() }

  3. In src/api.rs at ~L1321 and ~L2247, replace: rust config::pg_trickle_min_schedule_seconds() with: rust config::pg_trickle_default_schedule_seconds() when passing fallback_schedule_secs to StDag::build_from_catalog().

  4. In src/scheduler.rs at ~L395, same replacement.

  5. The validate_schedule() call at ~L1786 continues to use pg_trickle_min_schedule_seconds() as the floor — no change there.

  6. Update docs/CONFIGURATION.md: add a new GUC section for pg_trickle.default_schedule_seconds.


Task 4 — Remove pg_trickle.diamond_consistency and pg_trickle.diamond_schedule_policy GUCs

Files: src/config.rs, src/api.rs, src/scheduler.rs, tests/e2e_diamond_tests.rs, docs/CONFIGURATION.md

Background

The GUC defaults ('none' and 'fastest') are sensible for all practical use cases, already match the SQL column DEFAULT values in src/lib.rs, and are already the values asserted by the diamond E2E tests. Exposing them as settable GUCs adds API surface without meaningful benefit — users who need non-default values can specify them per stream table via create_stream_table / alter_stream_table parameters, which are not removed.

Current GUC usage

Location GUC Replace with
src/api.rs ~L80 pg_trickle_diamond_consistency() hardcoded "none"
src/api.rs ~L94 pg_trickle_diamond_schedule_policy() DiamondSchedulePolicy::Fastest
src/scheduler.rs ~L679 pg_trickle_diamond_schedule_policy() DiamondSchedulePolicy::Fastest

Steps

  1. In src/config.rs, remove:

    • PGS_DIAMOND_CONSISTENCY and PGS_DIAMOND_SCHEDULE_POLICY statics
    • Their GUC registrations
    • Accessor functions pg_trickle_diamond_consistency() and pg_trickle_diamond_schedule_policy()
  2. In src/api.rs at ~L80 and ~L94, replace the GUC accessor calls with hardcoded defaults:

    • "none" for diamond_consistency
    • DiamondSchedulePolicy::Fastest (or equivalent string "fastest") for diamond_schedule_policy
  3. In src/scheduler.rs at ~L679, replace the GUC accessor call with the hardcoded DiamondSchedulePolicy::Fastest value.

  4. The SQL column DEFAULT 'none' and DEFAULT 'fastest' in src/lib.rs (~L119–122) already match — no change needed.

  5. In tests/e2e_diamond_tests.rs, scan for any tests that explicitly set these GUCs via SET pg_trickle.diamond_consistency or reference the GUC names, and update those to use per-table params instead. Tests that only check per-table default values (e.g. test_diamond_consistency_default, test_diamond_schedule_policy_default) should continue to pass unchanged.

  6. Update docs/CONFIGURATION.md:

    • Remove the two GUC sections.
    • Update the “fifteen configuration variables” count in the Overview.

Task 5 — Add table of contents to SQL_REFERENCE.md and CONFIGURATION.md

Files: docs/SQL_REFERENCE.md, docs/CONFIGURATION.md

Steps

  1. In docs/SQL_REFERENCE.md: add a markdown TOC after the # SQL Reference heading, linking to all ## and ### headings.

  2. In docs/CONFIGURATION.md: add a markdown TOC after the # Configuration heading, linking to all ## and ### headings. The TOC should reflect the post-Task-4 state: diamond GUC sections removed, default_schedule_seconds section added.


Verification

just fmt && just lint                # zero warnings required
just test-unit                       # pure Rust tests
just test-integration                # Testcontainers tests
just test-e2e                        # E2E including diamond tests

Manual smoke checks

-- 'calculated' should be the default; no schedule argument needed
SELECT pgtrickle.create_stream_table('t', 'SELECT 1 AS x', 'src', 'public');

-- Passing NULL should return an error
SELECT pgtrickle.create_stream_table('t2', 'SELECT 1 AS x', 'src', 'public',
    schedule => NULL);
-- Expected: ERROR: use 'calculated' instead of NULL to set CALCULATED schedule

-- New GUC defaults
SHOW pg_trickle.min_schedule_seconds;       -- 1
SHOW pg_trickle.default_schedule_seconds;   -- 1

-- Removed GUCs should error
SHOW pg_trickle.diamond_consistency;        -- ERROR: unrecognized configuration parameter
SHOW pg_trickle.diamond_schedule_policy;    -- ERROR: unrecognized configuration parameter


Proposals (not yet decided)

The following improvements were identified during an ergonomics review. Each is described with enough context to make a decision. None are committed to implementation yet.


Proposal A — Return a record from create_stream_table and refresh_stream_table

Priority: High
Complexity: Low
Type: API addition (non-breaking if done as overload; breaking if current void signature changes)

Currently both functions return VOID. To check the outcome, callers must issue a follow-up SELECT on the catalog. This pattern is repeated everywhere in the test suite and will be repeated in every operator script.

Proposed return shapes:

-- create_stream_table
(pgt_id bigint, status text, created_at timestamptz)

-- refresh_stream_table
(status text, rows_inserted bigint, rows_deleted bigint, duration_ms float8)

Decision factors: - Does changing the return type break existing callers? Yes, if they use SELECT pgtrickle.create_stream_table(...) in a script that ignores the result — PostgreSQL will still succeed, but any code using PERFORM would need to be unchanged. - Should these be new overloaded functions or replace the existing ones? - Should status use a fixed enum ('created', 'already_exists', 'skipped', 'error') or a free-form string?


Proposal B — Warn at init if cdc_mode = 'auto' but wal_level != 'logical'

Priority: High
Complexity: Low
Type: Safety / UX

pg_trickle.cdc_mode = 'auto' is designed to automatically upgrade from TRIGGER to WAL-based CDC once PostgreSQL’s wal_level is set to logical. On a standard PostgreSQL install, wal_level defaults to replica. In this state, auto silently stays in TRIGGER mode indefinitely — producing no error, no warning, and no visible status change.

FAQ documentation acknowledges this but it is easy to miss.

Proposal: At extension load (_PG_init), check: rust if cdc_mode == "auto" && wal_level != "logical" { pgrx::warning!("pg_trickle: cdc_mode='auto' but wal_level is '{}', not 'logical'. \ WAL CDC will not activate until wal_level is changed and PostgreSQL restarted.", wal_level); }

Decision factors: - Is it acceptable to emit a WARNING at every backend startup if this condition is true? Could be noisy in shared environments. - Alternative: emit the warning only via a dedicated health check function, not at init time.


Proposal C — Warn on create_stream_table if source table has no PRIMARY KEY

Priority: High
Complexity: Low
Type: Proactive safety check

Stream tables over keyless sources silently mishandle duplicate rows: identical rows share the same internal __pgt_row_id and a duplicate INSERT may be treated as a no-op. This is documented as a known limitation under “Keyless Table Duplicate Row Limitation (G7.1)” in the FAQ, but users hit it without warning.

Proposal: During create_stream_table, query pg_constraint for the source table(s). If none of the sources referenced in the query have a PRIMARY KEY, emit: WARNING: source table "foo" has no PRIMARY KEY — duplicate rows will not be tracked correctly. Add a primary key or use a unique column as a row identity.

Decision factors: - Should this be a WARNING (non-fatal) or an ERROR (require explicit opt-in with allow_keyless => true parameter)? - The check only applies to the directly named source tables, not to derived sources (e.g., subqueries, CTEs). That limitation should be noted. - Cost: requires a catalog lookup per source table at creation time (cheap).


Proposal D — Record manual refresh_stream_table calls in pgt_refresh_history

Priority: Medium
Complexity: Medium
Type: Observability / consistency

pgt_refresh_history only contains rows inserted by the background scheduler worker (initiated_by = 'SCHEDULER'). Manual calls to pgtrickle.refresh_stream_table() are not recorded. This means:

  • Operators auditing history miss manual refreshes entirely.
  • Graphs of refresh frequency are inaccurate.
  • Test assertions that check “were there N refreshes?” reach around the history table and read data_timestamp instead.

Proposal: At the end of a successful refresh_stream_table call, insert a history row with initiated_by = 'MANUAL' and the same duration_ms, rows_affected, and status fields the scheduler already writes.

Decision factors: - History table is currently append-only and retention is controlled by pg_trickle.history_retention_days. Manual records would share that policy and clean up automatically — no special handling needed. - Should error outcomes (failed refreshes) also be recorded? Currently even scheduler failures are recorded; consistency suggests yes. - Volume concern: a hot path calling refresh_stream_table in a loop would create many history rows. Acceptable for now given the append-only design.


Proposal E — pgtrickle.quick_health convenience view

Priority: Medium
Complexity: Low
Type: Monitoring / new SQL object

The extension exposes 15+ monitoring functions with overlapping scopes. New operators have no obvious starting point for “is everything OK right now?”. The closest existing function is pgtrickle.health_check(), but it returns multiple rows with varying structure and requires understanding which columns to focus on.

Proposed view:

CREATE VIEW pgtrickle.quick_health AS
SELECT
    (SELECT COUNT(*) FROM pgtrickle.pgt_stream_tables) AS total_stream_tables,
    (SELECT COUNT(*) FROM pgtrickle.pgt_stream_tables WHERE consecutive_errors > 0) AS error_tables,
    (SELECT COUNT(*) FROM pgtrickle.pgt_stream_tables WHERE data_timestamp < now() - make_interval(secs => schedule_secs * 3)) AS stale_tables,
    pg_trickle_scheduler_running() AS scheduler_running,
    -- ... cdc health summary ...
    CASE WHEN error_tables = 0 AND scheduler_running THEN 'OK' ELSE 'DEGRADED' END AS status;

One row, one glance. If status = 'OK', stop reading.

Decision factors: - Should this be a VIEW (no parameters, always computes) or a function quick_health() returning a record? - A VIEW requires thinking about column stability across versions — harder to evolve. A function is easier to change. - Should it be in schema pgtrickle (user-facing) or pgtrickle_internal? - The “staleness” heuristic (3× schedule interval) needs a defined policy.


Proposal F — Emit WARNING when alter_stream_table triggers a full refresh

Priority: Medium
Complexity: Low
Type: Transparency / silent expensive operation

When alter_stream_table changes the refresh_mode, definition query, or certain parameters that invalidate the current materialization, it automatically executes a full refresh (equivalent to truncating and repopulating the stream table). This can be expensive on large tables and currently only emits a pgrx::info!() log, which is suppressed at most log levels and never surfaced to the calling client.

Proposal: Upgrade the message to pgrx::warning!() so it appears at the client’s session regardless of server log_min_messages: WARNING: pg_trickle: stream table "foo" refresh mode changed from INCREMENTAL to FULL; a full refresh was applied (N rows). This may take time on large tables.

Optionally also send a NOTIFY pgtrickle_events, '{"event":"full_refresh","table":"foo"}' for monitoring automation.

Decision factors: - Is WARNING the right severity, or should it be NOTICE? NOTICE is always shown to the client; WARNING implies something requires attention. - Should a full NOTIFY channel be introduced (could be useful for Proposal E as well), or is the log message sufficient? - Are there cases where the full refresh is expected and the warning would be spurious (e.g., during initial setup scripts)?


Proposal G — “Which monitoring function?” decision tree in docs

Priority: Medium
Complexity: Very Low
Type: Documentation

With 15+ monitoring functions in pgtrickle, new operators don’t know which to call. Common questions and their current answers:

Question Function needed
Is my stream table up to date? pgtrickle.get_staleness('name')
What failed and why? pgtrickle.get_refresh_history('name')
Is CDC capturing changes? pgtrickle.check_cdc_health()
Are triggers installed correctly? pgtrickle.trigger_inventory()
Is the scheduler running? pgtrickle.pg_stat_stream_tables
Overall system health? pgtrickle.health_check()

Proposal: Add a “Monitoring Quick Reference” section near the top of docs/SQL_REFERENCE.md (or as a dedicated docs/MONITORING.md) mapping common operational questions to the right function, with a one-line example for each.

Decision factors: - Where should it live: a new MONITORING.md, a section in SQL_REFERENCE.md, or a separate GETTING_STARTED sub-section? - Should it be a prose decision tree, a table, or a flowchart (Mermaid)?


Proposal H — “Special Cases & Known Limitations” section in SQL_REFERENCE.md

Priority: Medium
Complexity: Very Low
Type: Documentation

Several important behavioral edge cases are currently scattered through different documents:

Case Currently documented in
TopK (ORDER BY + LIMIT rewrite) FAQ + buried note in SQL_REFERENCE
Keyless table duplicate rows FAQ “Known Delta Computation Limitations”
SELECT * schema drift GETTING_STARTED + FAQ
JOIN key-change + delete edge case Footnote deep in SQL_REFERENCE
Direct DML blocked on stream tables On-demand error only

Users who hit one of these problems must search across multiple documents. A dedicated section consolidating all special cases and limitations would reduce support burden.

Decision factors: - Should this be a new top-level ## Special Cases section within SQL_REFERENCE.md, or a standalone docs/EDGE_CASES.md? - Should existing references in FAQ/GETTING_STARTED be replaced or kept with cross-links?


Proposal I — Improve cycle-detection error messages

Priority: Low
Complexity: Low
Type: Error quality

The current cycle detection error shows the cycle path but doesn’t guide the user on how to resolve it: ERROR: dependency cycle detected: foo → bar → foo

Users unfamiliar with DAG concepts won’t know what to do. A better message: ERROR: dependency cycle detected: foo → bar → foo HINT: Stream tables cannot form circular dependencies. To fix this, ensure each stream table only reads from base tables or stream tables that do not (directly or transitively) read from it.

Decision factors: - Should the HINT be added inline in the error string, or via a separate pgrx::hint!() call (which PostgreSQL displays as a “HINT:” line)?


Proposal J — Emit NOTICE when a query is auto-rewritten

Priority: Low
Complexity: Medium
Type: Transparency

pg_trickle rewrites user queries in up to five passes (view inlining, CTE flattening, alias stripping, etc.) before storing them in the catalog. Users who query pgt_stream_tables.definition after creating a stream table see a different query from the one they wrote, with no explanation.

Proposal: After the rewrite passes, if the stored definition differs from the input, emit a NOTICE: NOTICE: query was rewritten for incremental maintenance. Original: SELECT v.x FROM my_view v WHERE v.y > 0 Stored: SELECT base.x FROM base_table base WHERE base.y > 0

Decision factors: - How verbose should the notice be? Full before/after, or just “query was simplified”? - Rewriting is always intentional; would this notice confuse users into thinking something is wrong? - The rewrite is currently undone if refresh fails — the NOTICE should only fire after the definition is successfully committed.


CHANGELOG entries (to add under unreleased)

  • Breaking: create_stream_table now defaults schedule to 'calculated' instead of '1m'. Stream tables without an explicit schedule are now CALCULATED by default.
  • Breaking: Passing schedule => NULL to create_stream_table is now an error. Use schedule => 'calculated' instead.
  • Breaking: GUCs pg_trickle.diamond_consistency and pg_trickle.diamond_schedule_policy have been removed. Use per-table parameters in create_stream_table / alter_stream_table instead.
  • Changed: pg_trickle.min_schedule_seconds default lowered from 60 to 1 for better out-of-the-box developer experience.
  • New: pg_trickle.default_schedule_seconds GUC (default 1) controls the effective refresh interval for isolated CALCULATED stream tables.