Plan: Make Refresh Mode Selection Optional with Sensible Default

Status: Complete
Author: Copilot
Date: 2026-03-04
Updated: 2026-03-07


1. Motivation

Today, create_stream_table exposes refresh_mode as a prominent fourth positional parameter:

SELECT pgtrickle.create_stream_table(
    'order_totals',
    'SELECT region, SUM(amount) FROM orders GROUP BY region',
    '1m',
    'DIFFERENTIAL'   -- do users really need to think about this?
);

While the parameter already defaults to 'DIFFERENTIAL', its position in the function signature and its prominence in docs/examples implies the user should make an active choice between FULL, DIFFERENTIAL, and IMMEDIATE. In practice:

  • DIFFERENTIAL is the right choice for ~95% of stream tables. It applies delta-only updates when change volume is small, and adaptively falls back to FULL when the change ratio exceeds pg_trickle.differential_max_change_ratio (default 30%). It is strictly superior to FULL except in degenerate cases.
  • FULL is only beneficial when the defining query is cheaper to recompute from scratch than to diff (e.g., trivially small source tables, or queries the DVM engine cannot differentiate). Even then, DIFFERENTIAL’s adaptive fallback covers this automatically.
  • IMMEDIATE is a fundamentally different execution model (synchronous, in-transaction) and should only be used when sub-second latency is required at the cost of write-path overhead.

Forcing (or implying) that users choose a mode adds cognitive overhead, increases onboarding friction, and leads to suboptimal choices (users picking FULL “to be safe” when DIFFERENTIAL would be faster).


2. Current State

API signature

pgtrickle.create_stream_table(
    name            text,
    query           text,
    schedule        text   DEFAULT '1m',
    refresh_mode    text   DEFAULT 'DIFFERENTIAL',  -- already defaults
    initialize      bool   DEFAULT true,
    diamond_consistency   text DEFAULT NULL,
    diamond_schedule_policy text DEFAULT NULL
) → void

How DIFFERENTIAL is adaptive today

The differential refresh path (execute_differential_refresh in src/refresh.rs) already implements adaptive fallback:

  1. TRUNCATE detection — if a source table was truncated, fall back to FULL immediately.
  2. Change-ratio threshold — before running the delta query, count changes per source table. If change_count / table_size exceeds pg_trickle.differential_max_change_ratio (GUC, default 0.30), fall back to FULL. Per-ST override via auto_threshold.
  3. TopK scoped recomputation — TopK tables always use a MERGE-based recompute strategy regardless of mode.

This means DIFFERENTIAL already behaves like “auto” — use delta when efficient, fall back to full recompute when not.

Documentation & examples

Most SQL reference examples and tutorials explicitly pass 'DIFFERENTIAL' or 'FULL', reinforcing the idea that the user must choose.


3. Design

3.1 Philosophy

Refresh mode should be an optimization knob, not a required decision.

Users should be able to write:

SELECT pgtrickle.create_stream_table(
    'order_totals',
    'SELECT region, SUM(amount) FROM orders GROUP BY region'
);

…and get the best behavior automatically. Advanced users can override when they have specific needs.

3.2 API changes

No signature change is needed — refresh_mode already defaults to 'DIFFERENTIAL'. The change is purely documentation, examples, and messaging:

Area Change
SQL Reference Move refresh_mode out of the “basic usage” examples. Show the 2-argument form as the primary example.
Getting Started Remove refresh_mode from the quickstart. Introduce it later in an “Advanced Configuration” section.
Tutorials Use the minimal form in all beginner tutorials. Only mention refresh_mode in the performance-tuning tutorial.
FAQ Add “Do I need to choose a refresh mode?” → “No. The default (DIFFERENTIAL) is adaptive and works well for almost all queries.”
alter_stream_table No change — refresh_mode is already optional (NULL = keep current).
Error messages When DVM parsing fails for a query in DIFFERENTIAL mode, downgrade to FULL automatically with an INFO message instead of rejecting the query.

3.3 Auto-downgrade for non-differentiable queries

Today, if the user creates a DIFFERENTIAL stream table with a query that the DVM engine cannot differentiate (e.g., unsupported constructs), the CREATE fails with an error. This forces users to understand the DVM’s limitations and explicitly choose FULL.

Proposed behavior: When refresh_mode is the default (DIFFERENTIAL) and DVM parsing fails, automatically downgrade to FULL and emit an INFO:

INFO: Query uses constructs not supported by differential maintenance;
      using FULL refresh mode. See docs/DVM_OPERATORS.md for supported operators.

This keeps the zero-config promise: any valid SELECT works without choosing a mode. Users who explicitly pass 'DIFFERENTIAL' still get the error (they asked for it specifically and should know why it fails).

Implementation detail — distinguish “user explicitly passed DIFFERENTIAL” from “used the default” by checking whether the parameter was provided. Since pgrx default!() doesn’t expose this, we have two options:

Option A: Sentinel value. Change the default to 'AUTO' and treat it as DIFFERENTIAL-with-fallback:

refresh_mode text DEFAULT 'AUTO'
match mode_str.to_uppercase().as_str() {
    "AUTO" => {
        // Try DIFFERENTIAL; fall back to FULL if DVM rejects query
    }
    "DIFFERENTIAL" => { /* strict: error on DVM failure */ }
    "FULL" => { /* always full */ }
    "IMMEDIATE" => { /* IVM triggers */ }
}

Option B: Nullable mode. Change to Option<&str> with default NULL meaning “auto”:

refresh_mode text DEFAULT NULL   -- NULL = auto (DIFFERENTIAL with FULL fallback)

Recommendation: Option A ('AUTO'). It is self-documenting, backward compatible (existing 'DIFFERENTIAL' calls retain strict behavior), and requires no schema migration for the catalog column.

3.4 Catalog representation

The pgt_stream_tables.refresh_mode column stores the resolved mode:

User specifies DVM parse succeeds Stored mode
'AUTO' (default) Yes DIFFERENTIAL
'AUTO' (default) No FULL
'DIFFERENTIAL' Yes DIFFERENTIAL
'DIFFERENTIAL' No Error (rejected)
'FULL' n/a FULL
'IMMEDIATE' Yes IMMEDIATE
'IMMEDIATE' No Error (rejected)

The catalog never stores 'AUTO' — it’s resolved at creation time. This means alter_stream_table and the scheduler don’t need to know about AUTO; they see only FULL / DIFFERENTIAL / IMMEDIATE.

3.5 alter_stream_table with query change

When ALTER changes the query (per PLAN_ALTER_QUERY.md), the same auto- downgrade logic applies: if the user doesn’t specify a new refresh_mode, the current mode is re-evaluated against the new query. If the ST was DIFFERENTIAL but the new query isn’t differentiable, it downgrades to FULL with an INFO message.

If the user explicitly passes refresh_mode => 'DIFFERENTIAL' alongside the query change, the strict behavior applies (error on DVM failure).


4. Documentation Rewrite

4.1 Primary example (SQL Reference)

Before: sql SELECT pgtrickle.create_stream_table( 'order_totals', 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region', '2m', 'DIFFERENTIAL' );

After: ```sql – Minimal: just name and query. Refreshes every minute using adaptive – differential maintenance. SELECT pgtrickle.create_stream_table( ‘order_totals’, ‘SELECT region, SUM(amount) AS total FROM orders GROUP BY region’ );

– With custom schedule: SELECT pgtrickle.create_stream_table( ‘order_totals’, ‘SELECT region, SUM(amount) AS total FROM orders GROUP BY region’, ‘2m’ ); ```

4.2 New FAQ entry

Q: Do I need to choose a refresh mode?

No. The default mode is adaptive: it uses differential (delta-only) maintenance when efficient, and automatically falls back to full recomputation when the change volume is high or the query cannot be differentiated. This works well for the vast majority of queries.

You only need to specify a mode explicitly when: - You want FULL mode to force recomputation every time (rare). - You want IMMEDIATE mode for sub-second, in-transaction updates (adds overhead to every write on source tables). - You want strict DIFFERENTIAL mode and prefer an error over silent fallback when the query isn’t differentiable.

4.3 Getting Started simplification

The quickstart guide should use the 2-argument form throughout, deferring refresh mode to an “Advanced: Refresh Modes” section.


5. Implementation Steps

Step 1: Add AUTO mode parsing ✅

File: src/dag.rs
Status: Complete (2026-03-06)

Added "AUTO" branch in RefreshMode::from_str() that resolves to RefreshMode::Differential. Added RefreshMode::is_auto_str() helper to detect when the user passed AUTO vs an explicit mode. Unit tests added for both.

Step 2: Auto-downgrade in create_stream_table_impl ✅

File: src/api.rs
Status: Complete (2026-03-06)

Three auto-downgrade points in create_stream_table_impl: 1. reject_unsupported_constructs() failure → downgrade to FULL with INFO 2. reject_materialized_views() failure → downgrade to FULL with INFO 3. parse_defining_query_full() failure → downgrade to FULL with INFO

When is_auto is false (user explicitly passed DIFFERENTIAL), errors propagate as before.

Step 3: Change default parameter value ✅

File: src/api.rs
Status: Complete (2026-03-06)

Changed default!(&str, "'DIFFERENTIAL'") to default!(&str, "'AUTO'").

Step 4: Update SQL upgrade script

File: sql/pg_trickle--0.2.1--0.2.2.sql (or current version)
Status: Deferred — no catalog migration needed since AUTO is never persisted. The new default only affects the pgrx-generated function signature, which is recreated on CREATE EXTENSION / ALTER EXTENSION UPDATE.

Step 5: Update documentation ✅

Files: docs/SQL_REFERENCE.md, docs/GETTING_STARTED.md, docs/FAQ.md
Status: Complete (2026-03-06)

  • SQL_REFERENCE.md: updated signature default and parameter table.
  • GETTING_STARTED.md: removed explicit refresh_mode => from examples, updated refresh modes table to include AUTO.
  • FAQ.md: added “Do I need to choose a refresh mode?” entry.

Step 6: Update dbt materialization ✅

File: dbt-pgtrickle/macros/materializations/stream_table.sql
Status: Complete (2026-03-06)

Changed default from 'DIFFERENTIAL' to 'AUTO'.

Step 7: E2E Tests ✅

Files: tests/e2e_create_tests.rs
Status: Complete (2026-03-06)

Six tests added in a new // ── AUTO Mode Tests ── section:

Test Scenario Priority
test_create_auto_mode_differentiable AUTO + differentiable query → stored as DIFFERENTIAL P1 ✅
test_create_auto_mode_not_differentiable AUTO + matview source → stored as FULL, INFO emitted P1 ✅
test_create_explicit_differential_not_differentiable Explicit DIFFERENTIAL + matview → error P1 ✅
test_create_no_mode_specified Omit refresh_mode entirely → defaults to AUTO behavior P2 ✅
test_backward_compat_differential Explicit 'DIFFERENTIAL' still works identically P2 ✅
test_backward_compat_full Explicit 'FULL' still works identically P2 ✅

test_alter_query_auto_downgrade deferred: alter_stream_table does not yet accept a query parameter (tracked in PLAN_ALTER_QUERY.md).


6. Remaining Work (Prioritized)

Priority Task Effort Status
P3 Tutorial docs update (docs/tutorials/*.md) ~30min ✅ Complete (2026-03-07)
P3 SQL Reference examples — reduce refresh_mode => repetition in advanced examples ~30min ✅ Complete (2026-03-07)
P4 test_alter_query_auto_downgrade — implement once alter_stream_table supports query changes (PLAN_ALTER_QUERY.md) ~30min Deferred

All P1, P2, and P3 tasks are complete. The feature is fully implemented, tested, and documented.

P3 — Docs cleanup details (2026-03-07)

Tutorials (docs/tutorials/): - Removed refresh_mode => 'DIFFERENTIAL' from the Setup section in all four tutorials (INSERT, UPDATE, DELETE, TRUNCATE). Each now shows the minimal 3-argument form. - Updated prose in WHAT_HAPPENS_ON_INSERT.md to reference the default “AUTO mode” instead of “DIFFERENTIAL mode — the default”.

SQL Reference (docs/SQL_REFERENCE.md): - Simplified first scheduling quick-example to omit redundant mode. - Added note above the Aggregate Examples block explaining that refresh_mode is omitted where AUTO applies. - Stripped refresh_mode => 'DIFFERENTIAL' from ~30 showcase examples across: aggregate functions, CTEs (non-recursive), set operations (INTERSECT, EXCEPT, UNION), LATERAL SRF, LATERAL subquery, WHERE subquery, HAVING, keyless tables, referencing other STs, COLLATE, IS JSON, SQL/JSON constructors, JSON_TABLE, partitioned tables, Views as Sources, and CUBE/ROLLUP. - Retained explicit modes where mode selection is the point: recursive CTE mode comparison, alter_stream_table mode-switch example, and known limitations sections.


7. Backward Compatibility

Scenario Impact
Existing create_stream_table(..., 'DIFFERENTIAL') calls No change — explicit DIFFERENTIAL retains strict behavior
Existing create_stream_table(..., 'FULL') calls No change
Existing create_stream_table(..., 'IMMEDIATE') calls No change
New calls omitting refresh_mode Was DIFFERENTIAL (strict), now AUTO (with fallback). Strictly more permissive — queries that previously failed now succeed with FULL mode
Catalog data No migration needed — AUTO is never stored
pg_trickle.differential_max_change_ratio GUC Unchanged — still governs adaptive fallback at runtime

7. Alternatives Considered

A. Keep DIFFERENTIAL as default, no AUTO

Pros: Simpler. No new mode name.
Cons: Users still hit errors when their query isn’t differentiable. They must understand DVM limitations to pick FULL. The “zero-config” promise is broken.

B. Silently downgrade DIFFERENTIAL too (not just AUTO)

Pros: Even simpler — no distinction between explicit and default.
Cons: Violates principle of least surprise. If a user explicitly requests DIFFERENTIAL, they expect differential behavior and should be told when it’s not possible.

C. Remove FULL mode entirely

Pros: Simplest API.
Cons: Some users legitimately want to force full recompute (e.g., for debugging, for queries where the delta query is pathologically slow). FULL remains useful as an escape hatch.


9. Milestones

Milestone Steps Status
M1: Core implementation Steps 1–3 ✅ Complete
M2: Documentation Steps 5–6 ✅ Complete
M3: E2E tests Step 7 ✅ Complete
M4: Docs cleanup P3 tutorial + SQL Reference ✅ Complete