PLAN: Edge Cases — Catalogue, Workarounds & Prioritised Remediation

Status: Proposed
Target milestone: v0.3.0+
Last updated: 2026-06-05


Motivation

pg_trickle’s documentation (FAQ, SQL Reference, Architecture) describes 36 discrete edge cases and limitations. Some are fundamental trade-offs, others are fixable engineering gaps. This document:

  1. Catalogues every known edge case in one place.
  2. Proposes concrete workarounds or fixes for each.
  3. Assigns a priority tier so the team can focus effort where it matters most.

Priority Tiers

Tier | Meaning | Guidance
P0 | Data correctness risk — users can silently get wrong results | Fix before v1.0
P1 | Operational surprise — confusing failure, silent non-behaviour | Fix before v1.0
P2 | Usability gap — documented but painful; strong workaround exists | Schedule for a minor release
P3 | Accepted trade-off — inherent to the design; document clearly | Keep documentation up to date

Edge Case Catalogue

EC-01 — JOIN key change + simultaneous right-side DELETE

Field Value
Area DIFFERENTIAL — JOIN delta
Tier P0
Impact Stale row remains in stream table until next full refresh
Current mitigation Adaptive FULL fallback (pg_trickle.adaptive_full_threshold)
Documented in FAQ § “Known edge cases”; SQL_REFERENCE § “Known Delta Computation Limitations”
Status IMPLEMENTED — Part 1 split (R₀ via EXCEPT ALL) in diff_inner_join, diff_left_join, diff_full_join

Root cause: The delta query reads current_right after all changes are applied. When the old join partner is deleted before the delta runs, the DELETE half of the join finds no partner and is silently dropped.

Proposed fix (chosen approach: R₀ via EXCEPT ALL — medium-term):

The short-term threshold adjustment is not viable: it trades query correctness for a heuristic that can still silently miss deletes when co-occurring changes fall below the threshold.

Primary fix — R₀ via EXCEPT ALL (4–6 days):

The delta formula in src/dvm/operators/join.rs is:

ΔJ = (ΔQ ⋈ R₁) + (L₀ ⋈ ΔR)        [current code]

Part 2 already computes L₀ (pre-change left state) via EXCEPT ALL for Scan children (left_part2_source in diff_inner_join). The fix mirrors this symmetrically for Part 1’s right side — splitting Part 1 into:

Part 1a: ΔQ_inserts ⋈ R₁           (post-change right — unchanged)
Part 1b: ΔQ_deletes ⋈ R₀           (pre-change right via EXCEPT ALL)

R₀ is computed identically to the existing L₀:

R_current EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes

build_snapshot_sql() already exists in join_common.rs and is reused here. The correction term (Part 3) was added for the L₁ case in Part 2 and is unaffected — Part 2 stays unchanged.
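
As a concrete illustration, the split Part 1 could be sketched as the following SQL. All table, column, and CTE names here are illustrative, not the exact generated SQL; `delta_q` / `delta_r` stand for the change-buffer deltas with an action column ('I'/'D') plus the source columns.

```sql
-- Sketch only: names are hypothetical, not pg_trickle's generated SQL.
WITH r0 AS (                                -- pre-change right state R₀
    SELECT k, val FROM right_tbl            -- R_current
    EXCEPT ALL
    SELECT k, val FROM delta_r WHERE action = 'I'   -- undo ΔR inserts
    UNION ALL
    SELECT k, val FROM delta_r WHERE action = 'D'   -- re-add ΔR deletes
)
SELECT 'I' AS action, d.k, d.lval, r.val    -- Part 1a: ΔQ inserts ⋈ R₁
FROM delta_q d JOIN right_tbl r USING (k)
WHERE d.action = 'I'
UNION ALL
SELECT 'D', d.k, d.lval, r.val              -- Part 1b: ΔQ deletes ⋈ R₀
FROM delta_q d JOIN r0 r USING (k)
WHERE d.action = 'D';
```

Note that PostgreSQL's EXCEPT/UNION are left-associative with equal precedence, so the `r0` body parses as (R_current EXCEPT ALL ΔR_inserts) UNION ALL ΔR_deletes, matching the formula above.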

Implementation plan:
  • Day 1–2: Add right_part1_source decision + EXCEPT ALL CTE analogous to use_l0 / left_part2_source. Mirror use_r0 logic for nested right children (no SemiJoin, ≤ 2 scan nodes → R₀; otherwise R₁).
  • Day 2–3: Split Part 1 SQL into two UNION ALL arms; update row ID hashing for the Part 1b arm.
  • Day 4–5: Integration tests — co-delete scenario, UPDATE-then-delete, multi-cycle correctness.
  • Day 5–6: TPC-H regression suite (Q07 is the critical path).

Deferred — pre-image capture from change buffer:

The long-term approach (reconstructing R₀ directly from ΔR using the old_col values already stored in the change buffer, skipping the live table scan) is deferred. The CDC trigger already captures these values (OLD."col" is stored as "old_col" for UPDATE/DELETE), but surfacing them through DiffContext/DeltaSource requires structural changes to the delta CTE column representation that carry higher risk. Estimated additional effort: 7–10 days. Schedule for v1.x, once the EXCEPT ALL approach is correct and benchmarked.


EC-02 — CUBE/ROLLUP expansion limit (> 64 branches)

Field Value
Area DIFFERENTIAL — grouping sets
Tier P3
Impact Creation error
Current mitigation Use explicit GROUPING SETS(…)
Documented in SQL_REFERENCE § “CUBE/ROLLUP Expansion Limit”
Status Done — pg_trickle.max_grouping_set_branches GUC (default 64, range 1–65536)

Proposed fix: The 64-branch limit is a sensible guard; keep it but make it configurable via a GUC. Implemented: PGS_MAX_GROUPING_SET_BRANCHES in src/config.rs; parser.rs reads the GUC dynamically instead of a const. Documented in CONFIGURATION.md.


EC-03 — Window functions inside expressions

Field Value
Area DIFFERENTIAL — window functions
Tier P2
Impact Creation error
Current mitigation Keep window functions as top-level SELECT columns
Documented in FAQ § “Window Functions”
Status IMPLEMENTED — rewrite_nested_window_exprs() subquery-lift in src/dvm/parser.rs; helper collect_all_window_func_nodes() for recursive window-function harvesting; deparse_select_window_clause() for WINDOW clause passthrough; exported from src/dvm/mod.rs; wired in src/api.rs before the DISTINCT ON rewrite

Implementation: nested-subquery lift (see PLAN_TRANSACTIONAL_IVM_PART_2.md Task 1.3):

Window functions nested inside expressions are lifted into a synthetic column in an inner subquery, then the outer SELECT applies the original expression with the window function replaced by the synthetic column alias:

-- Before (rejected):
SELECT id, ABS(ROW_NUMBER() OVER (ORDER BY score) - 5) AS adjusted_rank
FROM players

-- After rewrite (supported):
SELECT id, ABS("__pgt_wf_inner"."__pgt_wf_1" - 5) AS "adjusted_rank"
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY score) AS "__pgt_wf_1"
    FROM players
) "__pgt_wf_inner"

The subquery approach is preferred over the earlier CTE approach because a WITH clause produces a WithQuery node in the OpTree, requiring the DVM engine to handle CTEs wrapping window functions — an untested path. The subquery produces a nested Scan → Window → Project chain that the existing DVM path already handles correctly.

Test coverage: 9 E2E tests (e2e_window_tests.rs):
  • 6 creation-acceptance tests: test_window_in_case_expression_rejected, test_window_in_coalesce_rejected, test_window_in_arithmetic_rejected, test_window_in_cast_rejected, test_window_deeply_nested_rejected, test_top_level_window_still_works (regression).
  • 3 data-correctness tests: test_ec03_case_window_data_correctness — CASE + ROW_NUMBER() produces correct tier labels; test_ec03_arithmetic_window_data_correctness — ROW_NUMBER() * 10 verified with a FULL refresh after INSERT; test_ec03_coalesce_window_data_correctness — COALESCE(SUM() OVER, 0) produces correct partition sums.

Known limitation: Window-in-expression queries are accepted in DIFFERENTIAL mode (creation succeeds), but the DVM delta engine cannot currently resolve the st.* alias inside the inner subquery introduced by the rewrite, so differential refresh fails with a column st.* does not exist error. Mitigation (EC-03): AUTO mode now correctly falls back to FULL refresh with an informational message, and explicit DIFFERENTIAL mode emits a WARNING at creation time alerting the user that the pattern will fall back to full recomputation at refresh time. FULL refresh works correctly. Differential support for the rewritten form is deferred.


EC-04 — Non-monotone recursive CTEs rejected

Field Value
Area DIFFERENTIAL — recursive CTEs
Tier P3
Impact Creation error
Current mitigation Restructure as non-recursive CTE; or use FULL mode
Documented in FAQ § “Recursive CTEs”

Proposed fix: This is a fundamental limitation of semi-naive evaluation — non-monotone operators in the recursive term can “un-derive” rows, breaking the incremental fixpoint. No code fix proposed. Keep the clear error message and document the reasoning. Users who need non-monotone recursion should use FULL mode.


EC-05 — Foreign tables in DIFFERENTIAL mode

Field Value
Area DIFFERENTIAL — foreign data wrappers
Tier P2
Impact Creation error
Current mitigation Use FULL mode
Documented in SQL_REFERENCE § “Source Tables”
Status IMPLEMENTED — error message improved + polling CDC implemented

Proposed fix:

  1. Short term (Done): Improved the error message to suggest FULL mode explicitly and mention the postgres_fdw + IMPORT FOREIGN SCHEMA pattern.
  2. Medium term (Done): Polling-based CDC for foreign tables. Enabled via pg_trickle.foreign_table_polling = on GUC. Creates a snapshot table per foreign table source; before each differential refresh, computes EXCEPT ALL deltas and populates the standard change buffer. Snapshot is refreshed each cycle. Cleanup path drops snapshot tables when STs are removed.
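
The snapshot-diff step described above could be sketched as plain SQL. All table and column names below are hypothetical; the real implementation generates this per foreign-table source and writes the result into the standard change buffer.

```sql
-- Sketch only: names are illustrative, not pg_trickle's generated SQL.
-- Rows present now but absent from the last snapshot become INSERT
-- events; the reverse become DELETE events.
SELECT 'I' AS action, t.* FROM (
    SELECT id, total FROM remote_orders           -- live foreign table
    EXCEPT ALL
    SELECT id, total FROM pgt_snap_remote_orders  -- last cycle's snapshot
) t
UNION ALL
SELECT 'D', t.* FROM (
    SELECT id, total FROM pgt_snap_remote_orders
    EXCEPT ALL
    SELECT id, total FROM remote_orders
) t;
```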

EC-06 — Keyless tables with exact duplicate rows

Field Value
Area DIFFERENTIAL — row identity
Tier P0
Impact Phantom or missed deletes in stream table
Current mitigation Add a primary key; add a synthetic unique column; or use FULL mode
Documented in FAQ § “Keyless Tables”; SQL_REFERENCE § “Row Identity”
Status IMPLEMENTED — Full EC-06 fix: catalog has_keyless_source flag, non-unique index for keyless sources, net-counting delta SQL in scan.rs (decompose changes into atomic +1/-1 per content hash, expand via generate_series), counted DELETE (ROW_NUMBER matching) in refresh.rs and ivm.rs, plain INSERT (no ON CONFLICT) for keyless sources. WARNING at creation time also implemented.

Implementation (completed):

  1. Short term (P0): Emit a WARNING at create_stream_table() time ✅ Done
  2. Full fix (P0): ✅ Implemented via net-counting approach:
    • Catalog: Added has_keyless_source BOOLEAN NOT NULL DEFAULT FALSE to pgtrickle.pgt_stream_tables (set at creation time).
    • Index: Non-unique CREATE INDEX for keyless sources (instead of CREATE UNIQUE INDEX), allowing duplicate __pgt_row_id values.
    • Delta SQL (scan.rs): For keyless tables, decomposes all change events into atomic +1 (INSERT) / -1 (DELETE) operations per content hash. UPDATEs are split into their DELETE-old + INSERT-new components. Net counts are computed via GROUP BY content_hash + SUM(delta_sign), then expanded to the correct number of D/I rows using generate_series.
    • Apply (refresh.rs): For keyless sources, forces explicit DML path (cannot use MERGE with non-unique row_ids). Counted DELETE uses ROW_NUMBER matching to pair delta deletes 1:1 with stream table rows. Plain INSERT without NOT EXISTS / ON CONFLICT.
    • Apply (ivm.rs): Same counted DELETE + plain INSERT for IMMEDIATE mode.
    • Upgrade SQL: ALTER TABLE ... ADD COLUMN IF NOT EXISTS has_keyless_source

Bug fixes (post-implementation):

  • Guard trigger DELETE cancel: The DML guard trigger (EC-26) used RETURN NEW which is NULL for DELETE operations, silently cancelling managed-refresh DELETEs on the stream table. Fixed to use IF TG_OP = 'DELETE' THEN RETURN OLD; ELSE RETURN NEW; END IF;.
  • Keyless aggregate UPDATE no-op: The keyless UPDATE template was a no-op (SELECT 1 WHERE false), which is correct for keyless scans but wrong for aggregate queries on keyless sources (where 'I'-action rows update existing groups). Fixed to use the normal UPDATE template, which naturally matches 0 rows for scan-level keyless deltas.

Test coverage: 7 E2E tests (e2e_keyless_duplicate_tests.rs) — all passing. Covers: duplicate rows basic, delete-one-of-duplicates, update-one-of-duplicates, unique content multi-cycle DML, delete-all-then-reinsert, mixed DML stress, aggregate with duplicates.


EC-07 — IMMEDIATE mode write serialization

Field Value
Area IMMEDIATE mode — concurrency
Tier P3
Impact Reduced write throughput under high concurrency
Current mitigation Use DIFFERENTIAL mode for high-throughput OLTP
Documented in FAQ § “IMMEDIATE Mode Trade-offs”

Proposed fix: This is an inherent trade-off of synchronous IVM — you cannot maintain consistency without serialization. The IvmLockMode already uses lighter pg_try_advisory_xact_lock for simple scan chains. No code fix proposed. Document the throughput characteristics in the performance guide and add a benchmark that quantifies the write-side overhead so users can make informed mode choices.


EC-08 — IMMEDIATE mode write amplification

Field Value
Area IMMEDIATE mode — latency
Tier P3
Impact Increased per-DML latency
Current mitigation Use DIFFERENTIAL mode
Documented in FAQ § “IMMEDIATE Mode Trade-offs”

Proposed fix: Inherent trade-off. Potential optimisation: batch multiple trigger invocations within a single statement into one delta application (the current implementation already uses statement-level triggers with transition tables, which provides this batching naturally). Add a benchmark to benches/ that measures per-statement overhead for 1, 10, 100, 1000 affected rows.
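
The statement-level batching mechanism referred to above is standard PostgreSQL (10+) transition-table syntax; the trigger and function names in this sketch are illustrative, not pg_trickle's actual identifiers.

```sql
-- Sketch only: names are hypothetical.
CREATE TRIGGER ivm_after_insert
    AFTER INSERT ON source_tbl
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION apply_ivm_delta();
-- One INSERT affecting 1000 rows fires apply_ivm_delta() once, reading
-- all 1000 rows from the new_rows transition table, rather than firing
-- 1000 row-level invocations.
```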


EC-09 — IMMEDIATE mode unsupported constructs

Field Value
Area IMMEDIATE mode — SQL support
Tier P2
Impact Creation / alter error
Current mitigation Use DIFFERENTIAL mode
Documented in FAQ § “What SQL features are NOT supported in IMMEDIATE mode?”
Status Done — recursive CTEs + TopK now supported in IMMEDIATE mode

Proposed fix (recursive CTEs + TopK): originally deferred, now implemented per PLAN_TRANSACTIONAL_IVM_PART_2.md Phase 5.

Both constructs are now implemented:

  • Recursive CTEs (G10 / Task 5.1): check_immediate_support() changed from rejection to warning about stack depth. Semi-naive evaluation proceeds with DeltaSource::TransitionTable.
  • TopK (G11 / Task 5.2): apply_topk_micro_refresh() in ivm.rs implements statement-level micro-refresh, bounded by the pg_trickle.ivm_topk_max_limit GUC (default 1000) to prevent inline recomputation latency spikes for large K.

For materialised view sources: use a polling-change-detection wrapper (same approach as EC-05 for foreign tables). No code fix proposed for this case.


EC-10 — IMMEDIATE mode: no throttling

Field Value
Area IMMEDIATE mode — back-pressure
Tier P3
Impact No rate limiting for high-frequency writes
Current mitigation Use DIFFERENTIAL mode with a schedule interval
Documented in FAQ § “IMMEDIATE Mode Trade-offs”

Proposed fix: By design — throttling IMMEDIATE would break the same-transaction consistency guarantee. No code fix. Ensure the FAQ clearly explains that DIFFERENTIAL with a short interval (e.g. '1s') is the right answer when back-pressure is needed.


EC-11 — Schedule interval shorter than refresh duration

Field Value
Area Scheduler — timing
Tier P1
Impact Perpetual latency spiral; cascading DAG chains fall behind
Current mitigation Widen the interval; use IMMEDIATE mode; or split the DAG
Documented in FAQ § “Performance and Tuning”
Status IMPLEMENTED — scheduler_falling_behind NOTIFY alert at 80% threshold

Test coverage: SchedulerFallingBehind variant covered in test_alert_event_as_str and test_alert_event_all_variants_unique (monitor.rs unit tests).

Chosen fix: scheduler_falling_behind NOTIFY alert (short term):

When the refresh duration exceeds 80% of the schedule interval for 3 consecutive cycles, emit a NOTIFY on the channel pgtrickle_alerts with a JSON payload containing the stream table name, average duration, and configured interval. Operators can LISTEN on the channel and wire it into their alerting stack.
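
Consuming the alert could look like the following; the channel name comes from the plan above, while the payload field names are illustrative since the exact JSON shape is not specified here.

```sql
-- Operator side: subscribe to the alert channel.
LISTEN pgtrickle_alerts;
-- A hypothetical payload a falling-behind ST might produce:
-- {"event": "scheduler_falling_behind",
--  "stream_table": "public.orders_by_day",
--  "avg_duration_ms": 4800,
--  "interval_ms": 5000}
```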

Deferred:
  • auto_backoff GUC (double the effective interval when falling behind) — medium term; schedule for a later sprint.
  • explain_st(name) recommendation from pgt_refresh_history — long term.


EC-12 — No read-your-writes (DIFFERENTIAL/FULL)

Field Value
Area Scheduler — consistency model
Tier P3
Impact Stale reads immediately after write
Current mitigation Use IMMEDIATE mode or call refresh_stream_table() manually
Documented in FAQ § “Consistency”

Proposed fix: Inherent to the scheduled refresh model. No code fix. Consider a convenience function pgtrickle.write_and_refresh(dml_sql, st_name) that executes the DML and triggers a manual refresh in a single call. Low priority — manual refresh_stream_table() already covers this.
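
The proposed convenience function could be a thin plpgsql wrapper; the signature matches the proposal above, but the body here is an illustrative sketch, not an implemented API.

```sql
-- Sketch only: hypothetical implementation of the proposed wrapper.
CREATE FUNCTION pgtrickle.write_and_refresh(dml_sql text, st_name text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    EXECUTE dml_sql;                                   -- run the write
    PERFORM pgtrickle.refresh_stream_table(st_name);   -- read-your-writes
END $$;
```

Since dml_sql is executed verbatim, a real implementation would need the same privilege checks as any dynamic-SQL entry point.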


EC-13 — Diamond dependency split-version reads

Field Value
Area Diamond dependencies — consistency
Tier P1
Impact Fan-in node reads inconsistent upstream versions
Current mitigation diamond_consistency = 'atomic'
Documented in FAQ § “Diamond Dependencies”; SQL_REFERENCE
Status FULLY IMPLEMENTED — default 'atomic' + scheduler sub-transaction fix

Proposed fix: The atomic mode already fixes this. Remaining work:

  1. Short term (P1): Default diamond_consistency to 'atomic' for new stream tables (currently 'none'). This is a safer default — users who don’t need atomicity can opt out. ✅ Done.
  2. Medium term: Add a health_check() warning when a diamond graph exists but diamond_consistency = 'none'.

Scheduler bug fixed (follow-on): When EC-13 changed the default to 'atomic', a latent bug surfaced: the scheduler’s diamond group path used Spi::run("SAVEPOINT …") which PostgreSQL rejects in background workers (SPI_ERROR_TRANSACTION). This silently skipped every diamond group refresh. Fixed by replacing the SPI SAVEPOINT calls with pg_sys::BeginInternalSubTransaction / ReleaseCurrentSubTransaction / RollbackAndReleaseCurrentSubTransaction, which bypass SPI. See src/scheduler.rs.


EC-14 — Diamond upstream failure skips downstream

Field Value
Area Diamond dependencies — failure handling
Tier P3
Impact Downstream stays stale for one cycle
Current mitigation Automatic retry on next cycle
Documented in FAQ § “Diamond Dependencies”

Proposed fix: Current behaviour is correct. Add a NOTIFY diamond_group_partial_failure when a group member fails so operators can investigate. No urgency — the automatic retry handles recovery.


EC-15 — SELECT * on source table + DDL

Field Value
Area DDL — schema evolution
Tier P1
Impact Refresh fails; stream table marked needs_reinit
Current mitigation Use explicit column lists in defining queries
Documented in FAQ § “Schema Changes”
Status IMPLEMENTED — WARNING at creation time for SELECT *

Test coverage: 10 unit tests for detect_select_star() pure function (api.rs): bare *, table-qualified t.*, explicit columns, count(*), sum(*), mixed agg+bare star, no asterisk, subquery star, case insensitivity, no-FROM clause.

Proposed fix:

  1. Short term (P1): ✅ Done — Emits WARNING at create_stream_table() time when the defining query contains SELECT *. Detection logic extracted to pure detect_select_star() function for unit testability.
  2. Medium term: When needs_reinit is triggered by a column-count mismatch, attempt an automatic reinitialisation: re-parse the query, verify the new column set is compatible, and refresh. If compatible, skip the needs_reinit flag and refresh directly.

EC-16 — ALTER FUNCTION body change not detected

Field Value
Area DDL — function dependency tracking
Tier P1
Impact Stream table silently uses new function logic without a full rebase
Current mitigation Manual full refresh after function changes
Documented in FAQ § “Schema Changes”
Status IMPLEMENTED — function_hashes TEXT column on pgt_stream_tables (migration 0.2.1→0.2.2); check_proc_hashes_changed() in refresh.rs queries md5(prosrc || coalesce(probin::text,'')) from pg_proc on each diff refresh; on mismatch calls mark_for_reinitialize + NOTICE; on first call stores the baseline silently

Implemented fix:

Polling via pg_proc on each differential refresh:
  • On the first call (stored hashes = NULL): compute and persist the baseline. No reinit.
  • On following calls: recompute the hashes. If any differ → mark_for_reinitialize + NOTICE. The next scheduler cycle performs a full refresh.
  • Uses md5(string_agg(prosrc || coalesce(probin::text,''), ',' ORDER BY oid)) to handle polymorphic overloads robustly.
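
Standalone, the hash computation amounts to a single catalog query. The OID list below is illustrative; in the real code it comes from the stream table's tracked function dependencies.

```sql
-- Sketch only: the OID array is a placeholder for the ST's tracked
-- function dependencies.
SELECT md5(string_agg(prosrc || coalesce(probin::text, ''),
                      ',' ORDER BY oid)) AS fn_hash
FROM pg_proc
WHERE oid = ANY ('{16384,16385}'::oid[]);
```

Aggregating over all matching pg_proc rows (ordered by oid) is what makes the hash stable across polymorphic overloads of the same function name.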

Test coverage: 3 E2E tests (e2e_multi_cycle_tests.rs):
  • test_ec16_function_body_change_marks_reinit — verifies that CREATE OR REPLACE FUNCTION triggers needs_reinit = true on the next differential refresh.
  • test_ec16_function_change_full_refresh_recovery — verifies that after reinit, a full refresh produces correct results with the new function logic.
  • test_ec16_no_functions_unaffected — verifies that stream tables without user-defined functions are never affected by the hash polling mechanism.


EC-17 — Schema change during active refresh

Field Value
Area DDL — concurrency
Tier P2
Impact Refresh errors; stream table suspended
Documented in FAQ § “Schema Changes”
Status Done — FAQ section documents lock interaction

Proposed fix: No code fix needed. Implemented: an FAQ section explains that the ShareLock held on source tables during refresh blocks a concurrent ALTER TABLE (which requires AccessExclusiveLock). The window is only open if the DDL sneaks in between lock acquisition and the first read.


EC-18 — auto CDC mode silent non-upgrade

Field Value
Area CDC — mode transition
Tier P1
Impact User thinks WAL CDC is active; actually stuck on triggers forever
Current mitigation Default is TRIGGER mode; auto must be explicitly set
Documented in FAQ § “CDC Architecture”
Status IMPLEMENTED — rate-limited LOG every ~60 scheduler ticks

Proposed fix:

  1. Short term (P1): When cdc_mode = 'auto', emit a LOG-level message on every scheduler cycle that explains why the transition hasn’t happened (e.g. “wal_level is ‘replica’, need ‘logical’”). Rate-limit to once per 5 minutes.
  2. Medium term: Add a check_cdc_health() finding for “auto mode stuck in TRIGGER phase for > 1 hour”.

Test coverage: 2 E2E tests (e2e_wal_cdc_tests.rs) + 1 existing test:
  • test_ec18_check_cdc_health_shows_trigger_for_stuck_auto — verifies check_cdc_health() reports TRIGGER mode for a keyless auto-CDC source and that no alert fires for healthy TRIGGER-mode sources.
  • test_ec18_health_check_ok_with_trigger_auto_sources — verifies health_check() does not flag TRIGGER-mode auto-CDC sources as errors.
  • test_wal_keyless_table_stays_on_triggers (pre-existing) — verifies keyless tables remain on TRIGGER mode in auto-CDC configuration.


EC-19 — WAL mode + keyless tables need REPLICA IDENTITY FULL

Field Value
Area CDC — WAL decoding
Tier P0
Impact Silent data loss for UPDATEs/DELETEs on keyless tables in WAL mode
Current mitigation Set REPLICA IDENTITY FULL before switching to WAL CDC
Documented in FAQ § “CDC Architecture”
Status IMPLEMENTED — setup_cdc_for_source() rejects at creation time

Proposed fix:

  1. Short term (P0): At create_stream_table() / alter_stream_table() time, when cdc_mode involves WAL and any source table is keyless, check pg_class.relreplident and reject with a clear error if not 'f' (full). This prevents the silent data loss entirely.
  2. Medium term: Automatically issue ALTER TABLE … REPLICA IDENTITY FULL for keyless tables when entering WAL mode (with a NOTICE).
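
The creation-time check reduces to a catalog lookup; the table name below is illustrative. In pg_class.relreplident, 'f' means REPLICA IDENTITY FULL.

```sql
-- Sketch only: the checked table is hypothetical.
SELECT c.relname, c.relreplident
FROM pg_class c
WHERE c.oid = 'public.events'::regclass
  AND c.relreplident <> 'f';   -- any returned row → reject with an error
-- Remediation the error message should suggest:
-- ALTER TABLE public.events REPLICA IDENTITY FULL;
```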

Test coverage: 2 E2E tests (e2e_wal_cdc_tests.rs):
  • test_ec19_wal_keyless_without_replica_identity_full_rejected — verifies that create_stream_table(cdc_mode => 'wal') on a keyless table without REPLICA IDENTITY FULL is rejected with a clear error.
  • test_ec19_wal_keyless_with_replica_identity_full_accepted — verifies that the same combination succeeds once REPLICA IDENTITY FULL is set.


EC-20 — CDC TRANSITIONING phase complexity

Field Value
Area CDC — mode transition
Tier P2
Impact Potential duplicate capture during transition; rollback to triggers on failure
Current mitigation LSN-based deduplication; automatic rollback
Documented in FAQ § “CDC Architecture”
Status Done — check_cdc_transition_health() in scheduler.rs

Proposed fix: a combined trigger_inventory() + check_cdc_health() check. Implemented: check_cdc_transition_health() runs after recover_from_crash() in the scheduler. It loads all stream table deps, filters TRANSITIONING entries, checks pg_replication_slots for slot existence, rolls back to TRIGGER mode if the slot is missing, and logs the status.


EC-21 — Stream tables not propagated to logical replication subscriber

Field Value
Area Replication — architecture
Tier P3
Impact Subscriber gets static snapshot without refresh capability
Current mitigation Run pg_trickle only on primary
Documented in FAQ § “Replication”
Status Done — FAQ documentation sweep completed

Proposed fix: Inherent to the architecture — CDC triggers and the scheduler only run on the primary. No code fix. Consider a future “subscriber mode” that detects replicated source tables and runs its own CDC + refresh cycle, but this is a major undertaking (v2.0+).


EC-22 — Change buffers not published to subscribers

Field Value
Area Replication — change buffers
Tier P3
Impact Subscriber cannot drive its own refresh
Documented in FAQ § “Replication”
Status Done — FAQ documentation sweep completed

Proposed fix: By design. Same as EC-21 — keep documentation clear.


EC-23 — Scheduler does not run on standby / replica

Field Value
Area Replication — standby
Tier P3
Impact No IVM on read replicas
Current mitigation Query the primary; or promote the replica
Documented in FAQ § “HA / Replication”
Status Done — FAQ documentation sweep completed

Proposed fix: By design — standbys are read-only. No code fix. A future enhancement could add a read-only health endpoint on standby that reports the last refresh timestamp from the replicated catalog, so monitoring can detect staleness.


EC-24 — TRUNCATE on source table bypasses CDC triggers

Field Value
Area CDC — TRUNCATE handling
Tier P3
Impact Stream table not updated (until reinit on next cycle)
Current mitigation TRUNCATE event trigger marks needs_reinit; next cycle does full recompute
Documented in SQL_REFERENCE § “TRUNCATE”

Proposed fix: Already handled by the TRUNCATE event trigger. The only gap is latency — between the TRUNCATE and the next cycle, the stream table is stale. For IMMEDIATE mode, TRUNCATE triggers a same-transaction full refresh (already implemented). For DIFFERENTIAL/FULL, the current behaviour is acceptable. No further code fix needed.


EC-25 — TRUNCATE on a stream table itself

Field Value
Area Stream table — direct manipulation
Tier P1
Impact Frontier / buffer desync; future refreshes produce incorrect deltas
Current mitigation Use refresh_stream_table() to reset
Documented in SQL_REFERENCE § “What Is NOT Allowed”
Status IMPLEMENTED — BEFORE TRUNCATE guard trigger blocks direct TRUNCATE

Test coverage: test_guard_trigger_blocks_truncate E2E test (e2e_guard_trigger_tests.rs).

Proposed fix:

  1. Short term (P1): ✅ Done — BEFORE TRUNCATE trigger installed on storage tables at creation time. Checks pg_trickle.internal_refresh session variable to allow managed refreshes. The internal_refresh flag is set in all refresh code paths: execute_manual_refresh, execute_manual_full_refresh, execute_full_refresh, execute_topk_refresh, and pgt_ivm_handle_truncate.
  2. Medium term: Instead of blocking, intercept TRUNCATE and automatically perform a full refresh (same approach as IMMEDIATE mode TRUNCATE handling). This is friendlier but may surprise users who expect TRUNCATE to be instant.

EC-26 — Direct DML on stream table

Field Value
Area Stream table — direct manipulation
Tier P1
Impact Data overwritten or duplicated on next refresh
Current mitigation Documentation; never write directly
Documented in SQL_REFERENCE § “What Is NOT Allowed”
Status IMPLEMENTED — BEFORE INSERT/UPDATE/DELETE guard trigger on storage table

Test coverage: 4 E2E tests (e2e_guard_trigger_tests.rs) — test_guard_trigger_blocks_direct_insert, _update, _delete + test_guard_trigger_allows_managed_refresh.

Proposed fix:

  1. Short term (P1): ✅ Done — Row-level BEFORE trigger installed on storage tables at creation time. Uses pg_trickle.internal_refresh session variable (checked via current_setting(..., true)) to allow managed refreshes. Returns OLD for DELETE operations (returning NEW would be NULL, silently cancelling the DELETE). The internal_refresh flag is set across all refresh code paths.
  2. Medium term: Use a PostgreSQL row-level security policy or a pg_trickle.allow_direct_dml session variable as an escape hatch for power users.
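
A minimal sketch of the guard trigger function (function name illustrative), including the DELETE fix noted under EC-06: returning OLD for DELETE, since returning NEW would be NULL and silently cancel the operation.

```sql
-- Sketch only: function name is hypothetical; the session variable and
-- TG_OP handling mirror the behaviour described above.
CREATE FUNCTION pgt_guard() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Managed refreshes set this session variable before writing:
    IF current_setting('pg_trickle.internal_refresh', true) = 'on' THEN
        IF TG_OP = 'DELETE' THEN RETURN OLD; ELSE RETURN NEW; END IF;
    END IF;
    RAISE EXCEPTION 'direct DML on stream table % is not allowed',
        TG_TABLE_NAME;
END $$;
```

The second argument to current_setting() (missing_ok = true) makes the check safe in sessions that never set the variable: the NULL result fails the equality test and the DML is rejected.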

EC-27 — Foreign keys on stream tables

Field Value
Area Stream table — constraints
Tier P3
Impact FK violations during bulk MERGE
Current mitigation Do not define FKs
Documented in SQL_REFERENCE § “What Is NOT Allowed”

Proposed fix: Reject ALTER TABLE … ADD CONSTRAINT … FOREIGN KEY on stream tables via event trigger (same pattern as EC-25/EC-26). P3 — the current documentation is sufficient, and FK creation on stream tables is rare.


EC-28 — PgBouncer transaction pooling

Field Value
Area Infrastructure — connection pooling
Tier P2
Impact Prepared statement errors, lock escapes, GUC not applied
Current mitigation Use session-mode pooling; use SET LOCAL
Documented in FAQ § “PgBouncer”
Status Done — documentation already exists

Proposed fix: pg_trickle’s scheduler uses a single dedicated backend connection (not pooled), so the scheduler itself is unaffected. The edge case applies to application queries that use SET pg_trickle.* GUCs via a pooled connection. No code fix — this is a PgBouncer limitation. Improve documentation with a recommended PgBouncer configuration snippet.


EC-29 — TopK + set operations / GROUPING SETS

Field Value
Area TopK — SQL restrictions
Tier P3
Impact Creation error
Documented in SQL_REFERENCE § “TopK”

Proposed fix: TopK uses scoped recomputation via MERGE, which fundamentally requires a single ORDER BY … LIMIT at the top level. Set operations and multi-level grouping sets break this invariant. No code fix. Keep the clear rejection error and document the workaround (split into separate stream tables).


EC-30 — Circular dependencies

Field Value
Area DAG — cycle detection
Tier P3
Impact Creation error with cycle path in message
Documented in FAQ § “Dependencies”; SQL_REFERENCE

Proposed fix: Already handled — rejected at creation time with a clear error listing the cycle path. See plans/sql/PLAN_CIRCULAR_REFERENCES.md for future exploration of safe fixed-point evaluation. No further work.


EC-31 — FOR UPDATE / FOR SHARE in defining query

Field Value
Area SQL — row locking
Tier P3
Impact Creation error
Documented in SQL_REFERENCE § “Rejected Constructs”

Proposed fix: By design — row locks serve no purpose in stream table defining queries. No code fix. The error message is clear.


EC-32 — ALL (subquery) not supported

Field Value
Area DIFFERENTIAL — subquery operators
Tier P2
Impact Creation error
Documented in FAQ § “Why Are These SQL Features Not Supported?”
Status IMPLEMENTED — parse_all_sublink() in src/dvm/parser.rs updated with a NULL-safe anti-join condition (col IS NULL OR NOT (x op col)); the full AntiJoin pipeline was already wired (parse_sublink_to_wrapper → parse_all_sublink → AntiJoin SublinkWrapper → diff_anti_join)

Fix applied: NULL-safe anti-join condition.

ALL (subquery) is mathematically a conjunction of per-row comparisons. The full pipeline through parse_all_sublink() → AntiJoin OpTree → diff_anti_join was already wired. The outstanding issue was NULL-safety: the previous condition NOT (x op col) evaluated to NULL (not TRUE) when col IS NULL, so NULL rows in the subquery were silently ignored, giving wrong results. Fixed by changing the condition to (col IS NULL OR NOT (x op col)), which correctly excludes the outer row whenever any subquery row is NULL or fails the comparison.
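
A worked sketch of the NULL-safe anti-join form (table and column names illustrative):

```sql
-- Original predicate:
--   p.price < ALL (SELECT c.price FROM competitors c)
-- NULL-safe anti-join rewrite: the outer row is excluded whenever any
-- subquery row is NULL or fails the comparison.
SELECT p.*
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM competitors c
    WHERE c.price IS NULL OR NOT (p.price < c.price)
);
```

This reproduces the standard semantics: an empty subquery qualifies every outer row (ALL over an empty set is true), while a single NULL in the subquery disqualifies all of them.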

Test coverage: 8 E2E tests (e2e_all_subquery_tests.rs):
  • test_all_subquery_less_than_differential — basic price < ALL filter matching the SQL Reference worked example.
  • test_all_subquery_differential_inner_insert — differential refresh after inserting a cheaper competitor price that disqualifies a product.
  • test_all_subquery_differential_outer_change — outer INSERT + inner DELETE with recomputation.
  • test_all_subquery_null_in_inner — NULL-safety: ALL returns false when the subquery contains NULL; removing the NULL re-qualifies rows.
  • test_all_subquery_empty_inner — empty subquery: ALL against an empty set yields true (SQL standard); inserting rows reduces the result.
  • test_all_subquery_full_refresh — >= ALL with FULL refresh mode.
  • test_all_subquery_equals_operator — = ALL (value equals every row).
  • test_all_subquery_not_equals_operator — <> ALL (equivalent to NOT IN).


EC-33 — Statistical aggregates (CORR, COVAR*, REGR*)

Field Value
Area DIFFERENTIAL — aggregates
Tier P3
Impact Creation error in DIFFERENTIAL mode
Current mitigation Use FULL mode; or pre-aggregate in DIFFERENTIAL and compute in a view
Documented in FAQ § “Unsupported Aggregates”

Chosen fix: group-rescan (see PLAN_TRANSACTIONAL_IVM_PART_2.md Task 4.1):

Implement CORR/COVAR_POP/COVAR_SAMP/REGR_* using the proven group-rescan strategy already used for BOOL_AND, STRING_AGG, etc.:

  1. Detect affected groups from the delta (groups where any row was inserted, updated, or deleted).
  2. For each affected group, re-aggregate from source: SELECT group_key, CORR(y, x) FROM source GROUP BY group_key
  3. Apply the result as an UPDATE to the stream table.

This requires no stream table schema changes and no new OpTree variants.
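The three steps above might look like the following for CORR, assuming a hypothetical stream table st(region, corr_xy) over source(region, x, y) and a delta_changes CTE carrying the changed rows (the real CTE names come from the delta pipeline):

```sql
WITH affected AS (
  -- Step 1: affected groups are the group keys of changed rows.
  SELECT DISTINCT region FROM delta_changes
),
recomputed AS (
  -- Step 2: re-aggregate each affected group from the live source.
  SELECT s.region, CORR(s.y, s.x) AS corr_xy
  FROM source s
  JOIN affected a USING (region)
  GROUP BY s.region
)
-- Step 3: apply the recomputed value to the stream table.
UPDATE st
SET corr_xy = r.corr_xy
FROM recomputed r
WHERE st.region = r.region;
```

This sketch omits groups that disappear entirely (those need a DELETE) and groups that newly appear (INSERT); the group-rescan machinery for BOOL_AND / STRING_AGG already handles both cases.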

Deferred — Welford auxiliary accumulator columns:

The earlier proposal of maintaining sum_x, sum_y, sum_xy, sum_x2, count accumulator columns in the stream table using Welford’s online algorithm adapted to SQL EXCEPT/UNION was explored but is deferred. Rationale: it requires a breaking schema change to existing stream tables, the SQL delta-algebra correctness of the Welford adaptation is non-trivial to verify, and group-rescan is already correct and benchmarked for analogue aggregates. Revisit for v2.0 if group-rescan performance is insufficient for very large groups.


EC-34 — Backup / restore loses WAL replication slots

Field Value
Area Operations — disaster recovery
Tier P1
Impact CDC stuck; slot-missing error after restore
Current mitigation Set cdc_mode = 'trigger' after restore; let auto recreate slots
Documented in FAQ § “Operations”
Status IMPLEMENTED — auto-detect missing slot in health check; fall back to TRIGGER

Proposed fix:

  1. Short term (P1): Add a health_check() finding that detects “cdc_mode includes WAL but replication slot does not exist” and recommends the trigger fallback.
  2. Medium term: On scheduler startup, when a source’s cdc_phase is WAL but the slot is missing, automatically fall back to TRIGGER mode with a WARNING log and a NOTIFY.
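The short-term finding could be driven by a check along these lines; pgtrickle.pgt_sources and its columns are assumed catalog names, while pg_replication_slots is the standard system view:

```sql
-- Flag sources whose CDC phase is WAL but whose slot has vanished
-- (e.g. after a restore, since slots are not included in backups).
SELECT s.source_table,
       s.cdc_phase,
       'replication_slot_missing' AS finding
FROM pgtrickle.pgt_sources s
WHERE s.cdc_phase = 'WAL'
  AND NOT EXISTS (
    SELECT 1
    FROM pg_replication_slots r
    WHERE r.slot_name = s.slot_name
  );
```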

Test coverage: 2 E2E tests (e2e_wal_cdc_tests.rs):

  • test_wal_fallback_on_missing_slot (pre-existing) — full lifecycle test: creates a WAL-mode stream table, drops the replication slot externally, waits for the automatic fallback to TRIGGER, and verifies data integrity after the fallback.
  • test_ec34_check_cdc_health_detects_missing_slot — verifies that check_cdc_health() surfaces a replication_slot_missing alert immediately after the slot drop, then confirms the fallback to TRIGGER and post-fallback data integrity.


EC-35 — Managed cloud PostgreSQL restrictions

Field Value
Area Infrastructure — cloud compatibility
Tier P3
Impact Extension cannot be installed
Documented in FAQ § “Deployment”

Proposed fix: pg_trickle cannot control cloud provider policies. Keep the FAQ entry updated with a compatibility matrix (RDS, Cloud SQL, AlloyDB, Azure Flexible, Supabase, Neon). Note that trigger-based CDC avoids the most common restriction (wal_level = logical).


EC-36 — Managed cloud: shared_preload_libraries restriction

Field Value
Area Infrastructure — cloud compatibility
Tier P3
Impact Background worker + shared memory unavailable
Documented in FAQ § “Deployment”

Proposed fix: Investigate a “no-bgworker” mode where refresh_stream_table() is called by an external cron (e.g. pg_cron or an application timer). The scheduler would be optional when shared_preload_libraries is unavailable. See plans/infra/REPORT_EXTERNAL_PROCESS.md for the sidecar feasibility study. Long term (v2.0+).
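As an interim sketch of the cron-driven mode, assuming pg_cron is available and that refresh_stream_table() takes the stream table name (the actual signature may differ):

```sql
-- Refresh one stream table every minute without the background worker.
SELECT cron.schedule(
  'refresh_orders_summary',      -- job name (hypothetical)
  '* * * * *',                   -- standard cron schedule: every minute
  $$SELECT pgtrickle.refresh_stream_table('orders_summary')$$
);
```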


Prioritised Recommendations

P0 — Must fix (data correctness)

# Edge Case Action Estimated Effort
1 ✅ EC-01: JOIN key change + right-side DELETE R₀ via EXCEPT ALL — implemented in diff_inner/left/full_join Done
2 ✅ EC-06: Keyless table duplicate rows WARNING at creation + has_keyless_source flag Done
3 EC-19: WAL + keyless without REPLICA IDENTITY FULL Reject at creation time with clear error 0.5 day

P1 — Should fix (operational surprise)

# Edge Case Action Estimated Effort
4 ✅ EC-11: Schedule < refresh duration scheduler_falling_behind NOTIFY alert at 80% threshold Done
5 ✅ EC-13: Diamond split-version reads Default 'atomic' + pg_sys::BeginInternalSubTransaction in scheduler Done
6 ✅ EC-15: SELECT * + DDL breakage WARNING at creation for SELECT * queries Done
7 EC-16: ALTER FUNCTION undetected pg_proc hash polling on refresh cycle 2 days
8 EC-18: auto CDC stuck in TRIGGER Rate-limited LOG explaining why; health_check finding 1 day
9 ✅ EC-25: TRUNCATE on stream table Event trigger blocks TRUNCATE on pgtrickle schema Done
10 ✅ EC-26: Direct DML on stream table Guard trigger detecting non-engine writes Done
11 EC-34: WAL slots lost after restore Auto-detect missing slot; fall back to TRIGGER + WARNING 1 day

P2 — Nice to have (usability gaps)

# Edge Case Action Estimated Effort
12 EC-03: Window functions in expressions CTE extraction in parser 3–5 days
13 ✅ EC-05: Foreign tables Error message + polling-based CDC Done
14 ✅ EC-09: IMMEDIATE unsupported constructs Done — recursive CTEs + TopK now supported Done
15 ✅ EC-17: DDL during active refresh FAQ documentation added Done
16 ✅ EC-20: CDC TRANSITIONING complexity check_cdc_transition_health() in scheduler.rs Done
17 ✅ EC-28: PgBouncer Documentation already exists Done
18 ✅ EC-32: ALL (subquery) NULL-safe anti-join condition in parse_all_sublink() Done

P3 — Accepted trade-offs (document, no code change)

✅ EC-02 (GUC implemented), EC-04, EC-07, EC-08, EC-10, EC-12, EC-14, ✅ EC-21/22/23 (docs done), EC-24, EC-27, EC-29, EC-30, EC-31, EC-35, EC-36.

These are either fundamental design trade-offs or have adequate existing mitigations. Keep documentation current and revisit if user demand surfaces.

P3 — Committed long-term implementation

# Edge Case Action Estimated Effort
19 EC-33: CORR, COVAR*, REGR* Group-rescan re-aggregation (Welford accumulator columns deferred to v2.0) 5–8 days

Implementation Order

Recommended sprint sequence (assuming 2-week sprints):

Sprint 1 — P0 corrections: ✅ DONE

  • ✅ EC-19: WAL + keyless rejection (done: has_keyless_source flag + WARNING)
  • ✅ EC-06: Keyless duplicate warning + has_keyless_source catalog column
  • ✅ EC-01: R₀ via EXCEPT ALL (implemented in diff_inner/left/full_join)

Sprint 2 — P1 operational safety: ✅ DONE

  • ✅ EC-25: Block TRUNCATE on stream tables (event trigger)
  • ✅ EC-26: Guard trigger for direct DML
  • ✅ EC-15: Warn on SELECT * (WARNING at creation)
  • ✅ EC-11: scheduler_falling_behind alert (NOTIFY at 80% threshold)
  • ✅ EC-13: Default diamond_consistency = 'atomic' + scheduler SAVEPOINT → pg_sys::BeginInternalSubTransaction fix

Sprint 3 — Remaining P1 + P2 starts: ✅ DONE (v0.6.0)

  • ✅ EC-18: Rate-limited LOG for stuck auto mode (1 day)
  • ✅ EC-34: Auto-detect missing WAL slot (1 day)
  • ✅ EC-16: pg_proc hash polling for function changes (2 days)
  • ✅ EC-03: Window function CTE extraction (3–5 days)

Sprint 4+ — P2 + P3 long-term: ✅ DONE (v0.6.0)

  • ✅ EC-05, EC-32: Foreign table change detection; ALL (subquery) → NULL-safe anti-join rewrite
  • EC-33: Statistical aggregates via group-rescan re-aggregation
  • Documentation sweep for all remaining P3 items


Open Questions

The following questions are unresolved design or validation unknowns that must be answered before the associated fixes can be fully specified or committed to.


OQ-01 — EC-01: What is the correct new default for adaptive_full_threshold? ✅ RESOLVED

Resolution (2026-05-30): The short-term threshold adjustment is dropped as not viable. It is a heuristic that can still silently miss deletes when co-occurring changes fall below the threshold, providing false safety. The primary fix is the R₀ via EXCEPT ALL approach (see EC-01). The adaptive_full_threshold GUC remains as-is for users who want it as an escape valve, but it is no longer positioned as an EC-01 mitigation path.


OQ-02 — EC-01: Is pre-image capture viable without a breaking change buffer schema change? ✅ RESOLVED

Resolution (2026-05-30): Schema changes are NOT needed. The CDC trigger (src/cdc.rs) already stores OLD."col" values in typed "old_col" columns in the change buffer for every UPDATE and DELETE event. The old join key values are therefore already available. The question is now when to use them.

The R₀ via EXCEPT ALL approach (chosen for Sprint 1) does not need old_col column access from the delta CTE at all — it reconstructs the pre-change right state from the live table + ΔR change set. The pre-image (old_col) approach is still viable as a future optimisation (avoids the live-table EXCEPT ALL scan) but is deferred to v1.x because it requires structural changes to DiffContext/DeltaSource to surface old_col values in the delta CTE output. Effort: 7–10 days when the time comes.


OQ-03 — EC-01: What is the overhead of the R₀ via EXCEPT ALL approach?

Updated analysis (2026-05-30):

Code archaeology shows the overhead is similar to what Part 2 already pays for L₀:

  • Part 1 already does a semi-join-filtered scan of R₁ (full right table, filtered by delta-left join keys). Adding R₀ = R₁ EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes adds one extra pass over the ΔR CTE (which is small — only changed rows). The live table is not scanned an extra time because R₁ is already in the query plan; the EXCEPT ALL just subtracts a small set from it.
  • The semi-join filter that limits R₁ also applies to R₀, so the set difference operates on already-filtered rows.
  • The correction term (Part 3) is unaffected by this change.

Expected overhead: < 10% additional latency on the delta query for typical workloads where ΔR is small relative to R. The concern about “one extra temp-table scan per join source” in the original write-up was an overestimate — the EXCEPT ALL operates on the already-materialized ΔR CTE, not the full right table.

Recommendation: Confirm with a micro-benchmark (1M right table, 0.1% change rate) before the Sprint 1 PR merges. Accept if overhead is under 30% of current DIFFERENTIAL latency; otherwise fall back to applying R₀ only when ΔR is non-empty (easy conditional).
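For reference, the R₀ reconstruction described above might be expressed as follows; right_now (the live right table R₁), delta_right (ΔR with an action column), and the column names are all assumed placeholders for the generated CTEs:

```sql
WITH right_before AS (
  -- R₀ = R₁ EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes:
  -- subtract rows inserted since the snapshot, replay rows deleted.
  (SELECT r.col1, r.col2 FROM right_now r
   EXCEPT ALL
   SELECT d.col1, d.col2 FROM delta_right d WHERE d.action = 'I')
  UNION ALL
  SELECT d.col1, d.col2 FROM delta_right d WHERE d.action = 'D'
)
SELECT * FROM right_before;
```

Because EXCEPT ALL and UNION ALL preserve multiset semantics, duplicate rows in R₁ are handled correctly, and the only extra work beyond the existing R₁ scan is one pass over the small ΔR CTE.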


OQ-04 — EC-03: Where exactly do window-function-in-expression rewrites fail?

The CTE extraction rewrite for EC-03 is described as covering “~80% of user demand”, with complex cases (nested window calls inside arithmetic) remaining unsupported.

Open questions:

  • Is SUM(ROW_NUMBER() OVER (…)) rejectable at parse time, or does it require full semantic analysis?
  • Can NTILE(4) OVER (…) * 100 be rewritten to a CTE without changing query semantics?
  • What are the most common real-world patterns users hit? (Survey the GitHub issues / Discord.)

Recommendation: Before implementing the rewrite, define a formal boundary: “any expression where a window function appears as a direct child of the SELECT list is rewriteable; anything deeper is rejected with a clear error.” Enumerate the cases explicitly in a test file (tests/e2e_window_expression_tests.rs) before writing the rewrite code. This keeps the scope bounded.
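A sketch of the intended rewrite for the simple case, using hypothetical table and column names: the window call is hoisted into a CTE so only plain arithmetic remains in the outer SELECT.

```sql
-- Original: window call nested in arithmetic (unsupported directly):
--   SELECT id, NTILE(4) OVER (ORDER BY score) * 100 AS pct_bucket FROM t;

-- Rewritten with CTE extraction:
WITH w AS (
  SELECT id, NTILE(4) OVER (ORDER BY score) AS nt
  FROM t
)
SELECT id, nt * 100 AS pct_bucket
FROM w;
```

The rewrite is semantics-preserving here because the window frame is evaluated over the same row set either way; the proposed boundary rejects anything where the window call is not a direct child of a SELECT-list expression.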


OQ-05 — EC-06: Is ctid-based row identity viable for trigger-mode CDC?

The long-term fix for EC-06 proposes using ctid as a stable row identifier for keyless tables in trigger mode. ctid is the physical heap location and changes after VACUUM FULL / CLUSTER.

Open questions:

  • Does pgrx expose the ctid of OLD in a row-level trigger? (Need to verify via SPI_gettypeid or HeapTupleHeader.)
  • What happens if VACUUM FULL runs between the CDC trigger firing and the next refresh cycle? Would the stale ctid silently match the wrong row?
  • Is ctid reset to the same value when the same physical slot is reused? (Yes in PostgreSQL — ctids are reused after VACUUM, not unique across time.)

Recommendation: Do not pursue ctid-based identity. The reuse semantics make it unsafe as a durable row identifier. Instead, focus Sprint 1 effort on the count-based hash approach (OQ-05 is answered: ctid is off the table). If a unique row identity is required, recommend users add a DEFAULT gen_random_uuid() column.
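The recommended user-side workaround is a one-line DDL change; the table name here is hypothetical (gen_random_uuid() is built in since PostgreSQL 13, pgcrypto before that):

```sql
-- Give a keyless table a durable, VACUUM-safe row identity.
ALTER TABLE keyless_events
  ADD COLUMN row_id uuid NOT NULL DEFAULT gen_random_uuid();
```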


OQ-06 — EC-08: Do statement-level transition tables actually batch multiple row changes?

The plan for EC-08 states that statement-level triggers with transition tables “provide batching naturally”. This is true for a single UPDATE that affects N rows — the transition table contains all N old/new rows and the IVM trigger fires once. But it is not true for N separate single-row UPDATE statements in a loop — each fires the trigger once.
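The batching behaviour can be seen with a plain statement-level transition-table trigger; the function and table names below are illustrative, not pg_trickle's actual IVM trigger:

```sql
CREATE OR REPLACE FUNCTION ivm_on_update() RETURNS trigger AS $$
BEGIN
  -- old_rows / new_rows hold every row touched by the one statement,
  -- so a single UPDATE of N rows reaches this function exactly once.
  RAISE NOTICE 'batched % row(s) in one trigger call',
    (SELECT count(*) FROM new_rows);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ivm_batch_update
AFTER UPDATE ON source_table
REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION ivm_on_update();
```

Running UPDATE source_table SET v = v + 1 over 1,000 rows fires the trigger once; a loop of 1,000 single-row UPDATEs fires it 1,000 times, which is exactly the overhead the proposed benchmark should quantify.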

Open questions:

  • What is the actual p50/p99 latency overhead per DML statement in IMMEDIATE mode on a stream table with a 3-way join? (No benchmark exists.)
  • Does pg_advisory_xact_lock inside the trigger add measurable contention under 100+ concurrent single-row updates?

Recommendation: Add benches/ivm_write_overhead.rs as a Criterion benchmark that measures per-statement latency for IMMEDIATE vs DIFFERENTIAL mode at 1, 10, 100 concurrent writers. Run it as part of the weekly benchmark schedule. Make the results visible in docs/BENCHMARK.md so users can make informed mode choices. This answers the question empirically rather than by assumption.


OQ-07 — EC-36: How much of the scheduler can be decoupled for cloud deployments?

The long-term fix for EC-36 proposes a “no-bgworker” mode where refresh_stream_table() is driven by pg_cron or an application timer. plans/infra/REPORT_EXTERNAL_PROCESS.md covers the sidecar feasibility research but reaches no conclusion.

Open questions:

  • The scheduler currently holds a pg_advisory_lock during refresh to prevent concurrent refreshes. Would a cron-driven caller need to acquire this lock explicitly? (Yes — this is straightforward.)
  • The scheduler tracks refresh history in pgt_refresh_history. Would a cron-driven caller need to insert rows there manually, or can this be encapsulated in refresh_stream_table()? (Likely the latter — needs a small refresh.rs refactor.)
  • What happens to the background worker registration if pg_trickle is in shared_preload_libraries but pg_trickle.enabled = off? The worker starts but immediately sleeps — is this acceptable for cloud providers that bill per background-worker slot?

Recommendation: The prerequisite for EC-36 is verifying that refresh_stream_table() already records to pgt_refresh_history (check src/refresh.rs). If it does, the cron-mode is already 80% implemented — users just need to disable the bgworker and call refresh_stream_table() on a schedule. Document this pattern in docs/FAQ.md under “Deployment on managed PostgreSQL” as an interim workaround before a formal no-bgworker mode is built.