PLAN: Edge Cases — Catalogue, Workarounds & Prioritised Remediation

Status: Proposed
Target milestone: v0.3.0+
Last updated: 2026-06-05


Motivation

pg_trickle’s documentation (FAQ, SQL Reference, Architecture) describes 36 discrete edge cases and limitations. Some are fundamental trade-offs, others are fixable engineering gaps. This document:

  1. Catalogues every known edge case in one place.
  2. Proposes concrete workarounds or fixes for each.
  3. Assigns a priority tier so the team can focus effort where it matters most.

Priority Tiers

Tier | Meaning | Guidance
P0 | Data correctness risk — users can silently get wrong results | Fix before v1.0
P1 | Operational surprise — confusing failure, silent non-behaviour | Fix before v1.0
P2 | Usability gap — documented but painful; strong workaround exists | Schedule for a minor release
P3 | Accepted trade-off — inherent to the design; document clearly | Keep documentation up to date

Edge Case Catalogue

EC-01 — JOIN key change + simultaneous right-side DELETE

Field Value
Area DIFFERENTIAL — JOIN delta
Tier P0
Impact Stale row remains in stream table until next full refresh
Current mitigation Adaptive FULL fallback (pg_trickle.adaptive_full_threshold)
Documented in FAQ § “Known edge cases”; SQL_REFERENCE § “Known Delta Computation Limitations”
Status IMPLEMENTED — Part 1 split (R₀ via EXCEPT ALL) in diff_inner_join, diff_left_join, diff_full_join

Root cause: The delta query reads current_right after all changes are applied. When the old join partner is deleted before the delta runs, the DELETE half of the join finds no partner and is silently dropped.

Proposed fix (chosen approach: R₀ via EXCEPT ALL — medium-term):

The short-term threshold adjustment is not viable: it trades query correctness for a heuristic that can still silently miss deletes when co-occurring changes fall below the threshold.

Primary fix — R₀ via EXCEPT ALL (4–6 days):

The delta formula in src/dvm/operators/join.rs is:

ΔJ = (ΔQ ⋈ R₁) + (L₀ ⋈ ΔR)        [current code]

Part 2 already computes L₀ (pre-change left state) via EXCEPT ALL for Scan children (left_part2_source in diff_inner_join). The fix mirrors this symmetrically for Part 1’s right side — splitting Part 1 into:

Part 1a: ΔQ_inserts ⋈ R₁           (post-change right — unchanged)
Part 1b: ΔQ_deletes ⋈ R₀           (pre-change right via EXCEPT ALL)

R₀ is computed identically to the existing L₀:

R_current EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes

build_snapshot_sql() already exists in join_common.rs and is reused here. The correction term (Part 3) was added for the L₁ case in Part 2 and is unaffected — Part 2 stays unchanged.
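
As a concrete illustration, the split Part 1 could be sketched as the following SQL. All table, column, and CTE names here are illustrative, not the exact generated SQL; `delta_q` / `delta_r` stand for the change-buffer deltas with an action column ('I'/'D') plus the source columns.

```sql
-- Sketch only: names are hypothetical, not pg_trickle's generated SQL.
WITH r0 AS (                                -- pre-change right state R₀
    SELECT k, val FROM right_tbl            -- R_current
    EXCEPT ALL
    SELECT k, val FROM delta_r WHERE action = 'I'   -- undo ΔR inserts
    UNION ALL
    SELECT k, val FROM delta_r WHERE action = 'D'   -- re-add ΔR deletes
)
SELECT 'I' AS action, d.k, d.lval, r.val    -- Part 1a: ΔQ inserts ⋈ R₁
FROM delta_q d JOIN right_tbl r USING (k)
WHERE d.action = 'I'
UNION ALL
SELECT 'D', d.k, d.lval, r.val              -- Part 1b: ΔQ deletes ⋈ R₀
FROM delta_q d JOIN r0 r USING (k)
WHERE d.action = 'D';
```

Note that PostgreSQL's EXCEPT/UNION are left-associative with equal precedence, so the `r0` body parses as (R_current EXCEPT ALL ΔR_inserts) UNION ALL ΔR_deletes, matching the formula above.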

Implementation plan:
  • Day 1–2: Add right_part1_source decision + EXCEPT ALL CTE analogous to use_l0 / left_part2_source. Mirror use_r0 logic for nested right children (no SemiJoin, ≤ 2 scan nodes → R₀; otherwise R₁).
  • Day 2–3: Split Part 1 SQL into two UNION ALL arms; update row ID hashing for the Part 1b arm.
  • Day 4–5: Integration tests — co-delete scenario, UPDATE-then-delete, multi-cycle correctness.
  • Day 5–6: TPC-H regression suite (Q07 is the critical path).

Deferred — pre-image capture from change buffer:

The long-term approach (reconstructing R₀ directly from ΔR using the old_col values already stored in the change buffer, skipping the live table scan) is deferred. The CDC trigger already captures these values (OLD."col" is stored as "old_col" for UPDATE/DELETE), but surfacing them through DiffContext/DeltaSource requires structural changes to the delta CTE column representation that carry higher risk. Estimated additional effort: 7–10 days. Schedule for v1.x, once the EXCEPT ALL approach is correct and benchmarked.


EC-02 — CUBE/ROLLUP expansion limit (> 64 branches)

Field Value
Area DIFFERENTIAL — grouping sets
Tier P3
Impact Creation error
Current mitigation Use explicit GROUPING SETS(…)
Documented in SQL_REFERENCE § “CUBE/ROLLUP Expansion Limit”
Status Done — pg_trickle.max_grouping_set_branches GUC (default 64, range 1–65536)

Proposed fix: The 64-branch limit is a sensible guard; keep it but make it configurable via a GUC. Implemented: PGS_MAX_GROUPING_SET_BRANCHES in src/config.rs; parser.rs reads the GUC dynamically instead of a const. Documented in CONFIGURATION.md.


EC-03 — Window functions inside expressions

Field Value
Area DIFFERENTIAL — window functions
Tier P2
Impact Creation error
Current mitigation Keep window functions as top-level SELECT columns
Documented in FAQ § “Window Functions”
Status IMPLEMENTED — rewrite_nested_window_exprs() subquery-lift in src/dvm/parser.rs; helper collect_all_window_func_nodes() for recursive window-function harvesting; deparse_select_window_clause() for WINDOW clause passthrough; exported from src/dvm/mod.rs; wired in src/api.rs before the DISTINCT ON rewrite

Implementation: nested-subquery lift (see PLAN_TRANSACTIONAL_IVM_PART_2.md Task 1.3):

Window functions nested inside expressions are lifted into a synthetic column in an inner subquery, then the outer SELECT applies the original expression with the window function replaced by the synthetic column alias:

-- Before (rejected):
SELECT id, ABS(ROW_NUMBER() OVER (ORDER BY score) - 5) AS adjusted_rank
FROM players

-- After rewrite (supported):
SELECT id, ABS("__pgt_wf_inner"."__pgt_wf_1" - 5) AS "adjusted_rank"
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY score) AS "__pgt_wf_1"
    FROM players
) "__pgt_wf_inner"

The subquery approach is preferred over the earlier CTE approach because a WITH clause produces a WithQuery node in the OpTree, requiring the DVM engine to handle CTEs wrapping window functions — an untested path. The subquery produces a nested Scan → Window → Project chain that the existing DVM path already handles correctly.

Test coverage: 9 E2E tests (e2e_window_tests.rs):
  • 6 creation-acceptance tests: test_window_in_case_expression_rejected, test_window_in_coalesce_rejected, test_window_in_arithmetic_rejected, test_window_in_cast_rejected, test_window_deeply_nested_rejected, test_top_level_window_still_works (regression).
  • 3 data-correctness tests: test_ec03_case_window_data_correctness — CASE + ROW_NUMBER() produces correct tier labels; test_ec03_arithmetic_window_data_correctness — ROW_NUMBER() * 10 verified with a FULL refresh after INSERT; test_ec03_coalesce_window_data_correctness — COALESCE(SUM() OVER, 0) produces correct partition sums.

Known limitation: Window-in-expression queries are accepted in DIFFERENTIAL mode (creation succeeds), but the DVM delta engine cannot currently resolve the st.* alias inside the inner subquery introduced by the rewrite, so differential refresh fails with a column st.* does not exist error. Mitigation (EC-03): AUTO mode now correctly falls back to FULL refresh with an informational message, and explicit DIFFERENTIAL mode emits a WARNING at creation time alerting the user that the pattern will fall back to full recomputation at refresh time. FULL refresh works correctly. Differential support for the rewritten form is deferred.


EC-04 — Non-monotone recursive CTEs rejected

Field Value
Area DIFFERENTIAL — recursive CTEs
Tier P3
Impact Creation error
Current mitigation Restructure as non-recursive CTE; or use FULL mode
Documented in FAQ § “Recursive CTEs”

Proposed fix: This is a fundamental limitation of semi-naive evaluation — non-monotone operators in the recursive term can “un-derive” rows, breaking the incremental fixpoint. No code fix proposed. Keep the clear error message and document the reasoning. Users who need non-monotone recursion should use FULL mode.


EC-05 — Foreign tables in DIFFERENTIAL mode

Field Value
Area DIFFERENTIAL — foreign data wrappers
Tier P2
Impact Creation error
Current mitigation Use FULL mode
Documented in SQL_REFERENCE § “Source Tables”
Status IMPLEMENTED — error message improved + polling CDC implemented

Proposed fix:

  1. Short term (Done): Improved the error message to suggest FULL mode explicitly and mention the postgres_fdw + IMPORT FOREIGN SCHEMA pattern.
  2. Medium term (Done): Polling-based CDC for foreign tables. Enabled via pg_trickle.foreign_table_polling = on GUC. Creates a snapshot table per foreign table source; before each differential refresh, computes EXCEPT ALL deltas and populates the standard change buffer. Snapshot is refreshed each cycle. Cleanup path drops snapshot tables when STs are removed.
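
The snapshot-diff step described above could be sketched as plain SQL. All table and column names below are hypothetical; the real implementation generates this per foreign-table source and writes the result into the standard change buffer.

```sql
-- Sketch only: names are illustrative, not pg_trickle's generated SQL.
-- Rows present now but absent from the last snapshot become INSERT
-- events; the reverse become DELETE events.
SELECT 'I' AS action, t.* FROM (
    SELECT id, total FROM remote_orders           -- live foreign table
    EXCEPT ALL
    SELECT id, total FROM pgt_snap_remote_orders  -- last cycle's snapshot
) t
UNION ALL
SELECT 'D', t.* FROM (
    SELECT id, total FROM pgt_snap_remote_orders
    EXCEPT ALL
    SELECT id, total FROM remote_orders
) t;
```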

EC-06 — Keyless tables with exact duplicate rows

Field Value
Area DIFFERENTIAL — row identity
Tier P0
Impact Phantom or missed deletes in stream table
Current mitigation Add a primary key; add a synthetic unique column; or use FULL mode
Documented in FAQ § “Keyless Tables”; SQL_REFERENCE § “Row Identity”
Status IMPLEMENTED — Full EC-06 fix: catalog has_keyless_source flag, non-unique index for keyless sources, net-counting delta SQL in scan.rs (decompose changes into atomic +1/-1 per content hash, expand via generate_series), counted DELETE (ROW_NUMBER matching) in refresh.rs and ivm.rs, plain INSERT (no ON CONFLICT) for keyless sources. WARNING at creation time also implemented.

Implementation (completed):

  1. Short term (P0): Emit a WARNING at create_stream_table() time ✅ Done
  2. Full fix (P0): ✅ Implemented via net-counting approach:
    • Catalog: Added has_keyless_source BOOLEAN NOT NULL DEFAULT FALSE to pgtrickle.pgt_stream_tables (set at creation time).
    • Index: Non-unique CREATE INDEX for keyless sources (instead of CREATE UNIQUE INDEX), allowing duplicate __pgt_row_id values.
    • Delta SQL (scan.rs): For keyless tables, decomposes all change events into atomic +1 (INSERT) / -1 (DELETE) operations per content hash. UPDATEs are split into their DELETE-old + INSERT-new components. Net counts are computed via GROUP BY content_hash + SUM(delta_sign), then expanded to the correct number of D/I rows using generate_series.
    • Apply (refresh.rs): For keyless sources, forces explicit DML path (cannot use MERGE with non-unique row_ids). Counted DELETE uses ROW_NUMBER matching to pair delta deletes 1:1 with stream table rows. Plain INSERT without NOT EXISTS / ON CONFLICT.
    • Apply (ivm.rs): Same counted DELETE + plain INSERT for IMMEDIATE mode.
    • Upgrade SQL: ALTER TABLE ... ADD COLUMN IF NOT EXISTS has_keyless_source

Bug fixes (post-implementation):

  • Guard trigger DELETE cancel: The DML guard trigger (EC-26) used RETURN NEW which is NULL for DELETE operations, silently cancelling managed-refresh DELETEs on the stream table. Fixed to use IF TG_OP = 'DELETE' THEN RETURN OLD; ELSE RETURN NEW; END IF;.
  • Keyless aggregate UPDATE no-op: The keyless UPDATE template was a no-op (SELECT 1 WHERE false), which is correct for keyless scans but wrong for aggregate queries on keyless sources (where 'I'-action rows update existing groups). Fixed to use the normal UPDATE template, which naturally matches 0 rows for scan-level keyless deltas.

Test coverage: 7 E2E tests (e2e_keyless_duplicate_tests.rs) — all passing. Covers: duplicate rows basic, delete-one-of-duplicates, update-one-of-duplicates, unique content multi-cycle DML, delete-all-then-reinsert, mixed DML stress, aggregate with duplicates.


EC-07 — IMMEDIATE mode write serialization

Field Value
Area IMMEDIATE mode — concurrency
Tier P3
Impact Reduced write throughput under high concurrency
Current mitigation Use DIFFERENTIAL mode for high-throughput OLTP
Documented in FAQ § “IMMEDIATE Mode Trade-offs”

Proposed fix: This is an inherent trade-off of synchronous IVM — you cannot maintain consistency without serialization. The IvmLockMode already uses lighter pg_try_advisory_xact_lock for simple scan chains. No code fix proposed. Document the throughput characteristics in the performance guide and add a benchmark that quantifies the write-side overhead so users can make informed mode choices.


EC-08 — IMMEDIATE mode write amplification

Field Value
Area IMMEDIATE mode — latency
Tier P3
Impact Increased per-DML latency
Current mitigation Use DIFFERENTIAL mode
Documented in FAQ § “IMMEDIATE Mode Trade-offs”

Proposed fix: Inherent trade-off. Potential optimisation: batch multiple trigger invocations within a single statement into one delta application (the current implementation already uses statement-level triggers with transition tables, which provides this batching naturally). Add a benchmark to benches/ that measures per-statement overhead for 1, 10, 100, 1000 affected rows.
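
The statement-level batching mechanism referred to above is standard PostgreSQL (10+) transition-table syntax; the trigger and function names in this sketch are illustrative, not pg_trickle's actual identifiers.

```sql
-- Sketch only: names are hypothetical.
CREATE TRIGGER ivm_after_insert
    AFTER INSERT ON source_tbl
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION apply_ivm_delta();
-- One INSERT affecting 1000 rows fires apply_ivm_delta() once, reading
-- all 1000 rows from the new_rows transition table, rather than firing
-- 1000 row-level invocations.
```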


EC-09 — IMMEDIATE mode unsupported constructs

Field Value
Area IMMEDIATE mode — SQL support
Tier P2
Impact Creation / alter error
Current mitigation Use DIFFERENTIAL mode
Documented in FAQ § “What SQL features are NOT supported in IMMEDIATE mode?”
Status Done — recursive CTEs + TopK now supported in IMMEDIATE mode

Proposed fix (recursive CTEs + TopK): originally deferred, now implemented per PLAN_TRANSACTIONAL_IVM_PART_2.md Phase 5.

Both constructs are now implemented:

  • Recursive CTEs (G10 / Task 5.1): check_immediate_support() changed from rejection to warning about stack depth. Semi-naive evaluation proceeds with DeltaSource::TransitionTable.
  • TopK (G11 / Task 5.2): apply_topk_micro_refresh() in ivm.rs implements statement-level micro-refresh, bounded by the pg_trickle.ivm_topk_max_limit GUC (default 1000) to prevent inline recomputation latency spikes for large K.

For materialised view sources: use a polling-change-detection wrapper (same approach as EC-05 for foreign tables). No code fix proposed for this case.


EC-10 — IMMEDIATE mode: no throttling

Field Value
Area IMMEDIATE mode — back-pressure
Tier P3
Impact No rate limiting for high-frequency writes
Current mitigation Use DIFFERENTIAL mode with a schedule interval
Documented in FAQ § “IMMEDIATE Mode Trade-offs”

Proposed fix: By design — throttling IMMEDIATE would break the same-transaction consistency guarantee. No code fix. Ensure the FAQ clearly explains that DIFFERENTIAL with a short interval (e.g. '1s') is the right answer when back-pressure is needed.


EC-11 — Schedule interval shorter than refresh duration

Field Value
Area Scheduler — timing
Tier P1
Impact Perpetual latency spiral; cascading DAG chains fall behind
Current mitigation Widen the interval; use IMMEDIATE mode; or split the DAG
Documented in FAQ § “Performance and Tuning”
Status IMPLEMENTED — scheduler_falling_behind NOTIFY alert at 80% threshold

Test coverage: SchedulerFallingBehind variant covered in test_alert_event_as_str and test_alert_event_all_variants_unique (monitor.rs unit tests).

Chosen fix: scheduler_falling_behind NOTIFY alert (short term):

When the refresh duration exceeds 80% of the schedule interval for 3 consecutive cycles, emit a NOTIFY on the channel pgtrickle_alerts with a JSON payload containing the stream table name, average duration, and configured interval. Operators can LISTEN on the channel and wire it into their alerting stack.
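
Consuming the alert could look like the following; the channel name comes from the plan above, while the payload field names are illustrative since the exact JSON shape is not specified here.

```sql
-- Operator side: subscribe to the alert channel.
LISTEN pgtrickle_alerts;
-- A hypothetical payload a falling-behind ST might produce:
-- {"event": "scheduler_falling_behind",
--  "stream_table": "public.orders_by_day",
--  "avg_duration_ms": 4800,
--  "interval_ms": 5000}
```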

Deferred:
  • auto_backoff GUC (double the effective interval when falling behind) — medium term; schedule for a later sprint.
  • explain_st(name) recommendation from pgt_refresh_history — long term.


EC-12 — No read-your-writes (DIFFERENTIAL/FULL)

Field Value
Area Scheduler — consistency model
Tier P3
Impact Stale reads immediately after write
Current mitigation Use IMMEDIATE mode or call refresh_stream_table() manually
Documented in FAQ § “Consistency”

Proposed fix: Inherent to the scheduled refresh model. No code fix. Consider a convenience function pgtrickle.write_and_refresh(dml_sql, st_name) that executes the DML and triggers a manual refresh in a single call. Low priority — manual refresh_stream_table() already covers this.
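
The proposed convenience function could be a thin plpgsql wrapper; the signature matches the proposal above, but the body here is an illustrative sketch, not an implemented API.

```sql
-- Sketch only: hypothetical implementation of the proposed wrapper.
CREATE FUNCTION pgtrickle.write_and_refresh(dml_sql text, st_name text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    EXECUTE dml_sql;                                   -- run the write
    PERFORM pgtrickle.refresh_stream_table(st_name);   -- read-your-writes
END $$;
```

Since dml_sql is executed verbatim, a real implementation would need the same privilege checks as any dynamic-SQL entry point.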


EC-13 — Diamond dependency split-version reads

Field Value
Area Diamond dependencies — consistency
Tier P1
Impact Fan-in node reads inconsistent upstream versions
Current mitigation diamond_consistency = 'atomic'
Documented in FAQ § “Diamond Dependencies”; SQL_REFERENCE
Status FULLY IMPLEMENTED — default 'atomic' + scheduler sub-transaction fix

Proposed fix: The atomic mode already fixes this. Remaining work:

  1. Short term (P1): Default diamond_consistency to 'atomic' for new stream tables (currently 'none'). This is a safer default — users who don’t need atomicity can opt out. ✅ Done.
  2. Medium term: Add a health_check() warning when a diamond graph exists but diamond_consistency = 'none'.

Scheduler bug fixed (follow-on): When EC-13 changed the default to 'atomic', a latent bug surfaced: the scheduler’s diamond group path used Spi::run("SAVEPOINT …") which PostgreSQL rejects in background workers (SPI_ERROR_TRANSACTION). This silently skipped every diamond group refresh. Fixed by replacing the SPI SAVEPOINT calls with pg_sys::BeginInternalSubTransaction / ReleaseCurrentSubTransaction / RollbackAndReleaseCurrentSubTransaction, which bypass SPI. See src/scheduler.rs.


EC-14 — Diamond upstream failure skips downstream

Field Value
Area Diamond dependencies — failure handling
Tier P3
Impact Downstream stays stale for one cycle
Current mitigation Automatic retry on next cycle
Documented in FAQ § “Diamond Dependencies”

Proposed fix: Current behaviour is correct. Add a NOTIFY diamond_group_partial_failure when a group member fails so operators can investigate. No urgency — the automatic retry handles recovery.


EC-15 — SELECT * on source table + DDL

Field Value
Area DDL — schema evolution
Tier P1
Impact Refresh fails; stream table marked needs_reinit
Current mitigation Use explicit column lists in defining queries
Documented in FAQ § “Schema Changes”
Status IMPLEMENTED — WARNING at creation time for SELECT *

Test coverage: 10 unit tests for detect_select_star() pure function (api.rs): bare *, table-qualified t.*, explicit columns, count(*), sum(*), mixed agg+bare star, no asterisk, subquery star, case insensitivity, no-FROM clause.

Proposed fix:

  1. Short term (P1): ✅ Done — Emits WARNING at create_stream_table() time when the defining query contains SELECT *. Detection logic extracted to pure detect_select_star() function for unit testability.
  2. Medium term: When needs_reinit is triggered by a column-count mismatch, attempt an automatic reinitialisation: re-parse the query, verify the new column set is compatible, and refresh. If compatible, skip the needs_reinit flag and refresh directly.

EC-16 — ALTER FUNCTION body change not detected

Field Value
Area DDL — function dependency tracking
Tier P1
Impact Stream table silently uses new function logic without a full rebase
Current mitigation Manual full refresh after function changes
Documented in FAQ § “Schema Changes”
Status IMPLEMENTED — function_hashes TEXT column on pgt_stream_tables (migration 0.2.1→0.2.2); check_proc_hashes_changed() in refresh.rs queries md5(prosrc || coalesce(probin::text,'')) from pg_proc on each diff refresh; on mismatch calls mark_for_reinitialize + NOTICE; on first call stores the baseline silently

Implemented fix:

Polling via pg_proc on each differential refresh:
  • On the first call (stored hashes = NULL): compute and persist the baseline. No reinit.
  • On following calls: recompute the hashes. If any differ → mark_for_reinitialize + NOTICE. The next scheduler cycle performs a full refresh.
  • Uses md5(string_agg(prosrc || coalesce(probin::text,''), ',' ORDER BY oid)) to handle polymorphic overloads robustly.
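
Standalone, the hash computation amounts to a single catalog query. The OID list below is illustrative; in the real code it comes from the stream table's tracked function dependencies.

```sql
-- Sketch only: the OID array is a placeholder for the ST's tracked
-- function dependencies.
SELECT md5(string_agg(prosrc || coalesce(probin::text, ''),
                      ',' ORDER BY oid)) AS fn_hash
FROM pg_proc
WHERE oid = ANY ('{16384,16385}'::oid[]);
```

Aggregating over all matching pg_proc rows (ordered by oid) is what makes the hash stable across polymorphic overloads of the same function name.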

Test coverage: 3 E2E tests (e2e_multi_cycle_tests.rs):
  • test_ec16_function_body_change_marks_reinit — verifies that CREATE OR REPLACE FUNCTION triggers needs_reinit = true on the next differential refresh.
  • test_ec16_function_change_full_refresh_recovery — verifies that after reinit, a full refresh produces correct results with the new function logic.
  • test_ec16_no_functions_unaffected — verifies that stream tables without user-defined functions are never affected by the hash polling mechanism.


EC-17 — Schema change during active refresh

Field Value
Area DDL — concurrency
Tier P2
Impact Refresh errors; stream table suspended
Documented in FAQ § “Schema Changes”
Status Done — FAQ section documents lock interaction

Proposed fix: No code fix needed. Implemented: an FAQ section explains that the ShareLock held on source tables during refresh blocks a concurrent ALTER TABLE (which requires AccessExclusiveLock). The window is only open if the DDL sneaks in between lock acquisition and the first read.


EC-18 — auto CDC mode silent non-upgrade

Field Value
Area CDC — mode transition
Tier P1
Impact User thinks WAL CDC is active; actually stuck on triggers forever
Current mitigation Default is TRIGGER mode; auto must be explicitly set
Documented in FAQ § “CDC Architecture”
Status IMPLEMENTED — rate-limited LOG every ~60 scheduler ticks

Proposed fix:

  1. Short term (P1): When cdc_mode = 'auto', emit a LOG-level message on every scheduler cycle that explains why the transition hasn’t happened (e.g. “wal_level is ‘replica’, need ‘logical’”). Rate-limit to once per 5 minutes.
  2. Medium term: Add a check_cdc_health() finding for “auto mode stuck in TRIGGER phase for > 1 hour”.

Test coverage: 2 E2E tests (e2e_wal_cdc_tests.rs) + 1 existing test:
  • test_ec18_check_cdc_health_shows_trigger_for_stuck_auto — verifies check_cdc_health() reports TRIGGER mode for a keyless auto-CDC source and that no alert fires for healthy TRIGGER-mode sources.
  • test_ec18_health_check_ok_with_trigger_auto_sources — verifies health_check() does not flag TRIGGER-mode auto-CDC sources as errors.
  • test_wal_keyless_table_stays_on_triggers (pre-existing) — verifies keyless tables remain on TRIGGER mode in auto-CDC configuration.


EC-19 — WAL mode + keyless tables need REPLICA IDENTITY FULL

Field Value
Area CDC — WAL decoding
Tier P0
Impact Silent data loss for UPDATEs/DELETEs on keyless tables in WAL mode
Current mitigation Set REPLICA IDENTITY FULL before switching to WAL CDC
Documented in FAQ § “CDC Architecture”
Status IMPLEMENTED — setup_cdc_for_source() rejects at creation time

Proposed fix:

  1. Short term (P0): At create_stream_table() / alter_stream_table() time, when cdc_mode involves WAL and any source table is keyless, check pg_class.relreplident and reject with a clear error if not 'f' (full). This prevents the silent data loss entirely.
  2. Medium term: Automatically issue ALTER TABLE … REPLICA IDENTITY FULL for keyless tables when entering WAL mode (with a NOTICE).
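
The creation-time check reduces to a catalog lookup; the table name below is illustrative. In pg_class.relreplident, 'f' means REPLICA IDENTITY FULL.

```sql
-- Sketch only: the checked table is hypothetical.
SELECT c.relname, c.relreplident
FROM pg_class c
WHERE c.oid = 'public.events'::regclass
  AND c.relreplident <> 'f';   -- any returned row → reject with an error
-- Remediation the error message should suggest:
-- ALTER TABLE public.events REPLICA IDENTITY FULL;
```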

Test coverage: 2 E2E tests (e2e_wal_cdc_tests.rs):
  • test_ec19_wal_keyless_without_replica_identity_full_rejected — verifies that create_stream_table(cdc_mode => 'wal') on a keyless table without REPLICA IDENTITY FULL is rejected with a clear error.
  • test_ec19_wal_keyless_with_replica_identity_full_accepted — verifies that the same combination succeeds once REPLICA IDENTITY FULL is set.


EC-20 — CDC TRANSITIONING phase complexity

Field Value
Area CDC — mode transition
Tier P2
Impact Potential duplicate capture during transition; rollback to triggers on failure
Current mitigation LSN-based deduplication; automatic rollback
Documented in FAQ § “CDC Architecture”
Status Done — check_cdc_transition_health() in scheduler.rs

Proposed fix: a combined trigger_inventory() + check_cdc_health() check. Implemented: check_cdc_transition_health() runs after recover_from_crash() in the scheduler. It loads all stream table deps, filters TRANSITIONING entries, checks pg_replication_slots for slot existence, rolls back to TRIGGER mode if the slot is missing, and logs the status.


EC-21 — Stream tables not propagated to logical replication subscriber

Field Value
Area Replication — architecture
Tier P3
Impact Subscriber gets static snapshot without refresh capability
Current mitigation Run pg_trickle only on primary
Documented in FAQ § “Replication”
Status Done — FAQ documentation sweep completed

Proposed fix: Inherent to the architecture — CDC triggers and the scheduler only run on the primary. No code fix. Consider a future “subscriber mode” that detects replicated source tables and runs its own CDC + refresh cycle, but this is a major undertaking (v2.0+).


EC-22 — Change buffers not published to subscribers

Field Value
Area Replication — change buffers
Tier P3
Impact Subscriber cannot drive its own refresh
Documented in FAQ § “Replication”
Status Done — FAQ documentation sweep completed

Proposed fix: By design. Same as EC-21 — keep documentation clear.


EC-23 — Scheduler does not run on standby / replica

Field Value
Area Replication — standby
Tier P3
Impact No IVM on read replicas
Current mitigation Query the primary; or promote the replica
Documented in FAQ § “HA / Replication”
Status Done — FAQ documentation sweep completed

Proposed fix: By design — standbys are read-only. No code fix. A future enhancement could add a read-only health endpoint on standby that reports the last refresh timestamp from the replicated catalog, so monitoring can detect staleness.


EC-24 — TRUNCATE on source table bypasses CDC triggers

Field Value
Area CDC — TRUNCATE handling
Tier P3
Impact Stream table not updated (until reinit on next cycle)
Current mitigation TRUNCATE event trigger marks needs_reinit; next cycle does full recompute
Documented in SQL_REFERENCE § “TRUNCATE”

Proposed fix: Already handled by the TRUNCATE event trigger. The only gap is latency — between the TRUNCATE and the next cycle, the stream table is stale. For IMMEDIATE mode, TRUNCATE triggers a same-transaction full refresh (already implemented). For DIFFERENTIAL/FULL, the current behaviour is acceptable. No further code fix needed.


EC-25 — TRUNCATE on a stream table itself

Field Value
Area Stream table — direct manipulation
Tier P1
Impact Frontier / buffer desync; future refreshes produce incorrect deltas
Current mitigation Use refresh_stream_table() to reset
Documented in SQL_REFERENCE § “What Is NOT Allowed”
Status IMPLEMENTED — BEFORE TRUNCATE guard trigger blocks direct TRUNCATE

Test coverage: test_guard_trigger_blocks_truncate E2E test (e2e_guard_trigger_tests.rs).

Proposed fix:

  1. Short term (P1): ✅ Done — BEFORE TRUNCATE trigger installed on storage tables at creation time. Checks pg_trickle.internal_refresh session variable to allow managed refreshes. The internal_refresh flag is set in all refresh code paths: execute_manual_refresh, execute_manual_full_refresh, execute_full_refresh, execute_topk_refresh, and pgt_ivm_handle_truncate.
  2. Medium term: Instead of blocking, intercept TRUNCATE and automatically perform a full refresh (same approach as IMMEDIATE mode TRUNCATE handling). This is friendlier but may surprise users who expect TRUNCATE to be instant.

EC-26 — Direct DML on stream table

Field Value
Area Stream table — direct manipulation
Tier P1
Impact Data overwritten or duplicated on next refresh
Current mitigation Documentation; never write directly
Documented in SQL_REFERENCE § “What Is NOT Allowed”
Status IMPLEMENTED — BEFORE INSERT/UPDATE/DELETE guard trigger on storage table

Test coverage: 4 E2E tests (e2e_guard_trigger_tests.rs) — test_guard_trigger_blocks_direct_insert, _update, _delete + test_guard_trigger_allows_managed_refresh.

Proposed fix:

  1. Short term (P1): ✅ Done — Row-level BEFORE trigger installed on storage tables at creation time. Uses pg_trickle.internal_refresh session variable (checked via current_setting(..., true)) to allow managed refreshes. Returns OLD for DELETE operations (returning NEW would be NULL, silently cancelling the DELETE). The internal_refresh flag is set across all refresh code paths.
  2. Medium term: Use a PostgreSQL row-level security policy or a pg_trickle.allow_direct_dml session variable as an escape hatch for power users.
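
A minimal sketch of the guard trigger function (function name illustrative), including the DELETE fix noted under EC-06: returning OLD for DELETE, since returning NEW would be NULL and silently cancel the operation.

```sql
-- Sketch only: function name is hypothetical; the session variable and
-- TG_OP handling mirror the behaviour described above.
CREATE FUNCTION pgt_guard() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Managed refreshes set this session variable before writing:
    IF current_setting('pg_trickle.internal_refresh', true) = 'on' THEN
        IF TG_OP = 'DELETE' THEN RETURN OLD; ELSE RETURN NEW; END IF;
    END IF;
    RAISE EXCEPTION 'direct DML on stream table % is not allowed',
        TG_TABLE_NAME;
END $$;
```

The second argument to current_setting() (missing_ok = true) makes the check safe in sessions that never set the variable: the NULL result fails the equality test and the DML is rejected.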

EC-27 — Foreign keys on stream tables

Field Value
Area Stream table — constraints
Tier P3
Impact FK violations during bulk MERGE
Current mitigation Do not define FKs
Documented in SQL_REFERENCE § “What Is NOT Allowed”

Proposed fix: Reject ALTER TABLE … ADD CONSTRAINT … FOREIGN KEY on stream tables via event trigger (same pattern as EC-25/EC-26). P3 — the current documentation is sufficient, and FK creation on stream tables is rare.


EC-28 — PgBouncer transaction pooling

Field Value
Area Infrastructure — connection pooling
Tier P2
Impact Prepared statement errors, lock escapes, GUC not applied
Current mitigation Use session-mode pooling; use SET LOCAL
Documented in FAQ § “PgBouncer”
Status Done — documentation already exists

Proposed fix: pg_trickle’s scheduler uses a single dedicated backend connection (not pooled), so the scheduler itself is unaffected. The edge case applies to application queries that use SET pg_trickle.* GUCs via a pooled connection. No code fix — this is a PgBouncer limitation. Improve documentation with a recommended PgBouncer configuration snippet.


EC-29 — TopK + set operations / GROUPING SETS

Field Value
Area TopK — SQL restrictions
Tier P3
Impact Creation error
Documented in SQL_REFERENCE § “TopK”

Proposed fix: TopK uses scoped recomputation via MERGE, which fundamentally requires a single ORDER BY … LIMIT at the top level. Set operations and multi-level grouping sets break this invariant. No code fix. Keep the clear rejection error and document the workaround (split into separate stream tables).


EC-30 — Circular dependencies

Field Value
Area DAG — cycle detection
Tier P3
Impact Creation error with cycle path in message
Documented in FAQ § “Dependencies”; SQL_REFERENCE

Proposed fix: Already handled — rejected at creation time with a clear error listing the cycle path. See plans/sql/PLAN_CIRCULAR_REFERENCES.md for future exploration of safe fixed-point evaluation. No further work.


EC-31 — FOR UPDATE / FOR SHARE in defining query

Field Value
Area SQL — row locking
Tier P3
Impact Creation error
Documented in SQL_REFERENCE § “Rejected Constructs”

Proposed fix: By design — row locks serve no purpose in stream table defining queries. No code fix. The error message is clear.


EC-32 — ALL (subquery) not supported

Field Value
Area DIFFERENTIAL — subquery operators
Tier P2
Impact Creation error
Documented in FAQ § “Why Are These SQL Features Not Supported?”
Status IMPLEMENTED — parse_all_sublink() in src/dvm/parser.rs updated with a NULL-safe anti-join condition (col IS NULL OR NOT (x op col)); the full AntiJoin pipeline was already wired (parse_sublink_to_wrapper → parse_all_sublink → AntiJoin SublinkWrapper → diff_anti_join)

Fix applied: NULL-safe anti-join condition.

ALL (subquery) is mathematically a conjunction of per-row comparisons. The full pipeline through parse_all_sublink() → AntiJoin OpTree → diff_anti_join was already wired. The outstanding issue was NULL-safety: the previous condition NOT (x op col) evaluated to NULL (not TRUE) when col IS NULL, so NULL rows in the subquery were silently ignored, giving wrong results. Fixed by changing the condition to (col IS NULL OR NOT (x op col)), which correctly excludes the outer row whenever any subquery row is NULL or fails the comparison.
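
A worked sketch of the NULL-safe anti-join form (table and column names illustrative):

```sql
-- Original predicate:
--   p.price < ALL (SELECT c.price FROM competitors c)
-- NULL-safe anti-join rewrite: the outer row is excluded whenever any
-- subquery row is NULL or fails the comparison.
SELECT p.*
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM competitors c
    WHERE c.price IS NULL OR NOT (p.price < c.price)
);
```

This reproduces the standard semantics: an empty subquery qualifies every outer row (ALL over an empty set is true), while a single NULL in the subquery disqualifies all of them.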

Test coverage: 8 E2E tests (e2e_all_subquery_tests.rs):
  • test_all_subquery_less_than_differential — basic price < ALL filter matching the SQL Reference worked example.
  • test_all_subquery_differential_inner_insert — differential refresh after inserting a cheaper competitor price that disqualifies a product.
  • test_all_subquery_differential_outer_change — outer INSERT + inner DELETE with recomputation.
  • test_all_subquery_null_in_inner — NULL-safety: ALL returns false when the subquery contains NULL; removing the NULL re-qualifies rows.
  • test_all_subquery_empty_inner — empty subquery: ALL against an empty set yields true (SQL standard); inserting rows reduces the result.
  • test_all_subquery_full_refresh — >= ALL with FULL refresh mode.
  • test_all_subquery_equals_operator — = ALL (value equals every row).
  • test_all_subquery_not_equals_operator — <> ALL (equivalent to NOT IN).


EC-33 — Statistical aggregates (CORR, COVAR*, REGR*)

Field Value
Area DIFFERENTIAL — aggregates
Tier P3
Impact Creation error in DIFFERENTIAL mode
Current mitigation Use FULL mode; or pre-aggregate in DIFFERENTIAL and compute in a view
Documented in FAQ § “Unsupported Aggregates”

Chosen fix: group-rescan (see PLAN_TRANSACTIONAL_IVM_PART_2.md Task 4.1):

Implement CORR/COVAR_POP/COVAR_SAMP/REGR_* using the proven group-rescan strategy already used for BOOL_AND, STRING_AGG, etc.:

  1. Detect affected groups from the delta (groups where any row was inserted, updated, or deleted).
  2. For each affected group, re-aggregate from source: SELECT group_key, CORR(y, x) FROM source GROUP BY group_key
  3. Apply the result as an UPDATE to the stream table.

This requires no stream table schema changes and no new OpTree variants.
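The three steps above might look like the following for CORR, assuming a hypothetical stream table st(region, corr_xy) over source(region, x, y) and a delta_changes CTE carrying the changed rows (the real CTE names come from the delta pipeline):

```sql
WITH affected AS (
  -- Step 1: affected groups are the group keys of changed rows.
  SELECT DISTINCT region FROM delta_changes
),
recomputed AS (
  -- Step 2: re-aggregate each affected group from the live source.
  SELECT s.region, CORR(s.y, s.x) AS corr_xy
  FROM source s
  JOIN affected a USING (region)
  GROUP BY s.region
)
-- Step 3: apply the recomputed value to the stream table.
UPDATE st
SET corr_xy = r.corr_xy
FROM recomputed r
WHERE st.region = r.region;
```

This sketch omits groups that disappear entirely (those need a DELETE) and groups that newly appear (INSERT); the group-rescan machinery for BOOL_AND / STRING_AGG already handles both cases.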

Deferred — Welford auxiliary accumulator columns:

The earlier proposal of maintaining sum_x, sum_y, sum_xy, sum_x2, count accumulator columns in the stream table using Welford’s online algorithm adapted to SQL EXCEPT/UNION was explored but is deferred. Rationale: it requires a breaking schema change to existing stream tables, the SQL delta-algebra correctness of the Welford adaptation is non-trivial to verify, and group-rescan is already correct and benchmarked for analogue aggregates. Revisit for v2.0 if group-rescan performance is insufficient for very large groups.


EC-34 — Backup / restore loses WAL replication slots

Field Value
Area Operations — disaster recovery
Tier P1
Impact CDC stuck; slot-missing error after restore
Current mitigation Set cdc_mode = 'trigger' after restore; let auto recreate slots
Documented in FAQ § “Operations”
Status IMPLEMENTED — auto-detect missing slot in health check; fall back to TRIGGER

Proposed fix:

  1. Short term (P1): Add a health_check() finding that detects “cdc_mode includes WAL but replication slot does not exist” and recommends the trigger fallback.
  2. Medium term: On scheduler startup, when a source’s cdc_phase is WAL but the slot is missing, automatically fall back to TRIGGER mode with a WARNING log and a NOTIFY.
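The short-term finding could be driven by a check along these lines; pgtrickle.pgt_sources and its columns are assumed catalog names, while pg_replication_slots is the standard system view:

```sql
-- Flag sources whose CDC phase is WAL but whose slot has vanished
-- (e.g. after a restore, since slots are not included in backups).
SELECT s.source_table,
       s.cdc_phase,
       'replication_slot_missing' AS finding
FROM pgtrickle.pgt_sources s
WHERE s.cdc_phase = 'WAL'
  AND NOT EXISTS (
    SELECT 1
    FROM pg_replication_slots r
    WHERE r.slot_name = s.slot_name
  );
```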

Test coverage: 2 E2E tests (e2e_wal_cdc_tests.rs):

  • test_wal_fallback_on_missing_slot (pre-existing) — full lifecycle test: creates a WAL-mode stream table, drops the replication slot externally, waits for the automatic fallback to TRIGGER, and verifies data integrity after the fallback.
  • test_ec34_check_cdc_health_detects_missing_slot — verifies that check_cdc_health() surfaces a replication_slot_missing alert immediately after the slot drop, then confirms the fallback to TRIGGER and post-fallback data integrity.


EC-35 — Managed cloud PostgreSQL restrictions

Field Value
Area Infrastructure — cloud compatibility
Tier P3
Impact Extension cannot be installed
Documented in FAQ § “Deployment”

Proposed fix: pg_trickle cannot control cloud provider policies. Keep the FAQ entry updated with a compatibility matrix (RDS, Cloud SQL, AlloyDB, Azure Flexible, Supabase, Neon). Note that trigger-based CDC avoids the most common restriction (wal_level = logical).


EC-36 — Managed cloud: shared_preload_libraries restriction

Field Value
Area Infrastructure — cloud compatibility
Tier P3
Impact Background worker + shared memory unavailable
Documented in FAQ § “Deployment”

Proposed fix: Investigate a “no-bgworker” mode where refresh_stream_table() is called by an external cron (e.g. pg_cron or an application timer). The scheduler would be optional when shared_preload_libraries is unavailable. See plans/infra/REPORT_EXTERNAL_PROCESS.md for the sidecar feasibility study. Long term (v2.0+).
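As an interim sketch of the cron-driven mode, assuming pg_cron is available and that refresh_stream_table() takes the stream table name (the actual signature may differ):

```sql
-- Refresh one stream table every minute without the background worker.
SELECT cron.schedule(
  'refresh_orders_summary',      -- job name (hypothetical)
  '* * * * *',                   -- standard cron schedule: every minute
  $$SELECT pgtrickle.refresh_stream_table('orders_summary')$$
);
```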


Prioritised Recommendations

P0 — Must fix (data correctness)

# Edge Case Action Estimated Effort
1 ✅ EC-01: JOIN key change + right-side DELETE R₀ via EXCEPT ALL — implemented in diff_inner/left/full_join Done
2 ✅ EC-06: Keyless table duplicate rows WARNING at creation + has_keyless_source flag Done
3 EC-19: WAL + keyless without REPLICA IDENTITY FULL Reject at creation time with clear error 0.5 day

P1 — Should fix (operational surprise)

# Edge Case Action Estimated Effort
4 ✅ EC-11: Schedule < refresh duration scheduler_falling_behind NOTIFY alert at 80% threshold Done
5 ✅ EC-13: Diamond split-version reads Default 'atomic' + pg_sys::BeginInternalSubTransaction in scheduler Done
6 ✅ EC-15: SELECT * + DDL breakage WARNING at creation for SELECT * queries Done
7 EC-16: ALTER FUNCTION undetected pg_proc hash polling on refresh cycle 2 days
8 EC-18: auto CDC stuck in TRIGGER Rate-limited LOG explaining why; health_check finding 1 day
9 ✅ EC-25: TRUNCATE on stream table Event trigger blocks TRUNCATE on pgtrickle schema Done
10 ✅ EC-26: Direct DML on stream table Guard trigger detecting non-engine writes Done
11 EC-34: WAL slots lost after restore Auto-detect missing slot; fall back to TRIGGER + WARNING 1 day

P2 — Nice to have (usability gaps)

# Edge Case Action Estimated Effort
12 EC-03: Window functions in expressions CTE extraction in parser 3–5 days
13 ✅ EC-05: Foreign tables Error message + polling-based CDC Done
14 ✅ EC-09: IMMEDIATE unsupported constructs Done — recursive CTEs + TopK now supported Done
15 ✅ EC-17: DDL during active refresh FAQ documentation added Done
16 ✅ EC-20: CDC TRANSITIONING complexity check_cdc_transition_health() in scheduler.rs Done
17 ✅ EC-28: PgBouncer Documentation already exists Done
18 ✅ EC-32: ALL (subquery) NULL-safe anti-join condition in parse_all_sublink() Done

P3 — Accepted trade-offs (document, no code change)

✅ EC-02 (GUC implemented), EC-04, EC-07, EC-08, EC-10, EC-12, EC-14, ✅ EC-21/22/23 (docs done), EC-24, EC-27, EC-29, EC-30, EC-31, EC-35, EC-36.

These are either fundamental design trade-offs or have adequate existing mitigations. Keep documentation current and revisit if user demand surfaces.

P3 — Committed long-term implementation

# Edge Case Action Estimated Effort
19 EC-33: CORR, COVAR*, REGR* Group-rescan re-aggregation (Welford accumulator columns deferred to v2.0) 5–8 days

Implementation Order

Recommended sprint sequence (assuming 2-week sprints):

Sprint 1 — P0 corrections: ✅ DONE

  • ✅ EC-19: WAL + keyless rejection (done: has_keyless_source flag + WARNING)
  • ✅ EC-06: Keyless duplicate warning + has_keyless_source catalog column
  • ✅ EC-01: R₀ via EXCEPT ALL (implemented in diff_inner/left/full_join)

Sprint 2 — P1 operational safety: ✅ DONE

  • ✅ EC-25: Block TRUNCATE on stream tables (event trigger)
  • ✅ EC-26: Guard trigger for direct DML
  • ✅ EC-15: Warn on SELECT * (WARNING at creation)
  • ✅ EC-11: scheduler_falling_behind alert (NOTIFY at 80% threshold)
  • ✅ EC-13: Default diamond_consistency = 'atomic' + scheduler SAVEPOINT → pg_sys::BeginInternalSubTransaction fix

Sprint 3 — Remaining P1 + P2 starts: ✅ DONE (v0.6.0)

  • ✅ EC-18: Rate-limited LOG for stuck auto mode (1 day)
  • ✅ EC-34: Auto-detect missing WAL slot (1 day)
  • ✅ EC-16: pg_proc hash polling for function changes (2 days)
  • ✅ EC-03: Window function CTE extraction (3–5 days)

Sprint 4+ — P2 + P3 long-term: ✅ DONE (v0.6.0)

  • ✅ EC-05, EC-32: Foreign table change detection; ALL (subquery) → NULL-safe anti-join rewrite
  • EC-33: Statistical aggregates via group-rescan re-aggregation
  • Documentation sweep for all remaining P3 items


Open Questions

The following questions are unresolved design or validation unknowns that must be answered before the associated fixes can be fully specified or committed to.


OQ-01 — EC-01: What is the correct new default for adaptive_full_threshold? ✅ RESOLVED

Resolution (2026-05-30): The short-term threshold adjustment is dropped as not viable. It is a heuristic that can still silently miss deletes when co-occurring changes fall below the threshold, providing false safety. The primary fix is the R₀ via EXCEPT ALL approach (see EC-01). The adaptive_full_threshold GUC remains as-is for users who want it as an escape valve, but it is no longer positioned as an EC-01 mitigation path.


OQ-02 — EC-01: Is pre-image capture viable without a breaking change buffer schema change? ✅ RESOLVED

Resolution (2026-05-30): Schema changes are NOT needed. The CDC trigger (src/cdc.rs) already stores OLD."col" values in typed "old_col" columns in the change buffer for every UPDATE and DELETE event. The old join key values are therefore already available. The question is now when to use them.

The R₀ via EXCEPT ALL approach (chosen for Sprint 1) does not need old_col column access from the delta CTE at all — it reconstructs the pre-change right state from the live table + ΔR change set. The pre-image (old_col) approach is still viable as a future optimisation (avoids the live-table EXCEPT ALL scan) but is deferred to v1.x because it requires structural changes to DiffContext/DeltaSource to surface old_col values in the delta CTE output. Effort: 7–10 days when the time comes.


OQ-03 — EC-01: What is the overhead of the R₀ via EXCEPT ALL approach?

Updated analysis (2026-05-30):

Code archaeology shows the overhead is similar to what Part 2 already pays for L₀:

  • Part 1 already does a semi-join-filtered scan of R₁ (full right table, filtered by delta-left join keys). Adding R₀ = R₁ EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes adds one extra pass over the ΔR CTE (which is small — only changed rows). The live table is not scanned an extra time because R₁ is already in the query plan; the EXCEPT ALL just subtracts a small set from it.
  • The semi-join filter that limits R₁ also applies to R₀, so the set difference operates on already-filtered rows.
  • The correction term (Part 3) is unaffected by this change.

Expected overhead: < 10% additional latency on the delta query for typical workloads where ΔR is small relative to R. The concern about “one extra temp-table scan per join source” in the original write-up was an overestimate — the EXCEPT ALL operates on the already-materialized ΔR CTE, not the full right table.

Recommendation: Confirm with a micro-benchmark (1M right table, 0.1% change rate) before the Sprint 1 PR merges. Accept if overhead is under 30% of current DIFFERENTIAL latency; otherwise fall back to applying R₀ only when ΔR is non-empty (easy conditional).
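For reference, the R₀ reconstruction described above might be expressed as follows; right_now (the live right table R₁), delta_right (ΔR with an action column), and the column names are all assumed placeholders for the generated CTEs:

```sql
WITH right_before AS (
  -- R₀ = R₁ EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes:
  -- subtract rows inserted since the snapshot, replay rows deleted.
  (SELECT r.col1, r.col2 FROM right_now r
   EXCEPT ALL
   SELECT d.col1, d.col2 FROM delta_right d WHERE d.action = 'I')
  UNION ALL
  SELECT d.col1, d.col2 FROM delta_right d WHERE d.action = 'D'
)
SELECT * FROM right_before;
```

Because EXCEPT ALL and UNION ALL preserve multiset semantics, duplicate rows in R₁ are handled correctly, and the only extra work beyond the existing R₁ scan is one pass over the small ΔR CTE.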


OQ-04 — EC-03: Where exactly do window-function-in-expression rewrites fail?

The CTE extraction rewrite for EC-03 is described as covering “~80% of user demand”, with complex cases (nested window calls inside arithmetic) remaining unsupported.

Open questions:

  • Is SUM(ROW_NUMBER() OVER (…)) rejectable at parse time, or does it require full semantic analysis?
  • Can NTILE(4) OVER (…) * 100 be rewritten to a CTE without changing query semantics?
  • What are the most common real-world patterns users hit? (Survey the GitHub issues / Discord.)

Recommendation: Before implementing the rewrite, define a formal boundary: “any expression where a window function appears as a direct child of the SELECT list is rewriteable; anything deeper is rejected with a clear error.” Enumerate the cases explicitly in a test file (tests/e2e_window_expression_tests.rs) before writing the rewrite code. This keeps the scope bounded.
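A sketch of the intended rewrite for the simple case, using hypothetical table and column names: the window call is hoisted into a CTE so only plain arithmetic remains in the outer SELECT.

```sql
-- Original: window call nested in arithmetic (unsupported directly):
--   SELECT id, NTILE(4) OVER (ORDER BY score) * 100 AS pct_bucket FROM t;

-- Rewritten with CTE extraction:
WITH w AS (
  SELECT id, NTILE(4) OVER (ORDER BY score) AS nt
  FROM t
)
SELECT id, nt * 100 AS pct_bucket
FROM w;
```

The rewrite is semantics-preserving here because the window frame is evaluated over the same row set either way; the proposed boundary rejects anything where the window call is not a direct child of a SELECT-list expression.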


OQ-05 — EC-06: Is ctid-based row identity viable for trigger-mode CDC?

The long-term fix for EC-06 proposes using ctid as a stable row identifier for keyless tables in trigger mode. ctid is the physical heap location and changes after VACUUM FULL / CLUSTER.

Open questions:

  • Does pgrx expose the ctid of OLD in a row-level trigger? (Need to verify via SPI_gettypeid or HeapTupleHeader.)
  • What happens if VACUUM FULL runs between the CDC trigger firing and the next refresh cycle? Would the stale ctid silently match the wrong row?
  • Is ctid reset to the same value when the same physical slot is reused? (Yes in PostgreSQL — ctids are reused after VACUUM, not unique across time.)

Recommendation: Do not pursue ctid-based identity. The reuse semantics make it unsafe as a durable row identifier. Instead, focus Sprint 1 effort on the count-based hash approach (OQ-05 is answered: ctid is off the table). If a unique row identity is required, recommend users add a DEFAULT gen_random_uuid() column.
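The recommended user-side workaround is a one-line DDL change; the table name here is hypothetical (gen_random_uuid() is built in since PostgreSQL 13, pgcrypto before that):

```sql
-- Give a keyless table a durable, VACUUM-safe row identity.
ALTER TABLE keyless_events
  ADD COLUMN row_id uuid NOT NULL DEFAULT gen_random_uuid();
```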


OQ-06 — EC-08: Do statement-level transition tables actually batch multiple row changes?

The plan for EC-08 states that statement-level triggers with transition tables “provide batching naturally”. This is true for a single UPDATE that affects N rows — the transition table contains all N old/new rows and the IVM trigger fires once. But it is not true for N separate single-row UPDATE statements in a loop — each fires the trigger once.
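The batching behaviour can be seen with a plain statement-level transition-table trigger; the function and table names below are illustrative, not pg_trickle's actual IVM trigger:

```sql
CREATE OR REPLACE FUNCTION ivm_on_update() RETURNS trigger AS $$
BEGIN
  -- old_rows / new_rows hold every row touched by the one statement,
  -- so a single UPDATE of N rows reaches this function exactly once.
  RAISE NOTICE 'batched % row(s) in one trigger call',
    (SELECT count(*) FROM new_rows);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ivm_batch_update
AFTER UPDATE ON source_table
REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION ivm_on_update();
```

Running UPDATE source_table SET v = v + 1 over 1,000 rows fires the trigger once; a loop of 1,000 single-row UPDATEs fires it 1,000 times, which is exactly the overhead the proposed benchmark should quantify.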

Open questions:

  • What is the actual p50/p99 latency overhead per DML statement in IMMEDIATE mode on a stream table with a 3-way join? (No benchmark exists.)
  • Does pg_advisory_xact_lock inside the trigger add measurable contention under 100+ concurrent single-row updates?

Recommendation: Add benches/ivm_write_overhead.rs as a Criterion benchmark that measures per-statement latency for IMMEDIATE vs DIFFERENTIAL mode at 1, 10, 100 concurrent writers. Run it as part of the weekly benchmark schedule. Make the results visible in docs/BENCHMARK.md so users can make informed mode choices. This answers the question empirically rather than by assumption.


OQ-07 — EC-36: How much of the scheduler can be decoupled for cloud deployments?

The long-term fix for EC-36 proposes a “no-bgworker” mode where refresh_stream_table() is driven by pg_cron or an application timer. plans/infra/REPORT_EXTERNAL_PROCESS.md covers the sidecar feasibility research but reaches no conclusion.

Open questions:

  • The scheduler currently holds a pg_advisory_lock during refresh to prevent concurrent refreshes. Would a cron-driven caller need to acquire this lock explicitly? (Yes — this is straightforward.)
  • The scheduler tracks refresh history in pgt_refresh_history. Would a cron-driven caller need to insert rows there manually, or can this be encapsulated in refresh_stream_table()? (Likely the latter — needs a small refresh.rs refactor.)
  • What happens to the background worker registration if pg_trickle is in shared_preload_libraries but pg_trickle.enabled = off? The worker starts but immediately sleeps — is this acceptable for cloud providers that bill per background-worker slot?

Recommendation: The prerequisite for EC-36 is verifying that refresh_stream_table() already records to pgt_refresh_history (check src/refresh.rs). If it does, the cron-mode is already 80% implemented — users just need to disable the bgworker and call refresh_stream_table() on a schedule. Document this pattern in docs/FAQ.md under “Deployment on managed PostgreSQL” as an interim workaround before a formal no-bgworker mode is built.