PLAN: Test Pyramid Rebalance

PLAN: Test Pyramid Rebalance

Status: Complete
Date: 2026-03-05
Branch: plan-test-pyramid-rebalance
Scope: Shift test coverage down the pyramid — extract pure-logic unit tests from SPI-heavy modules, introduce a light-E2E tier to eliminate the 20-minute Docker build for most tests, and promote validation/error-path tests to unit level.

Progress Summary

Phase	Status	New Unit Tests	Notes
P1 — Test already-pure functions	Done	27	ivm, cdc
P2-A — IVM SQL builders	Done	10	Extracted `build_ivm_delete_sql`, `build_ivm_insert_sql`, `build_column_lists`
P2-B — Monitor tree renderer	Done	8	Extracted `render_dependency_tree` + `dfs`
P2-C — DDL event classification	Done	12	Extracted `classify_ddl_event` enum + `compare_snapshot_with_current`
P2-D — Scheduler decisions	Done	11	Extracted `is_group_due_pure` + `is_falling_behind`
P2-E — Alert payloads	Done	4	Extracted `build_alert_payload`
P2-F — CDC column lists	Done	4	Extracted `build_typed_col_defs`
P2-A4 — IVM trigger names	Done	9	Extracted `IvmTriggerNames` struct
P3 — Light-E2E tier	Done	—	Harness + CI + justfile targets
P4 — Validation tests to unit	Done	55	Revised scope (see below)

Total new unit tests: 133 (1,040 → 1,173)

P3 Implementation

The light-E2E tier uses a bind-mount + exec approach:

cargo pgrx package produces compiled extension artifacts in target/release/pg_trickle-pg18/.
A stock postgres:18.1 container starts with the artifacts bind-mounted to /tmp/pg_ext.
An exec copies the files to the PostgreSQL extension directories.
CREATE EXTENSION pg_trickle loads the extension on-demand.

No custom Docker image, no shared_preload_libraries, no background worker.

Files delivered: - tests/e2e/light.rs — LightE2eDb harness (exported as E2eDb via feature gate) - tests/e2e/mod.rs — Conditional compilation: #[cfg(feature = "light-e2e")] - Cargo.toml — light-e2e = [] feature - justfile — package-extension, test-light-e2e, test-light-e2e-fast - .github/workflows/ci.yml — light-e2e-tests job (runs on every PR)

42 test files (~570 tests) are light-eligible. 10 files (~90 tests) require full E2E (bgworker, scheduler, bench tuning, upgrade, GUC variation).

What Remains

All phases are complete. No remaining work items.

P4 Scope Revision

The original P4 plan assumed LIMIT/OFFSET, FOR UPDATE, TABLESAMPLE, and self-reference validation could be factored into pure validate_*() functions. Investigation revealed these checks all call raw_parser() (PostgreSQL’s C parser via FFI) and cannot be unit-tested without a DB backend. They belong in the Light-E2E tier (P3), not P4.

Instead, P4 was redirected to test existing pure functions that lacked coverage — parser helpers, OpTree methods, and expression utilities:

Sub-phase	Function	Tests
P4-A	`strip_order_by_and_limit()`	7
P4-B	`Expr::output_name()`	5
P4-C	`unwrap_transparent()`	5
P4-D	`OpTree::output_columns()` (8 variants)	8
P4-E	`OpTree::source_oids()` (6 variants)	6
P4-F	`split_and_predicates()` / `join_and_predicates()`	7
P4-G	`AggFunc::is_group_rescan()`	10
P4-H	`collect_volatilities()` expanded (COALESCE et al.)	7

Motivation
Current State
Phase 1 — Unit-Test Already-Pure Functions
Phase 2 — Extract & Unit-Test Embedded Logic
Phase 3 — Light-E2E Tier
Phase 4 — Promote Validation Tests to Unit Level
Implementation Order
Verification Criteria

Motivation

The test pyramid is bottom-heavy by raw count but structurally top-heavy in two ways:

660 E2E tests require a custom Docker image that takes ~20 minutes to build, so they’re skipped on every PR. Only 51 of those tests actually need shared_preload_libraries. The other 580+ tests run against the extension API but don’t need the background worker — they could use a stock PG image with the compiled .so mounted in.
Several large modules with complex logic have minimal unit tests. Their correctness is validated exclusively by E2E tests, meaning feedback is slow and failures are hard to localize.

Module	Lines	Unit Tests	Lines/Test
`src/ivm.rs`	922	0	∞
`src/scheduler.rs`	1,582	4	395
`src/monitor.rs`	1,836	5	367
`src/cdc.rs`	1,594	8	199
`src/hooks.rs`	1,583	8	198

By contrast the DVM engine is well-covered: parser.rs (279 tests), aggregate.rs (110), api.rs (85), dag.rs (48).

Current State

Tier	Tests	Share	Build Overhead	Runs on PR?
Unit	1,040	57%	None	Yes
Property	27	2%	None	Yes
Integration (testcontainers, stock PG)	81	4%	~seconds	Yes
E2E (custom Docker image)	660	37%	~20 min	No

Phase 1 — Unit-Test Already-Pure Functions

These functions are already extracted and side-effect-free. They just need test coverage. Zero refactoring required.

ID	Module	Function	Lines	Est. Tests
P1-1	`src/ivm.rs`	`IvmLockMode::for_query()`	L71-82	4–6
P1-2	`src/ivm.rs`	`IvmLockMode::is_simple_scan_chain()`	L86-95	4–6
P1-3	`src/ivm.rs`	`hash_str()`	L130-134	2
P1-4	`src/cdc.rs`	`build_changed_cols_bitmask_expr()`	L311-339	5–8
P1-5	`src/cdc.rs`	`parse_partition_upper_bound()`	L605-613	4–6

Effort: ~30 min
Yield: ~20 new unit tests

P1-1 / P1-2 — IvmLockMode tests

IvmLockMode::for_query() delegates to dvm::parse_defining_query() (pure parser) and is_simple_scan_chain() (pattern match on OpTree). Both are fully testable without SPI.

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_simple_select_gets_row_exclusive() {
        assert_eq!(
            IvmLockMode::for_query("SELECT id, val FROM t"),
            IvmLockMode::RowExclusive,
        );
    }

    #[test]
    fn test_aggregate_gets_exclusive() {
        assert_eq!(
            IvmLockMode::for_query("SELECT dept, COUNT(*) FROM t GROUP BY dept"),
            IvmLockMode::Exclusive,
        );
    }

    #[test]
    fn test_join_gets_exclusive() {
        assert_eq!(
            IvmLockMode::for_query("SELECT a.id FROM a JOIN b ON a.id = b.id"),
            IvmLockMode::Exclusive,
        );
    }

    #[test]
    fn test_unparseable_query_defaults_to_exclusive() {
        assert_eq!(
            IvmLockMode::for_query("NOT VALID SQL {{{{"),
            IvmLockMode::Exclusive,
        );
    }
}

P1-4 — build_changed_cols_bitmask_expr

Already a pure fn(&[String], &[(String, String)]) -> Option<String>. Test the bitmask algebra:

#[test]
fn test_bitmask_single_non_pk_col() {
    let pk = vec!["id".to_string()];
    let cols = vec![
        ("id".to_string(), "integer".to_string()),
        ("val".to_string(), "text".to_string()),
    ];
    let expr = build_changed_cols_bitmask_expr(&pk, &cols);
    assert!(expr.is_some());
    // bit 1 set when "val" differs between NEW and OLD
}

#[test]
fn test_bitmask_none_when_over_63_cols() {
    let pk = vec!["id".to_string()];
    let cols: Vec<_> = (0..64)
        .map(|i| (format!("c{i}"), "int".to_string()))
        .collect();
    assert!(build_changed_cols_bitmask_expr(&pk, &cols).is_none());
}

#[test]
fn test_bitmask_none_when_no_pk() {
    let cols = vec![("a".to_string(), "int".to_string())];
    assert!(build_changed_cols_bitmask_expr(&[], &cols).is_none());
}

P1-5 — parse_partition_upper_bound

#[test]
fn test_parse_valid_range() {
    assert_eq!(
        parse_partition_upper_bound("FOR VALUES FROM ('0/0') TO ('1/A3F')"),
        Some("1/A3F".to_string()),
    );
}

#[test]
fn test_parse_no_match() {
    assert_eq!(parse_partition_upper_bound("LIST (1, 2, 3)"), None);
}

Phase 2 — Extract & Unit-Test Embedded Logic

These are blocks of pure logic currently inlined inside SPI-calling functions. Each requires a small refactor: move the logic into a standalone fn, call it from the original function, and add unit tests.

P2-A — IVM SQL Builders (`src/ivm.rs`)

ID	Function to Extract	From	Input → Output
P2-A1	`build_ivm_delete_sql(st, delta, keyless) → String`	`pgt_ivm_apply_delta` L555-600	2 table names + bool → SQL
P2-A2	`build_ivm_insert_sql(st, delta, cols, keyless) → String`	`pgt_ivm_apply_delta` L604-643	2 table names + col list + bool → SQL
P2-A3	`build_column_lists(cols) → (col_list, d_col_list, update_set)`	`pgt_ivm_apply_delta` + `apply_topk_micro_refresh`	`&[String]` → 3 SQL fragments
P2-A4	`ivm_trigger_names(pgt_id, oid) → IvmTriggerNames`	`setup_ivm_triggers` (repeated 8×)	2 ints → struct of 8 name strings

Effort: ~60 min
Yield: ~15-20 new unit tests

Example for P2-A1:

/// Build the DELETE SQL for IVM delta application.
///
/// Keyless sources use counted-DELETE (ROW_NUMBER matching) to avoid
/// deleting ALL duplicates when only a subset should be removed.
fn build_ivm_delete_sql(
    st_qualified: &str,
    delta_table: &str,
    has_keyless_source: bool,
) -> String { /* extracted from pgt_ivm_apply_delta */ }

#[cfg(test)]
mod tests {
    #[test]
    fn test_keyed_delete_uses_simple_join() {
        let sql = build_ivm_delete_sql(r#""public"."my_st""#, "__delta_1", false);
        assert!(sql.contains("USING"));
        assert!(sql.contains("__pgt_action = 'D'"));
        assert!(!sql.contains("ROW_NUMBER"));
    }

    #[test]
    fn test_keyless_delete_uses_row_number() {
        let sql = build_ivm_delete_sql(r#""public"."my_st""#, "__delta_1", true);
        assert!(sql.contains("ROW_NUMBER"));
        assert!(sql.contains("st_rn <= dc.del_count"));
    }
}

P2-B — Monitor Tree Renderer (`src/monitor.rs`)

The dependency_tree() SQL-returning function contains a ~120-line inner dfs() function that takes HashMap<String, Vec<String>> data structures and produces ASCII tree rows using box-drawing characters. It has zero SPI dependency.

ID	Function to Extract	Lines	Input → Output
P2-B1	`render_dependency_tree(st_info, children, sources) → Vec<TreeRow>`	L1260-1380	3 HashMaps → vec of formatted rows

Effort: ~45 min
Yield: ~8-10 tests (leaf, single-chain, diamond, forest topologies)

P2-C — DDL Event Classification (`src/hooks.rs`)

ID	Function to Extract	Lines	Input → Output
P2-C1	`classify_ddl_event(object_type, command_tag) → DdlEventKind`	L131-178	2 strings → enum
P2-C2	`detect_schema_change_pure(stored_entries, current_cols) → SchemaChangeKind`	L1430-1527	JSON + HashMap → enum

Effort: ~45 min
Yield: ~10-12 tests

P2-D — Scheduler Decision Logic (`src/scheduler.rs`)

ID	Function to Extract	Lines	Input → Output
P2-D1	`should_retry_db(had_scheduler, elapsed, skip_ttl, retry_ttl) → bool`	L191-201	bools + durations → bool
P2-D2	`is_group_due_pure(member_due, policy) → bool`	L812-833	`&[bool]` + enum → bool
P2-D3	`is_falling_behind(elapsed_ms, schedule_ms) → Option<f64>`	L1428-1448	2 ints → optional ratio

Effort: ~30 min
Yield: ~8-10 tests

P2-E — Alert Payload Construction (`src/monitor.rs`)

ID	Function to Extract	Lines	Input → Output
P2-E1	`build_alert_payload(event, schema, name, extra) → String`	L71-93	4 strings → JSON
P2-E2	`split_qualified_name(name) → (&str, &str)`	~5 call sites	string → 2 strings

Effort: ~20 min
Yield: ~6-8 tests (escaping, truncation at 7900 chars, default schema)

P2-F — CDC Column-List Generation (`src/cdc.rs`)

ID	Function to Extract	Lines	Input → Output
P2-F1	`build_trigger_column_lists(columns) → TriggerColumnLists`	L100-117	column defs → 4 SQL fragments
P2-F2	`build_typed_col_defs(columns) → String`	L357-367	column defs → DDL fragment

Effort: ~20 min
Yield: ~6 tests

Phase 3 — Light-E2E Tier

Problem

88% of E2E tests (580 of 660) don’t need shared_preload_libraries. They call the extension API but never exercise the background worker/scheduler. Yet they all require a custom Docker image that takes ~20 min to build, meaning they are skipped on every PR.

Solution — Bind-Mount + Exec (Implemented)

Instead of building a custom Docker image, the light-E2E harness:

Runs cargo pgrx package to produce compiled extension artifacts.
Starts a stock postgres:18.1 container with the artifacts bind-mounted to /tmp/pg_ext.
Uses container.exec() to copy files to the PostgreSQL extension dirs.
Runs CREATE EXTENSION pg_trickle which loads the .so on-demand.

┌──────────────────────────────────────────────────────┐
│                   Before                             │
│                                                      │
│   Unit (1,164)  ──────► runs on PR ✓                 │
│   Integration (81)  ──► runs on PR ✓                 │
│   E2E (660)  ────────► skipped on PR ✗               │
│                        (20 min Docker build)         │
└──────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────┐
│                   After                              │
│                                                      │
│   Unit (1,164)  ──────► runs on PR ✓                 │
│   Integration (81)  ──► runs on PR ✓                 │
│   Light E2E (~570) ──► runs on PR ✓  ← NEW          │
│                        (stock PG + bind-mount)       │
│   Full E2E (~90)  ───► push-to-main + daily only     │
│                        (bgworker, bench, upgrade)    │
└──────────────────────────────────────────────────────┘

Architecture

// tests/e2e/light.rs (exported as E2eDb via #[cfg(feature = "light-e2e")])

pub struct E2eDb {
    pub pool: PgPool,
    _container: ContainerAsync<GenericImage>,
}

impl E2eDb {
    pub async fn new() -> Self {
        let ext_dir = find_extension_dir(); // PGT_EXTENSION_DIR or default

        let container = GenericImage::new("postgres", "18.1")
            .with_mount(Mount::bind_mount(ext_dir, "/tmp/pg_ext"))
            .start().await;

        // Copy extension files from staging to system dirs
        container.exec(ExecCommand::new(vec!["sh", "-c",
            "cp /tmp/pg_ext/usr/share/postgresql/18/extension/pg_trickle* \
                /usr/share/postgresql/18/extension/ && \
             cp /tmp/pg_ext/usr/lib/postgresql/18/lib/pg_trickle* \
                /usr/lib/postgresql/18/lib/"
        ])).await;

        // Connect + CREATE EXTENSION
        // ...
    }
}

Files That Must Stay in Full E2E

These 10 files (~90 tests) require shared_preload_libraries, the background worker, SET pg_trickle.* GUCs, or custom image layering:

File	Tests	Reason
`e2e_bench_tests.rs`	18	Uses `new_bench()` with shmem tuning
`e2e_bgworker_tests.rs`	9	Tests scheduler/bgworker lifecycle
`e2e_cascade_regression_tests.rs`	8	Uses `SET pg_trickle.*` GUCs
`e2e_dag_autorefresh_tests.rs`	5	`wait_for_auto_refresh` requires scheduler
`e2e_ddl_event_tests.rs`	14	Uses `SET pg_trickle.*` GUCs
`e2e_guc_variation_tests.rs`	7	Entirely about GUC variations
`e2e_multi_cycle_tests.rs`	5	Uses `wait_for_auto_refresh`
`e2e_tpch_tests.rs`	6	Heavy benchmarks, custom setup
`e2e_upgrade_tests.rs`	13	Tests version upgrade path, image layering
`e2e_user_trigger_tests.rs`	10	Uses `SET pg_trickle.*` for trigger control

Limitations

No background worker / scheduler — shared_preload_libraries is not set.
No auto-refresh — wait_for_auto_refresh() always returns false.
GUCs may be unavailable — SET pg_trickle.* only works after the .so is loaded, and only in the same session.
macOS only works in CI — cargo pgrx package on macOS produces .dylib files that can’t run inside a Linux container. Local macOS development uses just test-e2e-fast instead.

CI Impact

Job	Before	After
PR	Unit + Integration (1,164 + 81 tests)	Unit + Integration + Light E2E (~1,815 tests)
Push to main	Unit + Integration + E2E	Unit + Integration + Light E2E + Full E2E
Daily	All	All (unchanged)

Phase 4 — Promote Validation Tests to Unit Level

~70 E2E tests in e2e_error_tests.rs and e2e_coverage_error_tests.rs test input validation: rejecting LIMIT, FOR UPDATE, self-references, TABLESAMPLE, volatile functions, etc. The validation logic lives in the Rust parser/API layer.

Approach

For each validation check currently tested only by E2E:

Factor the check into a pure fn validate_*(input) -> Result<(), PgTrickleError>.
Add unit tests directly in the source module.
Keep the E2E test as a thin integration smoke test (or remove if redundant).

Candidates

Validation	Current Location	E2E Tests	Unit-Testable?
LIMIT/OFFSET rejection	`api.rs` → `validate_defining_query`	3	Yes — string inspection
FOR UPDATE rejection	`api.rs`	1	Yes
Self-reference detection	`api.rs`	2	Yes — name matching
TABLESAMPLE rejection	`dvm/parser.rs`	1	Yes — OpTree check
Volatile function detection	`dvm/parser.rs`	2	Yes — already parsed
Recursive CTE depth guard	`dvm/operators/recursive_cte.rs`	2	Yes — tree depth
Cycle detection in DAG	`dag.rs`	3	Already unit-tested

Effort: ~2-3 hours (refactoring + tests)
Yield: ~25-30 new unit tests; the E2E tests become thin smoke checks

Implementation Order

Priority	Phase	Effort	New Unit Tests	Impact
1	P1 — Test already-pure functions	30 min	~20	Quick wins, zero refactoring
2	P2-A — IVM SQL builders	60 min	~18	Highest-risk untested module
3	P2-B — Monitor tree renderer	45 min	~10	120 lines of pure logic, 0 tests
4	P2-C — DDL event classification	45 min	~12	Correctness-critical for auto-reinit
5	P2-D — Scheduler decisions	30 min	~10	Core scheduling correctness
6	P2-E — Alert payloads	20 min	~8	Subtle escaping/truncation bugs
7	P2-F — CDC column lists	20 min	~6	DRY + coverage
8	P3 — Light-E2E harness	6-8 hrs	—	580 tests gain PR feedback
9	P4 — Promote validation tests	2-3 hrs	~28	Move error-path tests to unit tier

Projected State After All Phases

Tier	Tests	Share	Runs on PR?
Unit	~1,164	63%	Yes
Property	27	1%	Yes
Integration	81	4%	Yes
Light E2E	~580	31%	Yes
Full E2E	~51	3%	No (main + daily)

All tests except 51 (3%) will run on every PR.

Verification Criteria

[x] just test-unit passes with the new unit tests (Phases 1-2, 4)
[x] just lint passes with zero warnings after all extractions
[ ] No E2E test is deleted — only duplicated to unit level or migrated to light-E2E tier
[ ] Light-E2E harness works with cargo pgrx package artifacts
[ ] CI job for light-E2E completes in < 10 min on PR
[ ] Full E2E continues to pass on push-to-main
[ ] Lines/test ratio for ivm.rs, scheduler.rs, monitor.rs, cdc.rs, hooks.rs drops below 100

PGXN

PostgreSQL Extension Network

Contents

PLAN: Test Pyramid Rebalance

Progress Summary

P3 Implementation

What Remains

P4 Scope Revision

Table of Contents

Motivation

Current State

Phase 1 — Unit-Test Already-Pure Functions

P1-1 / P1-2 — IvmLockMode tests

P1-4 — build_changed_cols_bitmask_expr

P1-5 — parse_partition_upper_bound

Phase 2 — Extract & Unit-Test Embedded Logic

P2-A — IVM SQL Builders (`src/ivm.rs`)

P2-B — Monitor Tree Renderer (`src/monitor.rs`)

P2-C — DDL Event Classification (`src/hooks.rs`)

P2-D — Scheduler Decision Logic (`src/scheduler.rs`)

P2-E — Alert Payload Construction (`src/monitor.rs`)

P2-F — CDC Column-List Generation (`src/cdc.rs`)

Phase 3 — Light-E2E Tier

Problem

Solution — Bind-Mount + Exec (Implemented)

Architecture

Files That Must Stay in Full E2E

Limitations

CI Impact

Phase 4 — Promote Validation Tests to Unit Level

Approach

Candidates

Implementation Order

Projected State After All Phases

Verification Criteria

PGXN

PostgreSQL Extension Network

Contents

PLAN: Test Pyramid Rebalance

Progress Summary

P3 Implementation

What Remains

P4 Scope Revision

Table of Contents

Motivation

Current State

Phase 1 — Unit-Test Already-Pure Functions

P1-1 / P1-2 — IvmLockMode tests

P1-4 — build_changed_cols_bitmask_expr

P1-5 — parse_partition_upper_bound

Phase 2 — Extract & Unit-Test Embedded Logic

P2-A — IVM SQL Builders (src/ivm.rs)

P2-B — Monitor Tree Renderer (src/monitor.rs)

P2-C — DDL Event Classification (src/hooks.rs)

P2-D — Scheduler Decision Logic (src/scheduler.rs)

P2-E — Alert Payload Construction (src/monitor.rs)

P2-F — CDC Column-List Generation (src/cdc.rs)

Phase 3 — Light-E2E Tier

Problem

Solution — Bind-Mount + Exec (Implemented)

Architecture

Files That Must Stay in Full E2E

Limitations

CI Impact

Phase 4 — Promote Validation Tests to Unit Level

Approach

Candidates

Implementation Order

Projected State After All Phases

Verification Criteria

P2-A — IVM SQL Builders (`src/ivm.rs`)

P2-B — Monitor Tree Renderer (`src/monitor.rs`)

P2-C — DDL Event Classification (`src/hooks.rs`)

P2-D — Scheduler Decision Logic (`src/scheduler.rs`)

P2-E — Alert Payload Construction (`src/monitor.rs`)

P2-F — CDC Column-List Generation (`src/cdc.rs`)