Contents
- Light E2E Test Suite — Deep Evaluation Report
- Table of Contents
- Executive Summary
- Harness Architecture & Limitations
- Cross-Cutting Findings
- Per-File Analysis
- e2e_smoke_tests.rs
- e2e_create_tests.rs
- e2e_drop_tests.rs
- e2e_alter_tests.rs
- e2e_create_or_replace_tests.rs
- e2e_error_tests.rs
- e2e_lifecycle_tests.rs
- e2e_refresh_tests.rs
- e2e_cdc_tests.rs
- e2e_stmt_cdc_tests.rs
- e2e_expression_tests.rs
- e2e_property_tests.rs
- e2e_cte_tests.rs
- e2e_ivm_tests.rs
- e2e_concurrent_tests.rs
- e2e_coverage_error_tests.rs
- e2e_coverage_parser_tests.rs
- e2e_diamond_tests.rs
- e2e_dag_operations_tests.rs
- e2e_dag_topology_tests.rs
- e2e_dag_error_tests.rs
- e2e_dag_concurrent_tests.rs
- e2e_dag_immediate_tests.rs
- e2e_keyless_duplicate_tests.rs
- e2e_lateral_tests.rs
- e2e_lateral_subquery_tests.rs
- e2e_full_join_tests.rs
- e2e_window_tests.rs
- e2e_multi_window_tests.rs
- e2e_set_operation_tests.rs
- e2e_topk_tests.rs
- e2e_scalar_subquery_tests.rs
- e2e_sublink_or_tests.rs
- e2e_all_subquery_tests.rs
- e2e_aggregate_coverage_tests.rs
- e2e_having_transition_tests.rs
- e2e_view_tests.rs
- e2e_monitoring_tests.rs
- e2e_getting_started_tests.rs
- e2e_guard_trigger_tests.rs
- e2e_phase4_ergonomics_tests.rs
- e2e_pipeline_dag_tests.rs
- e2e_mixed_mode_dag_tests.rs
- e2e_multi_cycle_dag_tests.rs
- e2e_rows_from_tests.rs
- e2e_snapshot_consistency_tests.rs
- e2e_watermark_gating_tests.rs
- Priority Mitigations
- Appendix: Summary Table
Light E2E Test Suite — Deep Evaluation Report
Date: 2025-03-16 Scope: All 47 files in
scripts/run_light_e2e_tests.shLIGHT_E2E_TESTS Goal: Assess coverage confidence and identify mitigations to harden the suite
Table of Contents
- Executive Summary
- Harness Architecture & Limitations
- Cross-Cutting Findings
- Per-File Analysis
- Priority Mitigations
- Appendix: Summary Table
Executive Summary
The Light E2E suite comprises 47 test files with approximately ~670 test
functions running against a stock postgres:18.3 container with bind-mounted
extension artifacts. No background worker, scheduler, or
shared_preload_libraries is available.
Confidence level: MODERATE-HIGH (≈75%).
The suite excels at differential correctness validation — the majority of
DVM-critical tests use assert_st_matches_query() to compare stream table
contents against re-executing the defining query. Multi-layer DAG pipelines,
aggregate coverage, HAVING transitions, and multi-cycle stress tests are
gold-standard.
However, significant gaps undermine confidence:
| Severity | Finding | Impact |
|---|---|---|
| CRITICAL | light.rs uses EXCEPT (set) not EXCEPT ALL (multiset) in assert_st_matches_query |
Duplicate-row bugs invisible in light E2E |
| CRITICAL | rows_from_tests verifies only row counts, never contents |
Rewrite bugs pass silently |
| HIGH | expression_tests (41 tests) uses row counts only, no assert_st_matches_query |
Expression evaluation errors invisible |
| HIGH | ivm_tests (26 tests) uses row counts only, no assert_st_matches_query |
Core IVM correctness undertested |
| HIGH | getting_started_tests — no assert_st_matches_query on full results |
Spurious/missing rows missed |
| MEDIUM | monitoring_tests — existence checks only, no value validation |
Staleness/history never verified |
| MEDIUM | create_or_replace_tests — zero error-path tests |
Failure modes untested |
| MEDIUM | concurrent_tests — smoke tests (“doesn’t crash”), no correctness |
Race conditions pass silently |
Harness Architecture & Limitations
Harness file: tests/e2e/light.rs
Feature gate: --features light-e2e (selected in tests/e2e/mod.rs)
Runner: scripts/run_light_e2e_tests.sh
What Light E2E provides
- Stock
postgres:18.3Docker container via testcontainers - Extension loaded via
CREATE EXTENSION IF NOT EXISTS pg_trickle CASCADE - Bind-mounted
.soand.controlfromcargo pgrx packageoutput - Full SQL API surface:
create_stream_table(),refresh_stream_table(),drop_stream_table(),alter_stream_table(), etc. - CDC triggers installed and functional
- Manual refresh works
What Light E2E cannot test
- Background worker / scheduler — no
shared_preload_libraries - Auto-refresh —
wait_for_auto_refresh()always returnsfalse - Scheduler gating —
wait_for_scheduler()always returnsfalse - GUC availability not guaranteed —
ALTER SYSTEM SET pg_trickle.*may fail - WAL-level features — WAL decoder not loaded
Critical harness defect: EXCEPT vs EXCEPT ALL
The assert_st_matches_query() implementation in tests/e2e/light.rs
(lines 563–577) uses plain EXCEPT for set comparison:
(SELECT cols FROM st_table EXCEPT (defining_query))
UNION ALL
((defining_query) EXCEPT SELECT cols FROM st_table)
The full E2E harness in tests/e2e/mod.rs was fixed in PR #208 to use
EXCEPT ALL for proper multiset (bag) equality. Light.rs was missed.
Impact: Any test where the stream table produces the right set of
distinct rows but incorrect multiplicities will pass in light E2E but
fail in full E2E. This directly affects:
- e2e_keyless_duplicate_tests — tests duplicate-row correctness
- e2e_set_operation_tests — tests INTERSECT ALL / EXCEPT ALL
- Any aggregate test where GROUP BY accidentally deduplicates
Mitigation: Apply the same EXCEPT ALL fix to tests/e2e/light.rs.
This is the single highest-priority fix.
Tests gated out of light E2E
Only e2e_watermark_gating_tests.rs has #[cfg(not(feature = "light-e2e"))]
annotations, gating 3 scheduler-dependent tests:
- test_scheduler_skips_misaligned_watermark
- test_scheduler_resumes_after_watermark_alignment
- test_scheduler_respects_tolerance
All other tests in the 47-file allowlist execute unconditionally.
Cross-Cutting Findings
1. assert_st_matches_query adoption is inconsistent
| Category | Files using it extensively | Files NOT using it |
|---|---|---|
| DVM core (aggregates, HAVING, set ops, subqueries) | ✅ 12 files | — |
| DAG pipelines (pipeline, multi-cycle, mixed-mode) | ✅ 4 files | — |
| Expression evaluation | — | ❌ expression_tests (41 tests) |
| Core IVM loop | — | ❌ ivm_tests (26 tests) |
| CRUD operations | — | ❌ create_tests, alter_tests |
| Tutorial walkthrough | — | ❌ getting_started_tests |
| TopK queries | — | ❌ topk_tests (60+ tests, uses counts/min/max) |
| Monitoring/meta | — | ❌ monitoring_tests, phase4_ergonomics_tests |
| Lateral subqueries | — | ❌ lateral_subquery_tests |
| ROWS FROM rewriting | — | ❌ rows_from_tests |
2. Error-path testing is sparse
Most files test only the happy path. Files with meaningful error-path coverage:
- e2e_error_tests — dedicated error file (but messages rarely verified)
- e2e_guard_trigger_tests — 4 error tests with message checks ✅
- e2e_watermark_gating_tests — rejection tests with message checks ✅
- e2e_topk_tests — 3 rejection tests ✅
- e2e_phase4_ergonomics_tests — GUC removal, NULL schedule ✅
Files with ZERO error-path tests:
create_or_replace_tests, cdc_tests, cte_tests, expression_tests,
ivm_tests, lateral_tests, lateral_subquery_tests, full_join_tests,
window_tests, multi_window_tests, set_operation_tests,
scalar_subquery_tests, sublink_or_tests, all_subquery_tests,
keyless_duplicate_tests, rows_from_tests, pipeline_dag_tests,
mixed_mode_dag_tests, multi_cycle_dag_tests, snapshot_consistency_tests,
aggregate_coverage_tests, having_transition_tests,
differential_gaps_tests
3. Row-count-only assertions hide bugs
Several test files rely exclusively on row counts (e.g., assert_eq!(count, 5))
without checking actual content. If a query returns the right number of rows but
with wrong values, these tests pass. Affected files:
- rows_from_tests — 6 tests, all count-only (CRITICAL)
- expression_tests — 41 tests, almost all count-only (HIGH)
- ivm_tests — 26 tests, count + existence only (HIGH)
- lateral_tests (FULL mode) — 5 tests, count-only (MEDIUM)
4. Smoke-test-level files
These files verify “it doesn’t crash” rather than “it produces correct results”:
- concurrent_tests — tests concurrent refresh doesn’t deadlock/crash
- smoke_tests — basic infrastructure verification
- coverage_error_tests (partially) — some tests verify error occurs but
not content
Per-File Analysis
e2e_smoke_tests.rs
| Metric | Value |
|---|---|
| Tests | ~5 |
| Primary assertion | Infrastructure checks |
assert_st_matches_query |
0 |
| Error paths | Some |
| Risk | LOW |
What it tests: Extension loads, CREATE EXTENSION succeeds, basic smoke
signals that the harness is functional.
Assessment: Appropriate for its role. No mitigations needed — this is correctly a minimal infrastructure check.
e2e_create_tests.rs
| Metric | Value |
|---|---|
| Tests | ~30 |
| Primary assertion | Catalog entry verification, status checks |
assert_st_matches_query |
ZERO |
| Error paths | Some (invalid inputs) |
| Risk | MEDIUM |
What it tests: Stream table creation with various SQL constructs. Verifies catalog entries, refresh modes, schedule storage, status values.
Assessment: Tests verify that creation succeeds and catalog metadata is correct, but never verify that the stream table contents match the defining query after initial load.
Mitigations:
1. Add assert_st_matches_query() after creation for at least the 5 most
complex queries (JOINs, aggregates, CTEs).
2. Add a test that creates with an invalid query (e.g., referencing
non-existent table) and verifies the error message.
e2e_drop_tests.rs
| Metric | Value |
|---|---|
| Tests | ~20 |
| Primary assertion | Catalog cleanup, trigger removal, storage removal |
assert_st_matches_query |
N/A (tests removal, not content) |
| Error paths | Some (drop non-existent) |
| Risk | LOW |
What it tests: Drop semantics: catalog entry removal, trigger cleanup, storage table removal, IF EXISTS, CASCADE (DAG dependencies).
Assessment: Solid. Verifies the full cleanup lifecycle. No content verification needed since the purpose is deletion.
e2e_alter_tests.rs
| Metric | Value |
|---|---|
| Tests | ~15 |
| Primary assertion | Catalog field changes |
assert_st_matches_query |
ZERO |
| Error paths | Some (invalid mode transitions) |
| Risk | MEDIUM |
What it tests: ALTER STREAM TABLE for schedule, refresh mode, query
changes. Verifies catalog metadata updates.
Assessment: Tests check that catalog columns are updated but never verify that altered stream tables still produce correct results after refresh. For example, after changing refresh mode from FULL to DIFFERENTIAL, no test checks that DIFFERENTIAL refresh works correctly.
Mitigations:
1. After each ALTER that changes refresh mode or query, add a DML → refresh →
assert_st_matches_query() cycle.
2. Add test for ALTER on a stream table with existing data — verify data is
preserved (or re-initialized) correctly.
e2e_create_or_replace_tests.rs
| Metric | Value |
|---|---|
| Tests | ~10 |
| Primary assertion | No-op when exists, create when missing |
assert_st_matches_query |
ZERO |
| Error paths | ZERO |
| Risk | MEDIUM-HIGH |
What it tests: CREATE OR REPLACE STREAM TABLE semantics — creates when
absent, silently succeeds when present.
Assessment: Tests verify the API contract (no-op vs. create) but: - Never verify that the original definition is preserved when “replace” is a no-op (could silently overwrite). - No error-path tests at all. - No test that replaces with a different query and verifies behavior.
Mitigations:
1. After the no-op case, query the catalog to verify the original
defining_query is still stored.
2. Add at least one error-path test (e.g., replace with invalid SQL).
3. Add a test that verifies behavior when the defining query actually differs.
e2e_error_tests.rs
| Metric | Value |
|---|---|
| Tests | ~25 |
| Primary assertion | is_err() checks |
assert_st_matches_query |
ZERO (error-only file) |
| Error paths | ALL |
| Risk | MEDIUM |
What it tests: Error paths for unsupported SQL constructs, invalid parameters, constraint violations.
Assessment: Most tests check result.is_err() but don’t verify error
message content. Some tests have been fixed in PR #208 to check messages,
but many remain message-free. There are also some tests that check is_err()
on queries that might be supported now (stale rejection checks).
Mitigations:
1. Add .contains("expected substring") checks to remaining is_err() tests.
2. Audit which “rejected” queries are now actually supported and remove/update
stale tests.
e2e_lifecycle_tests.rs
| Metric | Value |
|---|---|
| Tests | ~15 |
| Primary assertion | Status transitions, catalog state |
assert_st_matches_query |
1 test |
| Error paths | Some (invalid transitions) |
| Risk | MEDIUM |
What it tests: Full lifecycle: CREATE → REFRESH → ALTER → DROP. Status transitions (EMPTY → ACTIVE → ERROR → etc.).
Assessment: Good coverage of the lifecycle state machine but relies mostly on status field checks rather than content validation.
Mitigations:
1. Add assert_st_matches_query() after each REFRESH step to verify content
correctness, not just status.
e2e_refresh_tests.rs
| Metric | Value |
|---|---|
| Tests | ~30 |
| Primary assertion | assert_st_matches_query extensively |
assert_st_matches_query |
16+ uses |
| Error paths | Some |
| Risk | LOW |
What it tests: Full and differential refresh correctness after INSERT, UPDATE, DELETE on source tables. Multi-step DML cycles, no-change refreshes, multiple source tables.
Assessment: Excellent. One of the strongest files in the suite. Heavy
use of assert_st_matches_query() with multi-step DML operations.
e2e_cdc_tests.rs
| Metric | Value |
|---|---|
| Tests | ~15 |
| Primary assertion | Change buffer contents, trigger existence |
assert_st_matches_query |
Some |
| Error paths | ZERO |
| Risk | MEDIUM |
What it tests: CDC trigger installation, change buffer population, INSERT/UPDATE/DELETE capture, TRUNCATE handling.
Assessment: Good validation that CDC triggers capture changes correctly. Missing error-path tests (e.g., what happens when source table is dropped while triggers exist).
Mitigations:
1. Add a test for source-table DDL while CDC triggers are active.
2. Add assert_st_matches_query() after each CDC-captured operation.
e2e_stmt_cdc_tests.rs
| Metric | Value |
|---|---|
| Tests | ~15 |
| Primary assertion | Bit-level trigger verification |
assert_st_matches_query |
Some |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Statement-level CDC trigger behavior. Verifies that triggers fire on correct DML operations, with granular bit-level trigger type checks.
Assessment: Very strong — tests validate trigger configuration at a lower level than most other files.
e2e_expression_tests.rs
| Metric | Value |
|---|---|
| Tests | ~41 |
| Primary assertion | Row counts only |
assert_st_matches_query |
ZERO |
| Error paths | ZERO |
| Risk | HIGH |
What it tests: Expression evaluation in stream tables: arithmetic, string functions, CASE/WHEN, COALESCE, NULLIF, type casts, date functions, array operations, JSON operations, GREATEST/LEAST.
Assessment: Critically weak. 41 tests all verify row counts but never
check that expression values are correct. For example, a test for
UPPER(name) might check that 5 rows exist but not that the values are
actually uppercased. If the DVM passes through source values unchanged, all
tests pass.
Mitigations (HIGH PRIORITY):
1. Add assert_st_matches_query() to every expression test. This single
change would transform 41 smoke tests into 41 correctness tests.
2. At minimum, add specific value assertions to the 10 most complex expression
tests (CASE, COALESCE, JSON operations, array operations).
e2e_property_tests.rs
| Metric | Value |
|---|---|
| Tests | ~15 |
| Primary assertion | Custom assert_invariant with EXCEPT ALL |
assert_st_matches_query |
Custom equivalent |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Property-based invariants on stream tables. Uses a custom
assert_invariant helper that performs EXCEPT ALL comparison.
Assessment: Excellent. Implements its own multiset comparison (EXCEPT ALL). This is the gold standard — these tests would catch duplicate-row bugs even in light E2E (bypassing the light.rs EXCEPT issue).
e2e_cte_tests.rs
| Metric | Value |
|---|---|
| Tests | ~71 |
| Primary assertion | assert_st_matches_query extensively |
assert_st_matches_query |
Extensive |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Common Table Expressions: simple CTEs, recursive CTEs, multi-CTE, CTE with aggregation, CTE with JOINs, CTE referenced multiple times, CTE with window functions.
Assessment: Very thorough. 71 tests with heavy assert_st_matches_query
usage. The largest single test file in the suite.
Mitigations: 1. Add a few error-path tests (e.g., recursive CTE without termination condition, CTE with name collision).
e2e_ivm_tests.rs
| Metric | Value |
|---|---|
| Tests | ~26 |
| Primary assertion | Row counts and existence |
assert_st_matches_query |
ZERO |
| Error paths | ZERO |
| Risk | HIGH |
What it tests: Core incremental view maintenance loop: INSERT → refresh → verify, UPDATE → refresh → verify, DELETE → refresh → verify. Multiple source tables, JOINs, aggregates.
Assessment: Critically undertested for its importance. These are the core IVM tests — they should be the most rigorous in the suite. Instead, they rely on row counts and existence checks. If the IVM engine produces rows with wrong values (e.g., stale aggregates, wrong JOIN results), these tests pass.
Mitigations (HIGH PRIORITY):
1. Add assert_st_matches_query() after every refresh in every test.
2. This is arguably the highest-impact single change in the entire suite — 26
tests instantly become correctness validators.
e2e_concurrent_tests.rs
| Metric | Value |
|---|---|
| Tests | ~8 |
| Primary assertion | “Doesn’t crash” |
assert_st_matches_query |
ZERO |
| Error paths | Some (expected conflicts) |
| Risk | MEDIUM |
What it tests: Concurrent refresh behavior — two refreshes at the same time, concurrent DDL, etc. Verifies no deadlocks or panics.
Assessment: By nature, concurrent tests are difficult to make fully deterministic. The “doesn’t crash” approach is acceptable for detecting deadlocks but misses correctness issues (e.g., lost updates under concurrency).
Mitigations:
1. After each concurrent operation, add assert_st_matches_query() to verify
that the final state is correct regardless of execution order.
e2e_coverage_error_tests.rs
| Metric | Value |
|---|---|
| Tests | ~15 |
| Primary assertion | is_err() checks |
assert_st_matches_query |
ZERO |
| Error paths | ALL |
| Risk | MEDIUM |
What it tests: Error coverage for edge cases in the parser and DVM engine. Includes cycle detection, unsupported constructs, and boundary conditions.
Assessment: The cycle detection test creates a chain A → B → C → A but doesn’t actually verify that the error mentions “cycle” or includes the participating tables.
Mitigations:
1. Add error message content checks to all is_err() assertions.
2. Verify cycle detection error includes the cycle path.
e2e_coverage_parser_tests.rs
| Metric | Value |
|---|---|
| Tests | ~20 |
| Primary assertion | Parse success/failure, source table detection |
assert_st_matches_query |
ZERO |
| Error paths | Some |
| Risk | LOW |
What it tests: Parser coverage: source table extraction, schema qualification, edge cases in SQL parsing.
Assessment: Appropriate for parser-level tests. These don’t need content validation — they test metadata extraction.
e2e_diamond_tests.rs
| Metric | Value |
|---|---|
| Tests | ~12 |
| Primary assertion | assert_st_matches_query in some tests |
assert_st_matches_query |
Partial |
| Error paths | ZERO |
| Risk | MEDIUM |
What it tests: Diamond dependency patterns where two intermediate stream tables feed a single downstream. Tests none/auto/atomic diamond consistency modes.
Assessment: Good coverage of the core diamond scenario. Some tests use
assert_st_matches_query() but not all.
Mitigations:
1. Add assert_st_matches_query() to all diamond tests after refresh.
2. Add test for diamond with three or more parents.
e2e_dag_operations_tests.rs
| Metric | Value |
|---|---|
| Tests | ~15 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
Extensive |
| Error paths | ZERO |
| Risk | LOW |
What it tests: DAG-aware operations: cascade refresh, topological ordering, multi-level refresh propagation.
Assessment: Strong. Uses assert_st_matches_query() as primary check.
e2e_dag_topology_tests.rs
| Metric | Value |
|---|---|
| Tests | ~10 |
| Primary assertion | Topological ordering checks |
assert_st_matches_query |
Some |
| Error paths | ZERO |
| Risk | LOW |
What it tests: DAG topology correctness: dependency ordering, refresh sequencing, level computation.
Assessment: Good. Tests the graph structure correctly.
e2e_dag_error_tests.rs
| Metric | Value |
|---|---|
| Tests | ~10 |
| Primary assertion | Status checks, retry behavior |
assert_st_matches_query |
ZERO |
| Error paths | Partial |
| Risk | MEDIUM |
What it tests: DAG error handling: refresh failure propagation, error recovery, consecutive error tracking.
Assessment: Tests check error status propagation but don’t verify the
consecutive_errors catalog column value or that recovery actually
re-materializes correct data.
Mitigations:
1. After error recovery, add assert_st_matches_query() to verify correct
re-materialization.
2. Assert consecutive_errors count value, not just status.
e2e_dag_concurrent_tests.rs
| Metric | Value |
|---|---|
| Tests | ~8 |
| Primary assertion | Retry loop success |
assert_st_matches_query |
ZERO |
| Error paths | Some (expected conflicts) |
| Risk | MEDIUM |
What it tests: Concurrent DAG refresh operations — multiple refresh calls on overlapping DAGs.
Assessment: Uses retry loops that mask CDC lag issues. If a refresh fails due to timing, the test retries, which means intermittent failures are hidden.
Mitigations:
1. After retry-loop success, add assert_st_matches_query() to verify correctness.
2. Log retry counts to detect flakiness trends.
e2e_dag_immediate_tests.rs
| Metric | Value |
|---|---|
| Tests | ~15 |
| Primary assertion | Immediate propagation checks |
assert_st_matches_query |
Some |
| Error paths | ZERO |
| Risk | LOW |
What it tests: IMMEDIATE mode in DAG pipelines: DML on source tables propagates immediately through the DAG without explicit refresh.
Assessment: Good. Tests verify immediate propagation works.
e2e_keyless_duplicate_tests.rs
| Metric | Value |
|---|---|
| Tests | 8 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
7/8 |
| Error paths | ZERO |
| Risk | MEDIUM |
What it tests: Keyless (no PRIMARY KEY) tables with duplicate rows under differential refresh. Uses ctid-based row identity.
Assessment: Strong use of assert_st_matches_query(). However, in light
E2E, the EXCEPT-based comparison (not EXCEPT ALL) means duplicate-row
correctness is NOT actually verified. A bug that returns 1 copy instead of 3
identical copies would pass.
Mitigations:
1. Fix light.rs to use EXCEPT ALL (fixes all keyless duplicate tests).
2. Add explicit count assertions for duplicate multiplicities as defense-in-depth.
e2e_lateral_tests.rs
| Metric | Value |
|---|---|
| Tests | 16 |
| Primary assertion | Counts (FULL), assert_st_matches_query (DIFF) |
assert_st_matches_query |
9/16 |
| Error paths | ZERO |
| Risk | MEDIUM |
What it tests: LATERAL set-returning functions (jsonb_array_elements, jsonb_each, unnest) in FULL and DIFFERENTIAL modes.
Assessment: DIFFERENTIAL tests are strong (9 use assert_st_matches_query).
FULL mode tests (5) only check counts — if an SRF returned wrong values but
correct count, tests pass.
Mitigations:
1. Add assert_st_matches_query() to the 5 FULL mode tests.
2. Add test for non-empty → empty array transition.
e2e_lateral_subquery_tests.rs
| Metric | Value |
|---|---|
| Tests | 13 |
| Primary assertion | Specific values + counts |
assert_st_matches_query |
ZERO |
| Error paths | ZERO |
| Risk | HIGH |
What it tests: LATERAL subqueries: correlated subqueries returning multiple rows, LIMIT, LEFT JOIN LATERAL.
Assessment: Relies on manual assertions (counts, specific values, boolean
checks). Without assert_st_matches_query(), if the stream table accidentally
includes extra rows that the manual checks don’t look for, tests pass.
Mitigations:
1. Add assert_st_matches_query() to all 13 tests.
2. Particularly important for LEFT JOIN LATERAL where NULL semantics matter.
e2e_full_join_tests.rs
| Metric | Value |
|---|---|
| Tests | 6 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
6/6 (100%) |
| Error paths | ZERO |
| Risk | LOW |
What it tests: FULL OUTER JOIN differential correctness including NULL join keys and row migration.
Assessment: Excellent. 100% assert_st_matches_query coverage.
Mitigations: 1. Add edge case: delete all rows from one side. 2. Add edge case: multiple unmatched rows on both sides simultaneously.
e2e_window_tests.rs
| Metric | Value |
|---|---|
| Tests | 24 |
| Primary assertion | Values + assert_st_matches_query |
assert_st_matches_query |
4/24 |
| Error paths | Light (acceptance checks) |
| Risk | MEDIUM |
What it tests: Window functions: ROW_NUMBER, RANK, DENSE_RANK, SUM OVER, LAG, LEAD. Nested window expressions (EC-03). Partition key changes (G1.2).
Assessment: Mixed. Complex tests (partition key changes, multiple
partitions) use assert_st_matches_query(). Simpler tests and EC-03 rewrite
acceptance tests only check result.is_ok() without verifying output.
Mitigations:
1. EC-03 rewrite acceptance tests (tests 13–17): if they now test acceptance
instead of rejection, add assert_st_matches_query() to verify the rewrite
produces correct output.
2. Add assert_st_matches_query() to FULL mode tests.
e2e_multi_window_tests.rs
| Metric | Value |
|---|---|
| Tests | 7 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
7/7 (100%) |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Multiple window functions with different PARTITION BY/ORDER BY, frame clauses (ROWS/RANGE), LAG/LEAD, ranking functions.
Assessment: Excellent. 100% assert_st_matches_query coverage.
Mitigations: 1. Add test with LAG/LEAD returning NULL (first/last row in partition). 2. Add test with ties in ranking functions.
e2e_set_operation_tests.rs
| Metric | Value |
|---|---|
| Tests | 7 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
7/7 (100%) |
| Error paths | ZERO |
| Risk | MEDIUM |
What it tests: INTERSECT / EXCEPT differential correctness including ALL variants, multi-way chains, multi-column operations.
Assessment: Good assert_st_matches_query coverage, but the light.rs
EXCEPT bug means INTERSECT ALL / EXCEPT ALL multiplicity is not actually
verified — the assert itself uses EXCEPT (not EXCEPT ALL).
Mitigations:
1. Fix light.rs EXCEPT → EXCEPT ALL (critical).
2. Add test with NULL values in set operations.
3. Add three-way+ chain test.
e2e_topk_tests.rs
| Metric | Value |
|---|---|
| Tests | ~60 |
| Primary assertion | Counts, min/max, catalog checks |
assert_st_matches_query |
ZERO |
| Error paths | YES (3 rejection tests) |
| Risk | MEDIUM |
What it tests: TopK queries (ORDER BY … LIMIT N) in FULL, DIFFERENTIAL, and IMMEDIATE modes. Catalog metadata, OFFSET, FETCH FIRST, rejection cases.
Assessment: Comprehensive test count (60+) but relies on count/min/max
assertions rather than full content comparison. This is partially acceptable
since TopK has specific semantics (correct top-N, not full set), but missing
assert_st_matches_query() means the actual row contents are unverified.
Mitigations:
1. Add assert_st_matches_query() to at least the 10 most critical TopK tests
(basic creation, DIFFERENTIAL refresh, IMMEDIATE mode).
2. Verify IMMEDIATE mode tests actually trigger immediate propagation (not just
explicit refresh fallback).
e2e_scalar_subquery_tests.rs
| Metric | Value |
|---|---|
| Tests | 4 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
4/4 (100%) |
| Error paths | ZERO |
| Risk | LOW-MEDIUM |
What it tests: Scalar subqueries in SELECT, WHERE, and correlated positions.
Assessment: Strong assertion quality. Limited scope — only 4 tests.
Mitigations: 1. Add tests for: scalar in JOIN ON, multiple scalar subqueries in one query, scalar returning multiple rows (should error).
e2e_sublink_or_tests.rs
| Metric | Value |
|---|---|
| Tests | 4 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
4/4 (100%) |
| Error paths | ZERO |
| Risk | LOW-MEDIUM |
What it tests: Sublink expressions (EXISTS, NOT EXISTS, IN) combined with OR.
Assessment: Strong assertion quality. Limited scope — no NOT IN, ANY/ALL, nested sublinks.
Mitigations: 1. Add NOT IN test. 2. Add nested sublink test (EXISTS with inner IN).
e2e_all_subquery_tests.rs
| Metric | Value |
|---|---|
| Tests | 10 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
7/10 |
| Error paths | ZERO |
| Risk | LOW |
What it tests: ALL (subquery) support with various operators, NULL handling, empty subquery edge cases.
Assessment: Strong. Tests all major operators (<, >, >=, =, <>) and NULL semantics.
Mitigations: 1. Add test for NULL in outer value (not just inner subquery). 2. Add FULL mode parity tests.
e2e_aggregate_coverage_tests.rs
| Metric | Value |
|---|---|
| Tests | 18 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
18/18 (100%) |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Every supported aggregate function: SUM, AVG, COUNT, MIN, MAX, DISTINCT variants, STRING_AGG, ARRAY_AGG, BOOL_AND/OR/EVERY, BIT_AND/OR, JSON(B)AGG, JSON(B)OBJECT_AGG, PERCENTILE_CONT/DISC, MODE.
Assessment: Gold standard. 100% assert_st_matches_query coverage with
I/U/D cycles on every aggregate type.
e2e_having_transition_tests.rs
| Metric | Value |
|---|---|
| Tests | 7 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
7/7 (100%) |
| Error paths | ZERO |
| Risk | LOW |
What it tests: HAVING threshold transitions: groups appearing/disappearing, oscillation, row migration between groups.
Assessment: Excellent. Tests the critical path where groups cross HAVING thresholds, which is notoriously hard to get right in IVM.
e2e_view_tests.rs
| Metric | Value |
|---|---|
| Tests | 16 |
| Primary assertion | Mixed (assert_st_matches_query + counts) |
assert_st_matches_query |
~50% |
| Error paths | YES (matview rejection, DROP VIEW) |
| Risk | MEDIUM |
What it tests: View inlining, nested views, materialized view rejection, DDL hooks (CREATE OR REPLACE VIEW, DROP VIEW), TRUNCATE propagation.
Assessment: Good breadth. Main gap: after DDL events (CREATE OR REPLACE VIEW, DROP VIEW), tests verify status but don’t verify refresh correctness post-event.
Mitigations:
1. After CREATE OR REPLACE VIEW, verify a refresh with
assert_st_matches_query() uses the new view definition.
2. Add assert_st_matches_query() to TRUNCATE propagation test.
3. Test view containing UNION.
e2e_monitoring_tests.rs
| Metric | Value |
|---|---|
| Tests | 6 |
| Primary assertion | Existence checks only |
assert_st_matches_query |
ZERO |
| Error paths | ZERO |
| Risk | MEDIUM-HIGH |
What it tests: Monitoring views: pgt_status(), stream_tables_info,
pg_stat_stream_tables, staleness detection, refresh history.
Assessment: Weak. Tests verify that monitoring views exist and are queryable but never validate that the values are correct. Staleness test doesn’t advance time to verify stale=true. Refresh history test doesn’t check column values.
Mitigations:
1. test_stale_detection: After refreshing, wait > schedule interval and verify
stale = true. (May require light-E2E workaround since no scheduler.)
2. test_refresh_history_records: Verify initiated_by, status,
start_time, end_time values, not just column existence.
3. test_pg_stat_stream_tables_view: After manual refresh, verify
total_refreshes > 0.
e2e_getting_started_tests.rs
| Metric | Value |
|---|---|
| Tests | 8 |
| Primary assertion | Specific hardcoded values |
assert_st_matches_query |
ZERO |
| Error paths | ZERO |
| Risk | HIGH |
What it tests: Tutorial walkthrough from GETTING_STARTED.md: recursive CTE, LEFT JOIN + GROUP BY, cascading refreshes across 3 layers, DROP cleanup.
Assessment: Tests verify specific expected values (e.g., “Alice’s salary
total should be 580000”) but never check that the entire result set matches.
Spurious extra rows or missing rows would pass. Also, test_getting_started_step7_drop_in_order
doesn’t verify base tables remain intact after dropping stream tables.
Mitigations (HIGH PRIORITY):
1. Add assert_st_matches_query() at the end of each step to verify full
result set correctness.
2. After DROP, verify source tables still exist with original data.
3. Add a negative test: DROP in wrong order should fail gracefully.
e2e_guard_trigger_tests.rs
| Metric | Value |
|---|---|
| Tests | 5 |
| Primary assertion | Error messages + assert_st_matches_query |
assert_st_matches_query |
1/5 |
| Error paths | 4/5 |
| Risk | LOW |
What it tests: Guard triggers blocking direct DML (INSERT/UPDATE/DELETE/ TRUNCATE) on stream tables.
Assessment: Well-structured. 4 error tests with message validation + 1 success test with correctness check.
e2e_phase4_ergonomics_tests.rs
| Metric | Value |
|---|---|
| Tests | ~20 |
| Primary assertion | Mixed (catalog, warnings, status) |
assert_st_matches_query |
ZERO |
| Error paths | Several |
| Risk | MEDIUM |
What it tests: Ergonomic features: refresh history, quick_health view, create_if_not_exists, calculated schedule, removed GUCs, ALTER warnings.
Assessment: Warning tests use .contains() substring matching which is
fragile to wording changes. create_if_not_exists gap: doesn’t verify original
query is preserved after no-op call.
Mitigations:
1. After create_if_not_exists no-op, query catalog to verify original
defining_query.
2. Consider making warning checks more resilient (check key words rather than
exact phrases).
e2e_pipeline_dag_tests.rs
| Metric | Value |
|---|---|
| Tests | 18 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
Most tests |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Realistic multi-level DAG pipelines: Nexmark auction (3 levels), E-commerce analytics (4 levels), IoT telemetry (3 levels).
Assessment: Exceptional. The most realistic integration tests in the
suite. assert_st_matches_query() at every layer after every mutation.
e2e_mixed_mode_dag_tests.rs
| Metric | Value |
|---|---|
| Tests | 5 |
| Primary assertion | assert_st_matches_query |
assert_st_matches_query |
5/5 (100%) |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Mixed FULL/DIFFERENTIAL/IMMEDIATE modes in DAG pipelines.
Assessment: Strong. Tests mode transitions and mixed cascade behavior.
Mitigations:
1. test_mixed_immediate_leaf does explicit refresh as fallback but doesn’t
confirm IMMEDIATE trigger actually fired. Add validation that no
explicit refresh was needed.
e2e_multi_cycle_dag_tests.rs
| Metric | Value |
|---|---|
| Tests | 6 |
| Primary assertion | assert_st_matches_query (intensive) |
assert_st_matches_query |
60+ calls across tests |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Multi-cycle stress tests: 10 INSERT cycles, 5 mixed DML cycles, no-op drift detection, group elimination/revival, bulk mutation (100 INSERTs + 50 UPDATEs + 30 DELETEs), diamond multi-cycle.
Assessment: Gold standard for stress testing. assert_pipeline_correct()
helper calls assert_st_matches_query() on all layers after every cycle.
e2e_rows_from_tests.rs
| Metric | Value |
|---|---|
| Tests | 6 |
| Primary assertion | Row counts only |
assert_st_matches_query |
ZERO |
| Error paths | ZERO |
| Risk | CRITICAL |
What it tests: ROWS FROM(f1(), f2(), ...) rewriting: multi-unnest merge,
mixed SRF handling.
Assessment: The weakest test file in the entire suite. All 6 tests verify only row counts. If the rewriting is broken and produces NULL values in all columns, tests still pass (correct count, wrong values). No error paths.
Mitigations (CRITICAL PRIORITY):
1. Add assert_st_matches_query() to all 6 tests.
2. Add specific value assertions for NULL pairing behavior.
3. Add error-path test for invalid SRF combinations.
e2e_snapshot_consistency_tests.rs
| Metric | Value |
|---|---|
| Tests | 5 |
| Primary assertion | assert_st_matches_query + cross-source invariants |
assert_st_matches_query |
Most tests |
| Error paths | ZERO |
| Risk | LOW |
What it tests: Cross-source snapshot consistency: overlapping sources, diamond convergence, interleaved mutations.
Assessment: Very strong. Adds cross-source invariant checks on top of
standard assert_st_matches_query().
e2e_watermark_gating_tests.rs
| Metric | Value |
|---|---|
| Tests | 26 (23 light + 3 full-only) |
| Primary assertion | Mixed (CRUD, status, assert_st_matches_query) |
assert_st_matches_query |
Some |
| Error paths | YES (rejections) |
| Risk | LOW-MEDIUM |
What it tests: Watermark advancement, monotonicity, groups, tolerance, alignment detection, ST gating. Scheduler tests gated to full E2E only.
Assessment: Good coverage of watermark API. Light-mode tests cover CRUD and gating. Scheduler-level verification only in full E2E (acceptable given light-E2E limitations).
Mitigations: 1. Add boundary test for tolerance (exactly at tolerance limit).
Priority Mitigations
P0 — Critical (should block next release)
| # | Action | Impact | Effort |
|---|---|---|---|
| 1 | tests/e2e/light.rs assert_st_matches_query to use EXCEPT ALL |
DONE | Small |
| 2 | assert_st_matches_query to rows_from_tests (6 tests) |
DONE | Small |
| 3 | assert_st_matches_query to expression_tests (41 tests) |
DONE | Medium |
| 4 | assert_st_matches_query to ivm_tests (26 tests) |
DONE | Medium |
P1 — High (should address soon)
| # | Action | Impact | Effort |
|---|---|---|---|
| 5 | assert_st_matches_query to getting_started_tests |
DONE | Small |
| 6 | assert_st_matches_query to lateral_subquery_tests |
DONE | Small |
| 7 | assert_st_matches_query to topk_tests (10 most critical) |
DONE | Medium |
| 8 | create_or_replace_tests |
DONE | Small |
| 9 | error_tests |
DONE | Medium |
P2 — Medium (address during regular maintenance)
| # | Action | Impact | Effort |
|---|---|---|---|
| 10 | monitoring_tests with value validation |
DONE | Medium |
| 11 | assert_st_matches_query to create_tests (5 complex cases) |
DONE | Small |
| 12 | assert_st_matches_query to alter_tests post-alter |
DONE | Small |
| 13 | consecutive_errors in dag_error_tests |
DONE | Small |
| 14 | assert_st_matches_query to concurrent_tests post-op |
DONE | Small |
| 15 | view_tests |
DONE | Small |
P3 — Low (backlog)
| # | Action | Impact | Effort |
|---|---|---|---|
| 16 | |
DONE | Medium |
| 17 | |
DONE | Medium |
| 18 | |
DONE | Small |
| 19 | |
DONE | Small |
| 20 | |
DONE | Small |
Appendix: Summary Table
| File | Tests | Assertion Quality | assert_st_matches_query |
Error Paths | Risk |
|---|---|---|---|---|---|
| smoke | ~5 | Infrastructure | 0% | Some | LOW |
| create | ~30 | Catalog only | 0% | Some | MEDIUM |
| drop | ~20 | Cleanup checks | N/A | Some | LOW |
| alter | ~15 | Catalog only | 0% | Some | MEDIUM |
| create_or_replace | ~10 | No-op checks | 0% | ZERO | MED-HIGH |
| error | ~25 | is_err() |
0% | ALL | MEDIUM |
| lifecycle | ~15 | Status checks | ~7% | Some | MEDIUM |
| refresh | ~30 | Excellent | ~53% | Some | LOW |
| cdc | ~15 | Buffer checks | Partial | ZERO | MEDIUM |
| stmt_cdc | ~15 | Bit-level | Partial | ZERO | LOW |
| expression | ~41 | Counts only | 0% | ZERO | HIGH |
| property | ~15 | Custom EXCEPT ALL | Custom 100% | ZERO | LOW |
| cte | ~71 | Excellent | Extensive | ZERO | LOW |
| ivm | ~26 | Counts only | 0% | ZERO | HIGH |
| concurrent | ~8 | Smoke only | 0% | Some | MEDIUM |
| coverage_error | ~15 | is_err() |
0% | ALL | MEDIUM |
| coverage_parser | ~20 | Parse checks | 0% | Some | LOW |
| diamond | ~12 | Partial | Partial | ZERO | MEDIUM |
| dag_operations | ~15 | Excellent | Extensive | ZERO | LOW |
| dag_topology | ~10 | Ordering | Some | ZERO | LOW |
| dag_error | ~10 | Status checks | 0% | Partial | MEDIUM |
| dag_concurrent | ~8 | Retry-loop | 0% | Some | MEDIUM |
| dag_immediate | ~15 | Immediate checks | Some | ZERO | LOW |
| keyless_duplicate | 8 | Excellent | 88% | ZERO | MEDIUM* |
| lateral | 16 | Mixed | 56% | ZERO | MEDIUM |
| lateral_subquery | 13 | Manual only | 0% | ZERO | HIGH |
| full_join | 6 | Excellent | 100% | ZERO | LOW |
| window | 24 | Mixed | 17% | Light | MEDIUM |
| multi_window | 7 | Excellent | 100% | ZERO | LOW |
| set_operation | 7 | Excellent | 100% | ZERO | MEDIUM* |
| topk | ~60 | Counts/min/max | 0% | YES | MEDIUM |
| scalar_subquery | 4 | Excellent | 100% | ZERO | LOW-MED |
| sublink_or | 4 | Excellent | 100% | ZERO | LOW-MED |
| all_subquery | 10 | Excellent | 70% | ZERO | LOW |
| aggregate_coverage | 18 | Gold standard | 100% | ZERO | LOW |
| having_transition | 7 | Excellent | 100% | ZERO | LOW |
| view | 16 | Mixed | ~50% | YES | MEDIUM |
| monitoring | 6 | Existence only | 0% | ZERO | MED-HIGH |
| getting_started | 8 | Hardcoded values | 0% | ZERO | HIGH |
| guard_trigger | 5 | Strong | 20% | YES (4/5) | LOW |
| phase4_ergonomics | ~20 | Mixed | 0% | Several | MEDIUM |
| pipeline_dag | 18 | Exceptional | Most | ZERO | LOW |
| mixed_mode_dag | 5 | Excellent | 100% | ZERO | LOW |
| multi_cycle_dag | 6 | Gold standard | 60+ calls | ZERO | LOW |
| rows_from | 6 | Counts only | 0% | ZERO | CRITICAL |
| snapshot_consistency | 5 | Excellent | Most | ZERO | LOW |
| watermark_gating | 26 | Mixed | Some | YES | LOW-MED |
* keyless_duplicate and set_operation are marked MEDIUM because the light.rs
EXCEPT (vs EXCEPT ALL) bug means their multiset assertions are weaker than
intended.
End of report.