pg_trickle — Testing & Coverage Status

What we’ve learned about testing the pg_trickle extension, the current state of tests and coverage, and what can be improved.


1. Current Test Inventory

Category Tests Runner Requires Docker?
Unit tests (in src/) 1194 just test-unit No
Integration tests (in tests/) 101 just test-integration Yes (bare PG 18.x containers)
E2E tests (in tests/e2e_*.rs) 340 just test-e2e Yes (custom Docker image with extension)
Benchmarks 16 just test-e2e (ignored) Yes
Total 1,651 just test-all

Unit Tests by Module

File Tests Focus
src/dvm/parser.rs 237 SQL parsing, OpTree construction, expression handling, view definition sanitization
src/dvm/operators/aggregate.rs 106 Aggregate differentiation, merge expressions, delta CTEs
src/api.rs 82 Pure helper functions (quote_identifier, find_top_level_keyword, classify_relkind, etc.)
src/dvm/operators/recursive_cte.rs 59 Recursive CTE SQL generation, cascade detection
src/refresh.rs 44 Refresh action resolution, LSN placeholder substitution
src/dvm/diff.rs 34 DiffContext, CTE name generation, dispatch
src/dvm/mod.rs 28 Delta template resolution, union splitting, scan-chain detection
src/dag.rs 26 Topological sort, cycle detection, lag resolution
src/dvm/operators/join.rs 20 Inner join differentiation, equijoin key extraction
src/dvm/operators/scan.rs 18 Scan differentiation, PK detection, hash expressions
src/dvm/operators/lateral_subquery.rs 18 Lateral subquery support
src/dvm/operators/join_common.rs 18 Shared join helpers
src/dvm/operators/lateral_function.rs 16 Lateral function differentiation
src/dvm/operators/except.rs 16 EXCEPT operator differentiation
src/version.rs 15 Versioning, epoch conversion, frontier operations
src/dvm/operators/intersect.rs 14 INTERSECT operator differentiation
Others (19 files) 128 filter, distinct, union_all, window, project, outer_join, cte_scan, subquery, error, hash, cdc, monitor, scheduler, hooks, semi_join, anti_join, full_join, scalar_subquery

E2E Tests by File

File Tests Focus
e2e_cte_tests.rs 68 Recursive/non-recursive CTE parsing and IVM
e2e_expression_tests.rs 29 Expression handling in defining queries
e2e_create_tests.rs 29 create_stream_table() validation, catalog writes
e2e_refresh_tests.rs 26 Refresh modes (FULL, DIFFERENTIAL), staleness, correctness
e2e_error_tests.rs 26 Error handling, invalid inputs, rejection messages
e2e_window_tests.rs 18 Window function parsing and IVM
e2e_lateral_tests.rs 16 LATERAL joins
e2e_bench_tests.rs 16 Performance benchmarks (marked #[ignore])
e2e_lateral_subquery_tests.rs 12 LATERAL subquery patterns
e2e_property_tests.rs 11 Deterministic property-based correctness
e2e_coverage_parser_tests.rs 11 Parser edge cases for coverage
e2e_smoke_tests.rs 10 Basic extension functionality
e2e_coverage_error_tests.rs 10 Error path coverage
e2e_alter_tests.rs 10 ALTER operations
e2e_lifecycle_tests.rs 9 Stream table lifecycle state machine
e2e_cdc_tests.rs 9 CDC triggers, change buffers
e2e_bgworker_tests.rs 9 Background worker scheduling
e2e_drop_tests.rs 8 Drop cascades, cleanup
e2e_monitoring_tests.rs 6 Stats, lag, alerts
e2e_ddl_event_tests.rs 4 DDL event hook handling
e2e_concurrent_tests.rs 3 Advisory locks, concurrent refresh

Integration Tests (Non-E2E)

These run against bare PostgreSQL 18.x containers without the extension loaded, manually creating catalog schemas:

File Tests Focus
catalog_tests.rs 15 Catalog CRUD via SQL
catalog_compat_tests.rs 13 PG18 catalog behavior: pg_get_viewdef semicolons, nspname::text cast, relkind values, array_length types, UNION ALL column consistency
scenario_tests.rs 10 Multi-step workflow scenarios
monitoring_tests.rs 8 Monitoring views and functions
resilience_tests.rs 7 Error recovery, edge cases
extension_tests.rs 7 Extension install/uninstall
workflow_tests.rs 4 End-to-end workflows
smoke_tests.rs 3 Basic smoke tests

2. Testing Architecture

Three-Tier Test Strategy

┌─────────────────────────────────────────────────────┐
│  Tier 1: Unit Tests (854 tests)                     │
│  - Pure Rust, no database needed                    │
│  - Tests OpTree construction, SQL generation,       │
│    differentiation logic, DAG algorithms            │
│  - Runs in <5 seconds                               │
│  - cargo test --lib                                 │
├─────────────────────────────────────────────────────┤
│  Tier 2: Integration Tests (101 tests)              │
│  - Bare PG 18.x containers via Testcontainers       │
│  - Tests SQL-level data model, catalog schema,      │
│    and PG version-specific catalog behavior          │
│  - Does NOT load the compiled extension             │
│  - Runs in ~2-3 minutes                             │
├─────────────────────────────────────────────────────┤
│  Tier 3: E2E Tests (340 tests)                      │
│  - Custom Docker image with compiled extension      │
│  - Tests full SQL API via CREATE EXTENSION          │
│  - Exercises CDC triggers, event hooks, refresh,    │
│    background workers, monitoring views             │
│  - Runs in ~10-15 minutes (serial, one container    │
│    per test)                                        │
└─────────────────────────────────────────────────────┘

E2E Infrastructure

The E2E tests use a custom Docker image (pg_trickle_e2e:latest) built via a multi-stage Dockerfile:

  1. Stage 1 (builder): Installs Rust, cargo-pgrx, compiles the extension against PG 18 headers
  2. Stage 2 (runtime): Copies .so, .control, .sql into a clean postgres:18.x image with shared_preload_libraries = 'pg_trickle'

Build the image: bash ./tests/build_e2e_image.sh

The E2eDb test harness (in tests/e2e/mod.rs) starts one container per test via testcontainers::GenericImage, connects via sqlx::PgPool, and provides helpers for extension setup, SQL execution, and assertions.

Property-Based Testing

The e2e_property_tests.rs file implements 11 deterministic property-based correctness tests. The key invariant tested:

For every stream table, at every data timestamp: Contents(ST) = Result(defining_query) (multiset equality)

Each test uses a deterministic SplitMix64 PRNG seeded per test, applies randomized DML (INSERT/UPDATE/DELETE mix) across 5 cycles of 15 initial rows, and verifies the invariant after each cycle. On failure, the seed is printed for reproduction.


3. Code Coverage

Coverage History

Milestone Unit Tests Lines Covered Coverage
Initial baseline 189 2,888 / 9,139 31.6%
After Coverage Wave 1 (4 phases) 476 7,525 / 12,117 62.1%
Current (post SQL support expansion) 854 ~8,500+ / ~13,500 ~63% (est.)

The Coverage Ceiling

Unit test coverage has hit a structural ceiling at ~63%. Of the ~4,600 uncovered lines:

Category Uncovered Lines % of Uncovered Unit Testable?
DB-only (SPI, pg_sys FFI, GUC, shmem, bg workers) ~3,940 86% No
Parser (pg_sys::raw_parser tree walking) ~1,012 22% No
Pure Rust edge branches ~165 4% Yes (diminishing returns)

86% of remaining uncovered code requires a running PostgreSQL instance. It uses Spi::connect(), pg_sys::raw_parser(), BackgroundWorkerBuilder, GUC registration, event trigger context, or #[pg_extern] entry points.

DB-Only Code That Cannot Be Unit Tested

File Uncovered Lines Why
src/dvm/parser.rs ~1,012 pg_sys::raw_parser() FFI parse tree walking
src/api.rs ~662 #[pg_extern] + SPI orchestration
src/catalog.rs ~464 100% SPI queries
src/scheduler.rs ~380 Background worker + SPI
src/monitor.rs ~358 #[pg_extern] + SPI
src/refresh.rs ~315 SPI (pure logic already tested)
src/hooks.rs ~275 Event triggers + SPI
src/cdc.rs ~224 SPI CREATE/DROP + triggers
src/dvm/mod.rs ~181 Calls parse_defining_query()
Others ~121 GUC, shmem, pg_extern wrappers

E2E Coverage Infrastructure (Implemented)

To break through the 63% ceiling, an instrumented E2E coverage pipeline has been built:

  1. tests/Dockerfile.e2e-coverage — Builds the extension with LLVM coverage instrumentation (-C instrument-coverage)
  2. scripts/e2e-coverage.sh — Full pipeline: build instrumented image → run E2E tests → extract profraw → merge with unit coverage → generate report
  3. tests/e2e/mod.rs — Supports PGS_E2E_IMAGE env var for image override and PGS_E2E_COVERAGE_DIR for volume-mounting profraw output

Projected combined coverage (unit + E2E): 75–85%.

Running Coverage

# Unit test coverage (HTML + LCOV)
just coverage

# Terminal summary only
just coverage-text

# E2E instrumented coverage (full pipeline)
just coverage-e2e

# E2E coverage, skip Docker rebuild
just coverage-e2e-fast

4. Key Lessons Learned

4.1 The IVM Engine Is Highly Unit-Testable

The biggest win was discovering that the entire DVM (differential view maintenance) subsystem — 11 operator files, the diff engine, and the parser’s pure logic — is fully testable without a database. These files do string-based SQL construction from OpTree structs. Going from 0% to 90%+ coverage on the operator files required only constructing test OpTrees and asserting on the generated SQL.

Pattern used across all operator tests: rust let mut ctx = test_ctx(); // DiffContext with dummy frontiers let tree = OpTree::Scan { /* ... */ }; let result = diff_scan(&mut ctx, &tree).unwrap(); assert!(result.cte_sql.contains("expected SQL fragment"));

4.2 Property Tests Found Real Bugs

The 11 deterministic property tests in e2e_property_tests.rs caught a correctness bug in the DELETE+INSERT merge strategy. When the strategy evaluated the delta query twice (once for DELETE, once for INSERT), aggregate/DISTINCT queries that LEFT JOIN back to the stream table to read __pgt_count saw modified data between the DELETE and INSERT, causing incorrect results.

This was not caught by hand-written E2E tests because they used known data patterns. The randomized DML from property tests triggered the edge case.

Lesson: Property-based tests with randomized DML are essential for IVM correctness, not just a nice-to-have.

4.3 E2E Test Isolation Is Critical

Each E2E test gets its own PostgreSQL container. Early experiments with shared containers led to test interference (leftover stream tables, triggers, catalog state). The overhead of per-test containers (~1-2 seconds startup) is worth the isolation guarantee.

Trade-off: Running 340 E2E tests serially with per-test containers takes ~10-15 minutes. Parallelism is possible but requires sufficient Docker resources.

4.4 The Docker Image Build Is the Bottleneck

Building the E2E Docker image (compiling Rust + pgrx inside a container) takes 5-15 minutes depending on cache state. This is the dominant cost in the development cycle. Mitigations:

  • Docker layer caching (separate cargo fetch layer before source copy)
  • just test-e2e-fast skips the rebuild when iterating on tests
  • CI caches the builder layer

4.5 Integration Tests (Tier 2) Are Valuable for Catalog Assumptions

The integration tests that run against bare PostgreSQL (without the extension) were initially viewed as marginal once E2E tests existed. However, the catalog compatibility tests (catalog_compat_tests.rs) proved their value by catching PostgreSQL version-specific behavior differences (see §4.7). These tests run fast (~40s), don’t require the extension Docker image, and serve as early-warning canaries for PG version upgrades.

The original 54 tests that simulate catalog behavior by manually creating schemas still have overlap with E2E tests, but the newer catalog compatibility tests fill a unique niche.

4.6 Benchmark Tests Need Careful Interpretation

Benchmark results show high variance between runs (see STATUS_PERFORMANCE.md). Cycle 1 is always cold (cache miss, first plan). The “INCR c1” and “INCR 2+” columns in later benchmark snapshots are more informative than simple averages.

4.7 E2E Failures Exposed Systemic Gaps (Session 2026-02)

A full E2E test run uncovered ~86 failures across multiple categories. All were fixable, but the root causes reveal important lessons about what unit tests and integration tests cannot catch in a pgrx extension:

Root Cause 1: pgrx Type Oid Mismatch (9 tests)

resolve_rangevar_schema() queried pg_namespace.nspname (PostgreSQL name type, Oid 19) and decoded it via .get::<String>() which expects text (Oid 25). This worked in earlier pgrx versions but broke with stricter type checking.

Lesson: Always cast name-typed columns to ::text in SPI queries. This is now documented as a mandatory coding convention in AGENTS.md. A dedicated integration test (test_nspname_text_cast_works) now verifies this assumption.

Root Cause 2: CTE Names Aren’t in pg_class (63 tests)

resolve_rangevar_schema() assumed every RangeVar in a FROM clause references a real pg_class entry. CTE names (e.g., WITH all_people AS ... FROM all_people) aren’t in pg_class, causing an empty SpiTupleTable iterator panic.

Lesson: Catalog lookups must return Option rather than panicking on empty results. CTE names, subquery aliases, and function-call ranges are valid FROM sources that don’t exist in pg_class. The fix changed the return type from Result<String> to Result<Option<String>> and callers now skip None values.

Root Cause 3: PostgreSQL 18 Behavior Change (13 tests)

pg_get_viewdef() in PostgreSQL 18 returns view definitions with a trailing semicolon. When embedded as (SELECT ... ;) inside a subquery, this produces a syntax error. This is a pure PG version regression invisible to unit tests.

Lesson: Any function that wraps catalog output in SQL (e.g., embedding a view definition in a subquery) must sanitize the output. The fix was extracted as a pure function strip_view_definition_suffix() with 6 unit tests. An integration test (test_pg_get_viewdef_returns_trimmed_definition) now verifies this behavior on the actual PG version.

Root Cause 4: Stale Tests After Feature Implementation (7 tests)

Tests that expected rejection of NATURAL JOIN, DISTINCT ON, and GROUPING SETS/CUBE/ROLLUP were not updated when these features were implemented as auto-rewrites in a prior session.

Lesson: When implementing a feature that changes error→success behavior, the corresponding rejection tests must be updated in the same commit. A pre-commit checklist item has been added to AGENTS.md.

Root Cause 5: Incomplete relkind Handling (1 test)

classify_source_relation() mapped materialized views (relkind='m') to "TABLE", causing CDC trigger installation attempts that fail on matviews.

Lesson: The relkind → source-type mapping was interleaved with SPI calls, making it impossible to unit-test. Extracting the pure function classify_relkind() enabled 7 unit tests covering all relkind variants. An integration test (test_relkind_for_materialized_view) verifies the actual PG relkind value.

Root Cause 6: INT4/INT8 Type Mismatch (1 test)

array_length() returns integer (INT4), but a test expected bigint (INT8) when comparing columns. An integration test (test_array_length_returns_int4) now documents this.

Impact and Prevention

Category Tests Fixed Prevention Measure
Oid 19/25 type mismatch 9 ::text cast convention in AGENTS.md + integration test
CTE name in pg_class 63 Option return pattern + integration test
PG18 viewdef semicolons 13 strip_view_definition_suffix() pure function + 6 unit tests + integration test
Stale rejection tests 7 Pre-commit checklist in AGENTS.md
Matview relkind handling 1 classify_relkind() pure function + 7 unit tests + integration test
INT4/INT8 type mismatch 1 Integration test documenting array_length return type
Total ~86 (reduced to 4 pre-existing) 13 new integration tests + 13 new unit tests

The 4 remaining failures are pre-existing (CDC partition data mismatch, CTE multi-ref self-join, recursive CTE row_data column, CASE-in-WHERE deparsing) and are tracked separately.


5. What Can Be Improved

5.1 CI Integration for E2E Coverage (Not Started)

Priority: High. The instrumented E2E coverage pipeline (scripts/e2e-coverage.sh) works locally but is not integrated into CI. A GitHub Actions workflow should:

  • Build the coverage-instrumented Docker image
  • Run E2E tests with profraw extraction
  • Merge unit + E2E profdata
  • Upload combined coverage to Codecov
  • Track the combined metric on every PR

This would provide visibility into the 75-85% combined coverage on every commit, gating PRs that reduce coverage.

5.2 More Property Tests

Priority: High. Currently 11 property tests cover: SUM, COUNT, AVG, DISTINCT, MIN/MAX, filter, join+aggregate, scan, multi-aggregate, string_agg, and array_agg. Missing coverage:

Missing Scenario Why It Matters
Window functions Complex SQL generation, PARTITION BY + ORDER BY
CTEs (recursive and non-recursive) Multi-step differentiation
LATERAL joins Recently added, complex delta logic
EXCEPT / INTERSECT Set operations with count tracking
Multi-source joins (3+ tables) Bilinear join rule with >2 inputs
HAVING clause Post-aggregation filtering
Composite primary keys Hash computation differences

Each new property test would exercise the full pipeline: CDC trigger → change buffer → delta query generation → MERGE → correctness verification.

5.3 Schema Evolution E2E Tests

Priority: Medium. The e2e_ddl_event_tests.rs file has only 4 tests. Known untested scenarios:

  • ADD COLUMN on a monitored source table → trigger rebuild
  • DROP COLUMN on a monitored source table → error handling
  • ALTER COLUMN TYPE on a source → type mismatch detection
  • RENAME TABLE of a source → OID tracking
  • CREATE INDEX on a source → no-op verification
  • DROP source TABLE with multiple downstream stream tables

5.4 Concurrent Refresh Stress Tests

Priority: Medium. Only 3 tests in e2e_concurrent_tests.rs. Could add:

  • Multiple stream tables on the same source, refreshed concurrently
  • Concurrent DML on the source while refresh is running
  • Advisory lock timeout and retry behavior under contention
  • Background worker scheduling with overlapping refresh windows

5.5 Error Recovery Tests

Priority: Medium. The e2e_error_tests.rs file focuses on input validation errors. Missing:

  • Refresh failure mid-MERGE (simulate disk full / connection drop)
  • Background worker crash and restart
  • Extension upgrade (ALTER EXTENSION pg_trickle UPDATE)
  • Out-of-memory during large delta materialization
  • Transaction abort during create_stream_table() → verify cleanup

5.6 Write-Side Overhead Benchmarks

Priority: Low. A benchmark plan exists in PLAN_TRIGGERS_OVERHEAD.md to measure CDC trigger impact on DML throughput (INSERT/UPDATE/DELETE ops/sec with and without triggers). This would complement the existing refresh benchmarks by quantifying the write-side cost of monitoring a source table. Not yet implemented.

5.7 Expand Integration Tests as Version-Compatibility Canaries

Priority: Medium (upgraded from Low). The catalog_compat_tests.rs file (13 tests) demonstrated that integration tests have a unique role: they run without building the extension Docker image (~40s vs ~5-15 min) and catch PostgreSQL version-specific behavior changes early.

Candidates for new catalog compatibility tests:

Test What It Catches
pg_get_viewdef for complex views (CTEs, LATERAL, window functions) Semicolon/formatting changes per PG version
pg_catalog.pg_proc volatility column type Type changes in system catalogs
current_setting('search_path') parsing with $user search_path edge cases
relkind for partitioned indexes (relkind='I') New relkind values in PG18+
information_schema vs pg_catalog type name differences ORM-like assumptions

5.8 Systematic “Pure Function Extraction” for SPI Code

Priority: Medium. The E2E failure investigation (§4.7) showed that bugs in SPI-dependent code are invisible to unit tests. The pattern that works:

  1. Extract pure decision logic from SPI-calling functions into standalone functions
  2. Unit-test the pure function exhaustively (all branches)
  3. Keep the SPI glue thin and tested only via E2E

Examples already done: - classify_relkind(relkind: &str) -> String — extracted from classify_source_relation() - strip_view_definition_suffix(raw: &str) -> String — extracted from get_view_definition()

Candidates for future extraction:

SPI Function Extractable Logic
resolve_rangevar_schema() Schema selection from search_path ordering
check_source_volatility() Volatility classification rules
build_change_buffer_table_sql() DDL SQL construction from column metadata
compute_refresh_action() Decision tree for FULL vs DIFFERENTIAL vs SKIP

5.9 Fuzz Testing for the Parser

Priority: Low. The SQL parser (src/dvm/parser.rs, ~13,000 lines) handles arbitrary SQL from pg_sys::raw_parser(). Fuzz testing with cargo-fuzz or proptest could uncover:

  • Panics on unexpected parse tree shapes
  • Incorrect SQL generation for exotic syntax
  • Memory safety issues in unsafe FFI boundaries

This requires a running PostgreSQL instance (for pg_sys::raw_parser()), so it would need to be done as an E2E fuzzing harness.


6. Test Commands Reference

# ── Unit tests (fast, no Docker) ──
just test-unit                    # 854 tests, ~5 seconds

# ── Integration tests (bare PG containers) ──
just test-integration             # 101 tests, ~3-4 minutes

# ── E2E tests (requires Docker image) ──
just build-e2e-image              # Build Docker image (~5-15 min first time)
just test-e2e                     # 340 tests, ~10-15 min (rebuilds image)
just test-e2e-fast                # Skip image rebuild

# ── All tests ──
just test-all                     # Unit + integration + E2E + pgrx

# ── Benchmarks ──
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_full_matrix

# ── Coverage ──
just coverage                     # Unit coverage (HTML + LCOV)
just coverage-text                # Unit coverage (terminal)
just coverage-e2e                 # E2E instrumented coverage (full pipeline)
just coverage-e2e-fast            # Skip Docker image rebuild

# ── Lint ──
just fmt                          # Format code
just lint                         # Format check + clippy

7. Summary

Aspect Status Notes
Unit tests 854 tests, ~63% coverage Structural ceiling — 86% of remaining uncovered lines need a DB
E2E tests 340 tests Comprehensive, exercising full SQL API (4 pre-existing failures tracked)
Integration tests 101 tests Now includes 13 catalog compatibility canaries for PG version behavior
Property tests 11 tests Found real correctness bugs; need expansion
Benchmarks 16 benchmarks Refresh performance matrix, trigger overhead planned
E2E coverage infra Built, not in CI Projected 75-85% combined coverage
CI coverage tracking Not started Highest-priority improvement
Pure function extraction In progress classify_relkind, strip_view_definition_suffix extracted; more candidates identified
Fuzz testing Not started Parser is the primary target

The testing strategy is mature for an extension at this stage. The main gaps are: (1) CI integration for combined E2E coverage, (2) more property tests for recently added SQL features, (3) schema evolution and concurrent stress tests, and (4) systematic extraction of pure functions from SPI-dependent code to bring more logic under unit test coverage.