pg_trickle — Testing & Coverage Status

pg_trickle — Testing & Coverage Status

What we’ve learned about testing the pg_trickle extension, the current state of tests and coverage, and what can be improved.

1. Current Test Inventory

Category	Tests	Runner	Requires Docker?
Unit tests (in `src/`)	1194	`just test-unit`	No
Integration tests (in `tests/`)	101	`just test-integration`	Yes (bare PG 18.x containers)
E2E tests (in `tests/e2e_*.rs`)	340	`just test-e2e`	Yes (custom Docker image with extension)
Benchmarks	16	`just test-e2e` (ignored)	Yes
Total	1,651	`just test-all`

Unit Tests by Module

File	Tests	Focus
`src/dvm/parser.rs`	237	SQL parsing, OpTree construction, expression handling, view definition sanitization
`src/dvm/operators/aggregate.rs`	106	Aggregate differentiation, merge expressions, delta CTEs
`src/api.rs`	82	Pure helper functions (`quote_identifier`, `find_top_level_keyword`, `classify_relkind`, etc.)
`src/dvm/operators/recursive_cte.rs`	59	Recursive CTE SQL generation, cascade detection
`src/refresh.rs`	44	Refresh action resolution, LSN placeholder substitution
`src/dvm/diff.rs`	34	DiffContext, CTE name generation, dispatch
`src/dvm/mod.rs`	28	Delta template resolution, union splitting, scan-chain detection
`src/dag.rs`	26	Topological sort, cycle detection, lag resolution
`src/dvm/operators/join.rs`	20	Inner join differentiation, equijoin key extraction
`src/dvm/operators/scan.rs`	18	Scan differentiation, PK detection, hash expressions
`src/dvm/operators/lateral_subquery.rs`	18	Lateral subquery support
`src/dvm/operators/join_common.rs`	18	Shared join helpers
`src/dvm/operators/lateral_function.rs`	16	Lateral function differentiation
`src/dvm/operators/except.rs`	16	EXCEPT operator differentiation
`src/version.rs`	15	Versioning, epoch conversion, frontier operations
`src/dvm/operators/intersect.rs`	14	INTERSECT operator differentiation
Others (19 files)	128	filter, distinct, union_all, window, project, outer_join, cte_scan, subquery, error, hash, cdc, monitor, scheduler, hooks, semi_join, anti_join, full_join, scalar_subquery

E2E Tests by File

File	Tests	Focus
`e2e_cte_tests.rs`	68	Recursive/non-recursive CTE parsing and IVM
`e2e_expression_tests.rs`	29	Expression handling in defining queries
`e2e_create_tests.rs`	29	`create_stream_table()` validation, catalog writes
`e2e_refresh_tests.rs`	26	Refresh modes (FULL, DIFFERENTIAL), staleness, correctness
`e2e_error_tests.rs`	26	Error handling, invalid inputs, rejection messages
`e2e_window_tests.rs`	18	Window function parsing and IVM
`e2e_lateral_tests.rs`	16	LATERAL joins
`e2e_bench_tests.rs`	16	Performance benchmarks (marked `#[ignore]`)
`e2e_lateral_subquery_tests.rs`	12	LATERAL subquery patterns
`e2e_property_tests.rs`	11	Deterministic property-based correctness
`e2e_coverage_parser_tests.rs`	11	Parser edge cases for coverage
`e2e_smoke_tests.rs`	10	Basic extension functionality
`e2e_coverage_error_tests.rs`	10	Error path coverage
`e2e_alter_tests.rs`	10	ALTER operations
`e2e_lifecycle_tests.rs`	9	Stream table lifecycle state machine
`e2e_cdc_tests.rs`	9	CDC triggers, change buffers
`e2e_bgworker_tests.rs`	9	Background worker scheduling
`e2e_drop_tests.rs`	8	Drop cascades, cleanup
`e2e_monitoring_tests.rs`	6	Stats, lag, alerts
`e2e_ddl_event_tests.rs`	4	DDL event hook handling
`e2e_concurrent_tests.rs`	3	Advisory locks, concurrent refresh

Integration Tests (Non-E2E)

These run against bare PostgreSQL 18.x containers without the extension loaded, manually creating catalog schemas:

File	Tests	Focus
`catalog_tests.rs`	15	Catalog CRUD via SQL
`catalog_compat_tests.rs`	13	PG18 catalog behavior: `pg_get_viewdef` semicolons, `nspname::text` cast, `relkind` values, `array_length` types, UNION ALL column consistency
`scenario_tests.rs`	10	Multi-step workflow scenarios
`monitoring_tests.rs`	8	Monitoring views and functions
`resilience_tests.rs`	7	Error recovery, edge cases
`extension_tests.rs`	7	Extension install/uninstall
`workflow_tests.rs`	4	End-to-end workflows
`smoke_tests.rs`	3	Basic smoke tests

2. Testing Architecture

Three-Tier Test Strategy

┌─────────────────────────────────────────────────────┐
│  Tier 1: Unit Tests (854 tests)                     │
│  - Pure Rust, no database needed                    │
│  - Tests OpTree construction, SQL generation,       │
│    differentiation logic, DAG algorithms            │
│  - Runs in <5 seconds                               │
│  - cargo test --lib                                 │
├─────────────────────────────────────────────────────┤
│  Tier 2: Integration Tests (101 tests)              │
│  - Bare PG 18.x containers via Testcontainers       │
│  - Tests SQL-level data model, catalog schema,      │
│    and PG version-specific catalog behavior          │
│  - Does NOT load the compiled extension             │
│  - Runs in ~2-3 minutes                             │
├─────────────────────────────────────────────────────┤
│  Tier 3: E2E Tests (340 tests)                      │
│  - Custom Docker image with compiled extension      │
│  - Tests full SQL API via CREATE EXTENSION          │
│  - Exercises CDC triggers, event hooks, refresh,    │
│    background workers, monitoring views             │
│  - Runs in ~10-15 minutes (serial, one container    │
│    per test)                                        │
└─────────────────────────────────────────────────────┘

E2E Infrastructure

The E2E tests use a custom Docker image (pg_trickle_e2e:latest) built via a multi-stage Dockerfile:

Stage 1 (builder): Installs Rust, cargo-pgrx, compiles the extension against PG 18 headers
Stage 2 (runtime): Copies .so, .control, .sql into a clean postgres:18.x image with shared_preload_libraries = 'pg_trickle'

Build the image: bash ./tests/build_e2e_image.sh

The E2eDb test harness (in tests/e2e/mod.rs) starts one container per test via testcontainers::GenericImage, connects via sqlx::PgPool, and provides helpers for extension setup, SQL execution, and assertions.

Property-Based Testing

The e2e_property_tests.rs file implements 11 deterministic property-based correctness tests. The key invariant tested:

For every stream table, at every data timestamp: Contents(ST) = Result(defining_query) (multiset equality)

Each test uses a deterministic SplitMix64 PRNG seeded per test, applies randomized DML (INSERT/UPDATE/DELETE mix) across 5 cycles of 15 initial rows, and verifies the invariant after each cycle. On failure, the seed is printed for reproduction.

3. Code Coverage

Coverage History

Milestone	Unit Tests	Lines Covered	Coverage
Initial baseline	189	2,888 / 9,139	31.6%
After Coverage Wave 1 (4 phases)	476	7,525 / 12,117	62.1%
Current (post SQL support expansion)	854	~8,500+ / ~13,500	~63% (est.)

The Coverage Ceiling

Unit test coverage has hit a structural ceiling at ~63%. Of the ~4,600 uncovered lines:

Category	Uncovered Lines	% of Uncovered	Unit Testable?
DB-only (SPI, pg_sys FFI, GUC, shmem, bg workers)	~3,940	86%	No
Parser (pg_sys::raw_parser tree walking)	~1,012	22%	No
Pure Rust edge branches	~165	4%	Yes (diminishing returns)

86% of remaining uncovered code requires a running PostgreSQL instance. It uses Spi::connect(), pg_sys::raw_parser(), BackgroundWorkerBuilder, GUC registration, event trigger context, or #[pg_extern] entry points.

DB-Only Code That Cannot Be Unit Tested

File	Uncovered Lines	Why
`src/dvm/parser.rs`	~1,012	`pg_sys::raw_parser()` FFI parse tree walking
`src/api.rs`	~662	`#[pg_extern]` + SPI orchestration
`src/catalog.rs`	~464	100% SPI queries
`src/scheduler.rs`	~380	Background worker + SPI
`src/monitor.rs`	~358	`#[pg_extern]` + SPI
`src/refresh.rs`	~315	SPI (pure logic already tested)
`src/hooks.rs`	~275	Event triggers + SPI
`src/cdc.rs`	~224	SPI CREATE/DROP + triggers
`src/dvm/mod.rs`	~181	Calls `parse_defining_query()`
Others	~121	GUC, shmem, pg_extern wrappers

E2E Coverage Infrastructure (Implemented)

To break through the 63% ceiling, an instrumented E2E coverage pipeline has been built:

tests/Dockerfile.e2e-coverage — Builds the extension with LLVM coverage instrumentation (-C instrument-coverage)
scripts/e2e-coverage.sh — Full pipeline: build instrumented image → run E2E tests → extract profraw → merge with unit coverage → generate report
tests/e2e/mod.rs — Supports PGS_E2E_IMAGE env var for image override and PGS_E2E_COVERAGE_DIR for volume-mounting profraw output

Projected combined coverage (unit + E2E): 75–85%.

Running Coverage

# Unit test coverage (HTML + LCOV)
just coverage

# Terminal summary only
just coverage-text

# E2E instrumented coverage (full pipeline)
just coverage-e2e

# E2E coverage, skip Docker rebuild
just coverage-e2e-fast

4. Key Lessons Learned

4.1 The IVM Engine Is Highly Unit-Testable

The biggest win was discovering that the entire DVM (differential view maintenance) subsystem — 11 operator files, the diff engine, and the parser’s pure logic — is fully testable without a database. These files do string-based SQL construction from OpTree structs. Going from 0% to 90%+ coverage on the operator files required only constructing test OpTrees and asserting on the generated SQL.

Pattern used across all operator tests: rust let mut ctx = test_ctx(); // DiffContext with dummy frontiers let tree = OpTree::Scan { /* ... */ }; let result = diff_scan(&mut ctx, &tree).unwrap(); assert!(result.cte_sql.contains("expected SQL fragment"));

4.2 Property Tests Found Real Bugs

The 11 deterministic property tests in e2e_property_tests.rs caught a correctness bug in the DELETE+INSERT merge strategy. When the strategy evaluated the delta query twice (once for DELETE, once for INSERT), aggregate/DISTINCT queries that LEFT JOIN back to the stream table to read __pgt_count saw modified data between the DELETE and INSERT, causing incorrect results.

This was not caught by hand-written E2E tests because they used known data patterns. The randomized DML from property tests triggered the edge case.

Lesson: Property-based tests with randomized DML are essential for IVM correctness, not just a nice-to-have.

4.3 E2E Test Isolation Is Critical

Each E2E test gets its own PostgreSQL container. Early experiments with shared containers led to test interference (leftover stream tables, triggers, catalog state). The overhead of per-test containers (~1-2 seconds startup) is worth the isolation guarantee.

Trade-off: Running 340 E2E tests serially with per-test containers takes ~10-15 minutes. Parallelism is possible but requires sufficient Docker resources.

4.4 The Docker Image Build Is the Bottleneck

Building the E2E Docker image (compiling Rust + pgrx inside a container) takes 5-15 minutes depending on cache state. This is the dominant cost in the development cycle. Mitigations:

Docker layer caching (separate cargo fetch layer before source copy)
just test-e2e-fast skips the rebuild when iterating on tests
CI caches the builder layer

4.5 Integration Tests (Tier 2) Are Valuable for Catalog Assumptions

The integration tests that run against bare PostgreSQL (without the extension) were initially viewed as marginal once E2E tests existed. However, the catalog compatibility tests (catalog_compat_tests.rs) proved their value by catching PostgreSQL version-specific behavior differences (see §4.7). These tests run fast (~40s), don’t require the extension Docker image, and serve as early-warning canaries for PG version upgrades.

The original 54 tests that simulate catalog behavior by manually creating schemas still have overlap with E2E tests, but the newer catalog compatibility tests fill a unique niche.

4.6 Benchmark Tests Need Careful Interpretation

Benchmark results show high variance between runs (see STATUS_PERFORMANCE.md). Cycle 1 is always cold (cache miss, first plan). The “INCR c1” and “INCR 2+” columns in later benchmark snapshots are more informative than simple averages.

4.7 E2E Failures Exposed Systemic Gaps (Session 2026-02)

A full E2E test run uncovered ~86 failures across multiple categories. All were fixable, but the root causes reveal important lessons about what unit tests and integration tests cannot catch in a pgrx extension:

Root Cause 1: pgrx Type Oid Mismatch (9 tests)

resolve_rangevar_schema() queried pg_namespace.nspname (PostgreSQL name type, Oid 19) and decoded it via .get::<String>() which expects text (Oid 25). This worked in earlier pgrx versions but broke with stricter type checking.

Lesson: Always cast name-typed columns to ::text in SPI queries. This is now documented as a mandatory coding convention in AGENTS.md. A dedicated integration test (test_nspname_text_cast_works) now verifies this assumption.

Root Cause 2: CTE Names Aren’t in pg_class (63 tests)

resolve_rangevar_schema() assumed every RangeVar in a FROM clause references a real pg_class entry. CTE names (e.g., WITH all_people AS ... FROM all_people) aren’t in pg_class, causing an empty SpiTupleTable iterator panic.

Lesson: Catalog lookups must return Option rather than panicking on empty results. CTE names, subquery aliases, and function-call ranges are valid FROM sources that don’t exist in pg_class. The fix changed the return type from Result<String> to Result<Option<String>> and callers now skip None values.

Root Cause 3: PostgreSQL 18 Behavior Change (13 tests)

pg_get_viewdef() in PostgreSQL 18 returns view definitions with a trailing semicolon. When embedded as (SELECT ... ;) inside a subquery, this produces a syntax error. This is a pure PG version regression invisible to unit tests.

Lesson: Any function that wraps catalog output in SQL (e.g., embedding a view definition in a subquery) must sanitize the output. The fix was extracted as a pure function strip_view_definition_suffix() with 6 unit tests. An integration test (test_pg_get_viewdef_returns_trimmed_definition) now verifies this behavior on the actual PG version.

Root Cause 4: Stale Tests After Feature Implementation (7 tests)

Tests that expected rejection of NATURAL JOIN, DISTINCT ON, and GROUPING SETS/CUBE/ROLLUP were not updated when these features were implemented as auto-rewrites in a prior session.

Lesson: When implementing a feature that changes error→success behavior, the corresponding rejection tests must be updated in the same commit. A pre-commit checklist item has been added to AGENTS.md.

Root Cause 5: Incomplete relkind Handling (1 test)

classify_source_relation() mapped materialized views (relkind='m') to "TABLE", causing CDC trigger installation attempts that fail on matviews.

Lesson: The relkind → source-type mapping was interleaved with SPI calls, making it impossible to unit-test. Extracting the pure function classify_relkind() enabled 7 unit tests covering all relkind variants. An integration test (test_relkind_for_materialized_view) verifies the actual PG relkind value.

Root Cause 6: INT4/INT8 Type Mismatch (1 test)

array_length() returns integer (INT4), but a test expected bigint (INT8) when comparing columns. An integration test (test_array_length_returns_int4) now documents this.

Impact and Prevention

Category	Tests Fixed	Prevention Measure
Oid 19/25 type mismatch	9	`::text` cast convention in AGENTS.md + integration test
CTE name in pg_class	63	`Option` return pattern + integration test
PG18 viewdef semicolons	13	`strip_view_definition_suffix()` pure function + 6 unit tests + integration test
Stale rejection tests	7	Pre-commit checklist in AGENTS.md
Matview relkind handling	1	`classify_relkind()` pure function + 7 unit tests + integration test
INT4/INT8 type mismatch	1	Integration test documenting `array_length` return type
Total	~86 (reduced to 4 pre-existing)	13 new integration tests + 13 new unit tests

The 4 remaining failures are pre-existing (CDC partition data mismatch, CTE multi-ref self-join, recursive CTE row_data column, CASE-in-WHERE deparsing) and are tracked separately.

5. What Can Be Improved

5.1 CI Integration for E2E Coverage (Not Started)

Priority: High. The instrumented E2E coverage pipeline (scripts/e2e-coverage.sh) works locally but is not integrated into CI. A GitHub Actions workflow should:

Build the coverage-instrumented Docker image
Run E2E tests with profraw extraction
Merge unit + E2E profdata
Upload combined coverage to Codecov
Track the combined metric on every PR

This would provide visibility into the 75-85% combined coverage on every commit, gating PRs that reduce coverage.

5.2 More Property Tests

Priority: High. Currently 11 property tests cover: SUM, COUNT, AVG, DISTINCT, MIN/MAX, filter, join+aggregate, scan, multi-aggregate, string_agg, and array_agg. Missing coverage:

Missing Scenario	Why It Matters
Window functions	Complex SQL generation, PARTITION BY + ORDER BY
CTEs (recursive and non-recursive)	Multi-step differentiation
LATERAL joins	Recently added, complex delta logic
EXCEPT / INTERSECT	Set operations with count tracking
Multi-source joins (3+ tables)	Bilinear join rule with >2 inputs
HAVING clause	Post-aggregation filtering
Composite primary keys	Hash computation differences

Each new property test would exercise the full pipeline: CDC trigger → change buffer → delta query generation → MERGE → correctness verification.

5.3 Schema Evolution E2E Tests

Priority: Medium. The e2e_ddl_event_tests.rs file has only 4 tests. Known untested scenarios:

ADD COLUMN on a monitored source table → trigger rebuild
DROP COLUMN on a monitored source table → error handling
ALTER COLUMN TYPE on a source → type mismatch detection
RENAME TABLE of a source → OID tracking
CREATE INDEX on a source → no-op verification
DROP source TABLE with multiple downstream stream tables

5.4 Concurrent Refresh Stress Tests

Priority: Medium. Only 3 tests in e2e_concurrent_tests.rs. Could add:

Multiple stream tables on the same source, refreshed concurrently
Concurrent DML on the source while refresh is running
Advisory lock timeout and retry behavior under contention
Background worker scheduling with overlapping refresh windows

5.5 Error Recovery Tests

Priority: Medium. The e2e_error_tests.rs file focuses on input validation errors. Missing:

Refresh failure mid-MERGE (simulate disk full / connection drop)
Background worker crash and restart
Extension upgrade (ALTER EXTENSION pg_trickle UPDATE)
Out-of-memory during large delta materialization
Transaction abort during create_stream_table() → verify cleanup

5.6 Write-Side Overhead Benchmarks

Priority: Low. A benchmark plan exists in PLAN_TRIGGERS_OVERHEAD.md to measure CDC trigger impact on DML throughput (INSERT/UPDATE/DELETE ops/sec with and without triggers). This would complement the existing refresh benchmarks by quantifying the write-side cost of monitoring a source table. Not yet implemented.

5.7 Expand Integration Tests as Version-Compatibility Canaries

Priority: Medium (upgraded from Low). The catalog_compat_tests.rs file (13 tests) demonstrated that integration tests have a unique role: they run without building the extension Docker image (~40s vs ~5-15 min) and catch PostgreSQL version-specific behavior changes early.

Candidates for new catalog compatibility tests:

Test	What It Catches
`pg_get_viewdef` for complex views (CTEs, LATERAL, window functions)	Semicolon/formatting changes per PG version
`pg_catalog.pg_proc` volatility column type	Type changes in system catalogs
`current_setting('search_path')` parsing with `$user`	search_path edge cases
`relkind` for partitioned indexes (`relkind='I'`)	New relkind values in PG18+
`information_schema` vs `pg_catalog` type name differences	ORM-like assumptions

5.8 Systematic “Pure Function Extraction” for SPI Code

Priority: Medium. The E2E failure investigation (§4.7) showed that bugs in SPI-dependent code are invisible to unit tests. The pattern that works:

Extract pure decision logic from SPI-calling functions into standalone functions
Unit-test the pure function exhaustively (all branches)
Keep the SPI glue thin and tested only via E2E

Examples already done: - classify_relkind(relkind: &str) -> String — extracted from classify_source_relation() - strip_view_definition_suffix(raw: &str) -> String — extracted from get_view_definition()

Candidates for future extraction:

SPI Function	Extractable Logic
`resolve_rangevar_schema()`	Schema selection from search_path ordering
`check_source_volatility()`	Volatility classification rules
`build_change_buffer_table_sql()`	DDL SQL construction from column metadata
`compute_refresh_action()`	Decision tree for FULL vs DIFFERENTIAL vs SKIP

5.9 Fuzz Testing for the Parser

Priority: Low. The SQL parser (src/dvm/parser.rs, ~13,000 lines) handles arbitrary SQL from pg_sys::raw_parser(). Fuzz testing with cargo-fuzz or proptest could uncover:

Panics on unexpected parse tree shapes
Incorrect SQL generation for exotic syntax
Memory safety issues in unsafe FFI boundaries

This requires a running PostgreSQL instance (for pg_sys::raw_parser()), so it would need to be done as an E2E fuzzing harness.

6. Test Commands Reference

# ── Unit tests (fast, no Docker) ──
just test-unit                    # 854 tests, ~5 seconds

# ── Integration tests (bare PG containers) ──
just test-integration             # 101 tests, ~3-4 minutes

# ── E2E tests (requires Docker image) ──
just build-e2e-image              # Build Docker image (~5-15 min first time)
just test-e2e                     # 340 tests, ~10-15 min (rebuilds image)
just test-e2e-fast                # Skip image rebuild

# ── All tests ──
just test-all                     # Unit + integration + E2E + pgrx

# ── Benchmarks ──
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_full_matrix

# ── Coverage ──
just coverage                     # Unit coverage (HTML + LCOV)
just coverage-text                # Unit coverage (terminal)
just coverage-e2e                 # E2E instrumented coverage (full pipeline)
just coverage-e2e-fast            # Skip Docker image rebuild

# ── Lint ──
just fmt                          # Format code
just lint                         # Format check + clippy

7. Summary

Aspect	Status	Notes
Unit tests	854 tests, ~63% coverage	Structural ceiling — 86% of remaining uncovered lines need a DB
E2E tests	340 tests	Comprehensive, exercising full SQL API (4 pre-existing failures tracked)
Integration tests	101 tests	Now includes 13 catalog compatibility canaries for PG version behavior
Property tests	11 tests	Found real correctness bugs; need expansion
Benchmarks	16 benchmarks	Refresh performance matrix, trigger overhead planned
E2E coverage infra	Built, not in CI	Projected 75-85% combined coverage
CI coverage tracking	Not started	Highest-priority improvement
Pure function extraction	In progress	`classify_relkind`, `strip_view_definition_suffix` extracted; more candidates identified
Fuzz testing	Not started	Parser is the primary target

The testing strategy is mature for an extension at this stage. The main gaps are: (1) CI integration for combined E2E coverage, (2) more property tests for recently added SQL features, (3) schema evolution and concurrent stress tests, and (4) systematic extraction of pure functions from SPI-dependent code to bring more logic under unit test coverage.

PGXN

PostgreSQL Extension Network

Contents