PLAN_TRIGGERS_OVERHEAD.md — CDC Trigger Write-Side Overhead Benchmark

1. Motivation

The existing benchmark suite (tests/e2e_bench_tests.rs) measures refresh duration — how fast incremental refresh processes changes — but says nothing about the write-side cost the CDC trigger imposes on source tables. Every INSERT, UPDATE, or DELETE on a source table fires a PL/pgSQL AFTER trigger that:

  1. Calls pg_current_wal_lsn()
  2. Computes pg_trickle_hash(NEW."pk"::text) (or pg_trickle_hash_multi(ARRAY[...]) for composite PKs)
  3. Inserts a row into pg_trickle_changes.changes_<oid> with typed new_*/old_* columns
  4. Maintains the covering B-tree index (lsn, pk_hash, change_id) INCLUDE (action)
  5. Increments the change_id BIGSERIAL sequence

This overhead is invisible in refresh benchmarks but directly impacts the DML throughput of every monitored source table. Users need data to answer: “How much slower are my writes with a stream table watching this source?”

Current Trigger Function (generated per source)

For a table with OID 16384 and columns (id INT, amount NUMERIC) with PK on id:

CREATE OR REPLACE FUNCTION pg_trickle_changes.pg_trickle_cdc_fn_16384()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO pg_trickle_changes.changes_16384
            (lsn, action, pk_hash, "new_id", "new_amount")
        VALUES (pg_current_wal_lsn(), 'I',
                pg_trickle.pg_trickle_hash(NEW."id"::text), NEW."id", NEW."amount");
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO pg_trickle_changes.changes_16384
            (lsn, action, pk_hash, "new_id", "new_amount", "old_id", "old_amount")
        VALUES (pg_current_wal_lsn(), 'U',
                pg_trickle.pg_trickle_hash(NEW."id"::text), NEW."id", NEW."amount",
                OLD."id", OLD."amount");
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO pg_trickle_changes.changes_16384
            (lsn, action, pk_hash, "old_id", "old_amount")
        VALUES (pg_current_wal_lsn(), 'D',
                pg_trickle.pg_trickle_hash(OLD."id"::text), OLD."id", OLD."amount");
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$;

The trigger cost is proportional to:

  • Column count: each column produces a new_* and/or old_* typed value in the INSERT
  • PK type: single-column uses pg_trickle_hash(), composite uses pg_trickle_hash_multi(ARRAY[...])
  • DML operation: UPDATE writes both new_* and old_* (widest buffer row)


2. Goals

  • Quantify per-row trigger overhead in µs/row (absolute cost)
  • Report the throughput ratio (ops/sec without trigger ÷ ops/sec with trigger), where a value > 1.0 means writes are slower with the trigger installed
  • Sweep across three dimensions that affect trigger cost:
    • Column count (controls typed-column INSERT width)
    • PK type (controls hash function used)
    • DML operation (INSERT/UPDATE/DELETE/mixed — UPDATE is worst-case)
  • Establish a baseline for evaluating future trigger optimizations (e.g., UNLOGGED change buffers, column pruning, logical replication migration)

3. Benchmark Design

3.1. Methodology

For each (schema, pk_type, dml_operation) combination:

  1. Baseline run — execute DML on the source table with no trigger installed (plain table, no stream table created). Measure wall-clock time for BATCH_SIZE operations across CYCLES iterations.

  2. Trigger run — create a stream table referencing the source (which installs the CDC trigger automatically via pg_trickle.create_stream_table()). TRUNCATE the change buffer between cycles to isolate per-row trigger cost from buffer-growth/bloat effects. Measure the same DML workload.

  3. Compute overhead:

    • overhead_us_per_row = (trigger_avg_us - baseline_avg_us) / BATCH_SIZE, where trigger_avg_us and baseline_avg_us are the mean wall-clock time per cycle in µs
    • throughput_ratio = baseline_ops_per_sec / trigger_ops_per_sec
    • Also capture P95 for both runs to detect variance/outliers
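The two derived metrics can be sketched as a small pure function. This is an illustration only: the `Overhead` struct and `compute_overhead` name are hypothetical and not part of the planned test file, and both inputs are assumed to be mean per-cycle times in µs.

```rust
/// Derived metrics for one (schema, pk_type, dml_operation) combination.
/// Hypothetical type, used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct Overhead {
    pub overhead_us_per_row: f64,
    pub throughput_ratio: f64,
}

/// `baseline_avg_us` and `trigger_avg_us` are mean wall-clock times per cycle,
/// in microseconds, for a batch of `batch_size` rows.
pub fn compute_overhead(baseline_avg_us: f64, trigger_avg_us: f64, batch_size: usize) -> Overhead {
    let rows = batch_size as f64;
    let overhead_us_per_row = (trigger_avg_us - baseline_avg_us) / rows;

    // ops/sec = rows / seconds. The batch size cancels in the ratio,
    // so it reduces to trigger time / baseline time.
    let baseline_ops_per_sec = rows / (baseline_avg_us / 1e6);
    let trigger_ops_per_sec = rows / (trigger_avg_us / 1e6);

    Overhead {
        overhead_us_per_row,
        throughput_ratio: baseline_ops_per_sec / trigger_ops_per_sec,
    }
}
```

For example, a 10,000-row batch that takes 12 ms without the trigger and 28 ms with it gives `compute_overhead(12_000.0, 28_000.0, 10_000)`: about 1.6 µs/row of overhead and a ratio of about 2.33.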

3.2. Parameters

| Parameter | Value | Rationale |
| --- | --- | --- |
| BATCH_SIZE | 10,000 rows/cycle | Large enough to amortize per-statement overhead; matches 100K/10% change rate from refresh benchmarks |
| CYCLES | 10 | Consistent with e2e_bench_tests.rs |
| WARMUP_CYCLES | 2 | Discarded to eliminate buffer cache warming effects |
| PRE_POPULATE | 100,000 rows | For UPDATE/DELETE: ensures enough rows to operate on |
| DML_STATEMENT | Single multi-row statement per operation type | Uses generate_series for INSERT, subquery-selected random rows for UPDATE/DELETE |

3.3. Table Schema Fixtures

Three column widths to measure how typed-column expansion affects trigger cost:

Narrow (3 columns):

CREATE TABLE src_narrow (
    id SERIAL PRIMARY KEY,
    value INT NOT NULL DEFAULT 0,
    label TEXT NOT NULL DEFAULT 'x'
);

For UPDATE, the trigger's INSERT has ~6 column references (3 new_* + 3 old_*). Buffer row ~80 bytes.

Medium (8 columns):

CREATE TABLE src_medium (
    id SERIAL PRIMARY KEY,
    a TEXT NOT NULL DEFAULT 'alpha',
    b NUMERIC NOT NULL DEFAULT 0.0,
    c INT NOT NULL DEFAULT 0,
    d TIMESTAMPTZ NOT NULL DEFAULT now(),
    e BOOLEAN NOT NULL DEFAULT false,
    f TEXT NOT NULL DEFAULT '',
    g NUMERIC NOT NULL DEFAULT 1.0
);

For UPDATE, the trigger's INSERT has ~16 column references (8 new_* + 8 old_*). Buffer row ~200 bytes.

Wide (20 columns):

CREATE TABLE src_wide (
    id SERIAL PRIMARY KEY,
    col1 INT DEFAULT 0, col2 INT DEFAULT 0, col3 INT DEFAULT 0, col4 INT DEFAULT 0,
    col5 INT DEFAULT 0, col6 INT DEFAULT 0, col7 INT DEFAULT 0, col8 INT DEFAULT 0,
    col9 INT DEFAULT 0, col10 INT DEFAULT 0, col11 INT DEFAULT 0, col12 INT DEFAULT 0,
    col13 INT DEFAULT 0, col14 INT DEFAULT 0, col15 INT DEFAULT 0, col16 INT DEFAULT 0,
    col17 INT DEFAULT 0, col18 INT DEFAULT 0, col19 INT DEFAULT 0
);

For UPDATE, the trigger's INSERT has ~40 column references (20 new_* + 20 old_*). Buffer row ~400 bytes. This stress-tests PL/pgSQL row-decomposition and VALUES clause construction.
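Rather than hand-writing the wide fixture in the test file, its DDL could be generated. The sketch below is one possible approach; `int_table_ddl` is a hypothetical helper name, and it assumes the filler columns follow the `colN INT DEFAULT 0` pattern shown for src_wide.

```rust
/// Build DDL for a table with `total_cols` columns:
/// `id SERIAL PRIMARY KEY` plus `total_cols - 1` INT filler columns.
/// Hypothetical helper, not part of the existing test suite.
pub fn int_table_ddl(table: &str, total_cols: usize) -> String {
    assert!(total_cols >= 1, "need at least the id column");
    let mut cols = vec!["id SERIAL PRIMARY KEY".to_string()];
    // Filler columns are numbered from 1, so 20 total columns ends at col19.
    cols.extend((1..total_cols).map(|i| format!("col{i} INT DEFAULT 0")));
    format!("CREATE TABLE {table} ({})", cols.join(", "))
}
```

Calling `int_table_ddl("src_wide", 20)` yields the same column set as the wide fixture, and other widths (for a finer-grained column sweep) come for free.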

3.4. PK Type Fixtures

| PK Type | Schema Modification | Hash Function | Notes |
| --- | --- | --- | --- |
| Single INT | id SERIAL PRIMARY KEY | pg_trickle_hash(NEW."id"::text) | Baseline: cheapest hash |
| Composite 2-col | PRIMARY KEY (id, seq) with extra seq INT NOT NULL DEFAULT 1 | pg_trickle_hash_multi(ARRAY[NEW."id"::text, NEW."seq"::text]) | Array construction + multi-hash |
| No PK | Remove PRIMARY KEY constraint | No pk_hash column in buffer | Simpler trigger, but lsn-only index; tests the fallback path in src/cdc.rs |
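The three PK cases map to three shapes of hash expression inside the trigger body. The following sketch shows that mapping as a pure string builder. It is illustrative only: `pk_hash_expr` is a hypothetical name, it is not the generator in src/cdc.rs, and it assumes pg_trickle_hash_multi lives in the same pg_trickle schema as pg_trickle_hash.

```rust
/// Render one PK column as `<record>."<col>"::text`, e.g. `NEW."id"::text`.
fn cast_to_text(record: &str, col: &str) -> String {
    format!("{record}.\"{col}\"::text")
}

/// Build the pk_hash expression for the given record (`NEW` or `OLD`).
/// Returns None for the no-PK case, where the buffer has no pk_hash column.
/// Hypothetical helper for illustration only.
pub fn pk_hash_expr(record: &str, pk_cols: &[&str]) -> Option<String> {
    match pk_cols {
        [] => None,
        [single] => Some(format!(
            "pg_trickle.pg_trickle_hash({})",
            cast_to_text(record, single)
        )),
        many => {
            let parts: Vec<String> = many.iter().map(|c| cast_to_text(record, c)).collect();
            // Assumption: the multi-column variant is schema-qualified the same way.
            Some(format!(
                "pg_trickle.pg_trickle_hash_multi(ARRAY[{}])",
                parts.join(", ")
            ))
        }
    }
}
```

The composite branch makes the extra work visible: one text cast per key column plus an array construction before the hash call.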

3.5. DML Operations

| Operation | Statement Pattern | Trigger Codepath |
| --- | --- | --- |
| INSERT-only | INSERT INTO src SELECT ... FROM generate_series(1, N) | TG_OP = 'INSERT': writes new_* columns only |
| UPDATE-only | UPDATE src SET value = value + 1 WHERE id IN (SELECT id FROM src ORDER BY random() LIMIT N) | TG_OP = 'UPDATE': writes both new_* and old_* columns (widest buffer row) |
| DELETE-only | DELETE FROM src WHERE id IN (SELECT id FROM src ORDER BY random() LIMIT N) | TG_OP = 'DELETE': writes old_* columns only |
| Mixed 70/15/15 | UPDATE 70%, DELETE 15%, INSERT 15% (same split as refresh benchmarks) | Representative production workload |

For UPDATE and DELETE cycles, the table is pre-populated with 100K rows so there are always enough rows to operate on. After DELETE cycles, rows are re-inserted to maintain pool size.
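A mixed-workload generator compatible with the `fn(usize) -> Vec<String>` shape used by the harness might look like the sketch below. The function name `mixed_dml_narrow` and the exact INSERT column list are assumptions made for illustration; only the 70/15/15 split and the statement patterns come from the table above.

```rust
/// Build the three statements for one mixed cycle on src_narrow.
/// Hypothetical generator; the split is 70% UPDATE, 15% DELETE, 15% INSERT.
pub fn mixed_dml_narrow(batch_size: usize) -> Vec<String> {
    let updates = batch_size * 70 / 100;
    let deletes = batch_size * 15 / 100;
    // INSERT takes the remainder so the three counts always sum to batch_size.
    let inserts = batch_size - updates - deletes;

    vec![
        format!(
            "UPDATE src_narrow SET value = value + 1 \
             WHERE id IN (SELECT id FROM src_narrow ORDER BY random() LIMIT {updates})"
        ),
        format!(
            "DELETE FROM src_narrow \
             WHERE id IN (SELECT id FROM src_narrow ORDER BY random() LIMIT {deletes})"
        ),
        format!(
            "INSERT INTO src_narrow (value, label) \
             SELECT g, 'x' FROM generate_series(1, {inserts}) AS g"
        ),
    ]
}
```

With BATCH_SIZE = 10,000 this produces 7,000 updates, 1,500 deletes, and 1,500 inserts per cycle, so the row pool stays at its pre-populated size without a separate re-insert step.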


4. Implementation

4.1. File: tests/e2e_trigger_overhead_tests.rs

New test file following the established E2E pattern:

//! CDC Trigger write-side overhead benchmarks.
//!
//! Measures the per-row cost of the AFTER trigger on source tables
//! by comparing DML throughput with and without triggers installed.
//!
//! These tests are `#[ignore]`d. Run explicitly:
//!
//! ```bash
//! cargo test --test e2e_trigger_overhead_tests -- --ignored --nocapture
//! ```
//!
//! Prerequisites: `./tests/build_e2e_image.sh`

mod e2e;
use e2e::E2eDb;
use std::time::Instant;

4.2. Configuration Constants

/// Rows per DML batch per cycle.
const BATCH_SIZE: usize = 10_000;

/// Number of measured cycles per combination.
const CYCLES: usize = 10;

/// Warm-up cycles discarded before measurement.
const WARMUP_CYCLES: usize = 2;

/// Pre-populated rows for UPDATE/DELETE workloads.
const PRE_POPULATE: usize = 100_000;

4.3. Core Helper: time_dml_batch()

/// Execute a DML workload and return per-row timing.
///
/// Returns (avg_us_per_row, p95_us_per_row, ops_per_sec, raw_cycle_times_ms).
async fn time_dml_batch(
    db: &E2eDb,
    dml_stmts: &[String],
    batch_size: usize,
    cycles: usize,
    warmup_cycles: usize,
) -> (f64, f64, f64, Vec<f64>) {
    let mut times_ms = Vec::with_capacity(cycles);

    for cycle in 0..(warmup_cycles + cycles) {
        let start = Instant::now();
        for stmt in dml_stmts {
            db.execute(stmt).await;
        }
        let elapsed_ms = start.elapsed().as_secs_f64() * 1000.0;

        if cycle >= warmup_cycles {
            times_ms.push(elapsed_ms);
        }
    }

    let avg_ms = times_ms.iter().sum::<f64>() / times_ms.len() as f64;
    let avg_us_per_row = (avg_ms * 1000.0) / batch_size as f64;
    let ops_per_sec = batch_size as f64 / (avg_ms / 1000.0);

    let mut sorted = times_ms.clone();
    // total_cmp gives a total order over f64, so this cannot panic on NaN
    sorted.sort_by(f64::total_cmp);
    let p95_ms = percentile(&sorted, 95.0);
    let p95_us_per_row = (p95_ms * 1000.0) / batch_size as f64;

    (avg_us_per_row, p95_us_per_row, ops_per_sec, times_ms)
}
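time_dml_batch calls a percentile helper that this plan does not define. A minimal nearest-rank sketch follows, assuming the slice is non-empty and already sorted ascending (as it is at the call site). The actual helper in the test suite may use a different definition, such as linear interpolation.

```rust
/// Nearest-rank percentile over an ascending-sorted, non-empty slice.
/// `p` is given in percent, e.g. 95.0.
/// Sketch only; the real helper may interpolate between samples.
pub fn percentile(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty(), "percentile of an empty sample");
    // Nearest-rank: the smallest 1-based rank covering p percent of the samples.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    // Clamp so p = 0 maps to the first sample and p = 100 to the last.
    sorted[rank.clamp(1, sorted.len()) - 1]
}
```

Note that with CYCLES = 10, nearest-rank P95 is simply the slowest measured cycle, so it behaves as an outlier detector rather than a smooth tail estimate.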

4.4. Core Helper: time_dml_batch_with_cleanup()

Variant that TRUNCATEs the change buffer between cycles to isolate per-row trigger cost:

/// Like time_dml_batch but TRUNCATEs change buffer after each cycle.
async fn time_dml_batch_with_cleanup(
    db: &E2eDb,
    dml_stmts: &[String],
    batch_size: usize,
    cycles: usize,
    warmup_cycles: usize,
    truncate_stmt: &str,
) -> (f64, f64, f64, Vec<f64>) {
    let mut times_ms = Vec::with_capacity(cycles);

    for cycle in 0..(warmup_cycles + cycles) {
        let start = Instant::now();
        for stmt in dml_stmts {
            db.execute(stmt).await;
        }
        let elapsed_ms = start.elapsed().as_secs_f64() * 1000.0;

        // Cleanup outside measurement window
        db.execute(truncate_stmt).await;

        if cycle >= warmup_cycles {
            times_ms.push(elapsed_ms);
        }
    }

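    // Same statistics computation as time_dml_batch
    let avg_ms = times_ms.iter().sum::<f64>() / times_ms.len() as f64;
    let avg_us_per_row = (avg_ms * 1000.0) / batch_size as f64;
    let ops_per_sec = batch_size as f64 / (avg_ms / 1000.0);

    let mut sorted = times_ms.clone();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let p95_ms = percentile(&sorted, 95.0);
    let p95_us_per_row = (p95_ms * 1000.0) / batch_size as f64;

    (avg_us_per_row, p95_us_per_row, ops_per_sec, times_ms)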
}

4.5. Core Helper: bench_trigger_overhead()

/// Run baseline (no trigger) and trigger (with ST) DML, compute overhead.
async fn bench_trigger_overhead(
    table_name: &str,
    create_table_sql: &str,
    populate_sql: &str,
    dml_fn: fn(usize) -> Vec<String>,
    st_query: &str,
    batch_size: usize,
) -> TriggerOverheadResult {
    // ── Phase 1: Baseline (no trigger) ──
    let db = E2eDb::new_bench().await.with_extension().await;
    db.execute(create_table_sql).await;
    if !populate_sql.is_empty() {
        db.execute(populate_sql).await;
    }

    let (base_us, base_p95, base_ops, _) =
        time_dml_batch(&db, &dml_fn(batch_size), batch_size, CYCLES, WARMUP_CYCLES)
            .await;

    // ── Phase 2: With trigger (create ST to auto-install it) ──
    // Drop and recreate table to reset bloat/stats
    db.execute(&format!("DROP TABLE IF EXISTS {} CASCADE", table_name)).await;
    db.execute(create_table_sql).await;
    if !populate_sql.is_empty() {
        db.execute(populate_sql).await;
    }

    // Creating the ST installs the CDC trigger automatically
    db.execute(&format!(
        "SELECT pg_trickle.create_stream_table('overhead_st', $q${}$q$, '1 hour', 'INCREMENTAL')",
        st_query
    )).await;

    // Full initial refresh so the ST is populated
    db.execute(
        "SELECT pg_trickle.refresh_stream_table('overhead_st', force_full => true)"
    ).await;

    // Discover the change buffer table name
    let change_table = get_change_table_name(&db, table_name).await;

    let (trig_us, trig_p95, trig_ops, _) =
        time_dml_batch_with_cleanup(
            &db, &dml_fn(batch_size), batch_size, CYCLES, WARMUP_CYCLES,
            &format!("TRUNCATE {}", change_table),
        ).await;

    TriggerOverheadResult {
        overhead_us_per_row: trig_us - base_us,
        throughput_ratio: base_ops / trig_ops,
        baseline_us_per_row: base_us,
        trigger_us_per_row: trig_us,
        baseline_p95_us: base_p95,
        trigger_p95_us: trig_p95,
        baseline_ops_per_sec: base_ops,
        trigger_ops_per_sec: trig_ops,
    }
}
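bench_trigger_overhead relies on get_change_table_name, which this plan does not show. Given the pg_trickle_changes.changes_<oid> naming from Section 1, its two pure pieces could be sketched as below. Both function names are hypothetical, and how the OID query result is read back through E2eDb is deliberately left out because that API is not described here.

```rust
/// SQL that resolves a table name to its OID via the regclass cast.
/// Single quotes in the name are doubled so the literal stays well-formed.
/// Hypothetical helper for illustration.
pub fn oid_lookup_sql(table_name: &str) -> String {
    format!("SELECT '{}'::regclass::oid", table_name.replace('\'', "''"))
}

/// Qualified change buffer name for a source table OID,
/// following the changes_<oid> convention.
pub fn change_table_for_oid(oid: u32) -> String {
    format!("pg_trickle_changes.changes_{oid}")
}
```

The harness would run the lookup query once after create_stream_table(), then pass the resulting name into the TRUNCATE statement.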

4.6. Result Struct and Reporting

struct TriggerOverheadResult {
    overhead_us_per_row: f64,
    throughput_ratio: f64,       // > 1.0 means trigger is slower
    baseline_us_per_row: f64,
    trigger_us_per_row: f64,
    baseline_p95_us: f64,
    trigger_p95_us: f64,
    baseline_ops_per_sec: f64,
    trigger_ops_per_sec: f64,
}

Output format (printed to stdout with [BENCH_TRIGGER] prefix for parseability):

╔═════════════════════════════════════════════════════════════════════════════════════════════╗
║                             pg_trickle Trigger Overhead Results                             ║
╠════════════╤══════════╤══════════╤══════════╤══════════╤══════════╤════════╤════════════════╣
║ Schema     │ PK Type  │ DML Op   │ Base µs  │ Trig µs  │ Δ µs/row │ Ratio  │ Trig ops/s     ║
╠════════════╪══════════╪══════════╪══════════╪══════════╪══════════╪════════╪════════════════╣
║ narrow     │ single   │ INSERT   │     1.2  │     2.8  │     1.6  │  2.3x  │     357,142    ║
║ narrow     │ single   │ UPDATE   │     2.1  │     4.5  │     2.4  │  2.1x  │     222,222    ║
║ narrow     │ single   │ DELETE   │     1.8  │     3.5  │     1.7  │  1.9x  │     285,714    ║
║ narrow     │ single   │ MIXED    │     1.9  │     4.0  │     2.1  │  2.1x  │     250,000    ║
║ medium     │ single   │ UPDATE   │     3.0  │     7.2  │     4.2  │  2.4x  │     138,888    ║
║ wide       │ single   │ UPDATE   │     4.5  │    13.0  │     8.5  │  2.9x  │      76,923    ║
║ ...        │          │          │          │          │          │        │                ║
╚════════════╧══════════╧══════════╧══════════╧══════════╧══════════╧════════╧════════════════╝

(Values above are hypothetical — to be replaced with actual measurements.)
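Alongside the boxed table, a one-line-per-result form is easier to grep and parse. The sketch below shows one possible layout carrying the [BENCH_TRIGGER] prefix. The key=value field names are an assumption, not an established format, and the derived columns are recomputed from the two per-row timings.

```rust
/// Format one result as a single parseable line.
/// The key=value layout is hypothetical; only the prefix comes from the plan.
pub fn bench_line(schema: &str, pk: &str, op: &str, base_us: f64, trig_us: f64) -> String {
    let delta = trig_us - base_us;
    // Per-row time ratio equals the throughput ratio (baseline ops/s ÷ trigger ops/s).
    let ratio = trig_us / base_us;
    let trig_ops = 1_000_000.0 / trig_us;
    format!(
        "[BENCH_TRIGGER] schema={schema} pk={pk} op={op} base_us={base_us:.1} \
         trig_us={trig_us:.1} delta_us={delta:.1} ratio={ratio:.2} trig_ops_per_sec={trig_ops:.0}"
    )
}
```

A consumer can then filter the test output with a plain `grep '\[BENCH_TRIGGER\]'` and split each line on spaces and `=`.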

4.7. Test Functions

All tests are #[ignore] and use new_bench() for resource-constrained containers:

/// Canary test: narrow schema, single INT PK, all 4 DML types.
/// Fastest to run (~2 min). Use this to validate the harness.
#[tokio::test]
#[ignore]
async fn bench_trigger_overhead_narrow_single_pk() { ... }

/// Column count sweep: narrow × medium × wide, UPDATE-only (worst-case).
/// Answers: "How much does column count affect trigger cost?"
#[tokio::test]
#[ignore]
async fn bench_trigger_overhead_column_count_sweep() { ... }

/// PK type sweep: single × composite × no-pk, mixed DML.
/// Answers: "How much does PK hash cost matter?"
#[tokio::test]
#[ignore]
async fn bench_trigger_overhead_pk_type_sweep() { ... }

/// Full matrix: 3 schemas × 3 PK types × 4 DML ops = 36 combinations.
/// Complete dataset. Run time ~30 min.
#[tokio::test]
#[ignore]
async fn bench_trigger_overhead_full_matrix() { ... }

4.8. Justfile Target

Add to the Benchmarks section of justfile:

# Run trigger overhead benchmarks (requires E2E Docker image)
bench-trigger:
    cargo test --test e2e_trigger_overhead_tests -- --ignored --nocapture --test-threads=1

5. Sweep Matrix

5.1. Column Count Sweep (PK=single INT, DML=UPDATE)

| Schema | Columns | new_* cols | old_* cols | Buffer row width (est.) |
| --- | --- | --- | --- | --- |
| narrow | 3 | 3 | 3 | ~80 bytes |
| medium | 8 | 8 | 8 | ~200 bytes |
| wide | 20 | 20 | 20 | ~400 bytes |

Hypothesis: Trigger cost scales roughly linearly with column count because:

  • PL/pgSQL must decompose NEW and OLD records into individual column references
  • The INSERT INTO changes_<oid> VALUES clause grows linearly
  • Change-buffer heap pages fill faster with wider rows (higher buffer growth rate); the covering index entries (lsn, pk_hash, change_id, action) do not widen with column count, so index cost should stay roughly constant
  • WAL volume per trigger fire increases proportionally

5.2. PK Type Sweep (Schema=narrow, DML=mixed)

| PK Type | Hash Call | Expected overhead |
| --- | --- | --- |
| Single INT | pg_trickle_hash(NEW."id"::text) | Baseline: single ::text cast + xxh64 |
| Composite (id, seq) | pg_trickle_hash_multi(ARRAY[NEW."id"::text, NEW."seq"::text]) | Array construction + multi-element hash |
| No PK | (none; pk_hash column omitted) | Cheaper trigger, but index is (lsn) only |

Hypothesis: Composite PK adds ~0.5–1 µs/row over single PK due to array construction. No-PK should be slightly cheaper than single PK (no hash computation), but the lsn-only index may lead to wider scan ranges during refresh (not measured here, but worth noting).

5.3. DML Operation Sweep (Schema=narrow, PK=single)

| Operation | Trigger columns written | Expected cost |
| --- | --- | --- |
| INSERT | new_* only (3 cols narrow) | Cheapest: smallest buffer row |
| UPDATE | new_* + old_* (6 cols narrow) | Most expensive: widest buffer row + 2 record decompositions |
| DELETE | old_* only (3 cols narrow) | Same width as INSERT |
| Mixed | 70% U + 15% D + 15% I | Weighted average |

Hypothesis: UPDATE overhead is 1.5–2x INSERT overhead due to double column writes and both NEW and OLD record access.


6. Expected Results (Estimates)

Based on typical PL/pgSQL AFTER trigger overhead in PostgreSQL:

| Schema | PK | DML | Est. overhead µs/row | Est. ratio |
| --- | --- | --- | --- | --- |
| narrow | single | INSERT | 1–3 | 1.5–2.5x |
| narrow | single | UPDATE | 2–5 | 2.0–3.0x |
| narrow | single | DELETE | 1–3 | 1.5–2.5x |
| narrow | single | MIXED | 2–4 | 2.0–3.0x |
| medium | single | UPDATE | 3–8 | 2.5–4.0x |
| wide | single | UPDATE | 5–15 | 3.0–5.0x |
| narrow | composite | MIXED | 2–4 | 2.0–3.0x |
| narrow | no-pk | MIXED | 1–2 | 1.3–2.0x |

These are rough estimates. The benchmark will provide actual numbers for this specific trigger implementation (typed columns, single covering index, BIGSERIAL sequence).

Cost Breakdown (Estimated Per-Row)

| Component | Est. µs | Notes |
| --- | --- | --- |
| PL/pgSQL function entry/exit | 0.5–1.0 | Fixed overhead per trigger invocation |
| pg_current_wal_lsn() call | 0.1–0.2 | Lightweight system function |
| pg_trickle_hash(pk::text) | 0.2–0.5 | Cast + xxh64 hash |
| INSERT INTO changes_<oid> (heap) | 0.5–1.0 | Scales with row width |
| B-tree index update | 0.3–0.8 | Single covering index (was 2 previously) |
| WAL write for buffer row | 0.3–0.5 | Scales with row width |
| BIGSERIAL sequence increment | 0.1–0.3 | Shared sequence lock |
| Total (narrow/INSERT) | ~2–4 | |
| Total (wide/UPDATE) | ~5–15 | 2x column writes + wider rows |
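As a sanity check on the narrow/INSERT total, the per-component ranges can simply be summed. The sketch below does that with the estimates from the breakdown. The constant and function names are illustrative, and the figures remain estimates, not measurements.

```rust
/// (component, low µs, high µs) for the narrow/INSERT case,
/// copied from the estimated cost breakdown.
pub const NARROW_INSERT_COMPONENTS: [(&str, f64, f64); 7] = [
    ("PL/pgSQL function entry/exit", 0.5, 1.0),
    ("pg_current_wal_lsn() call", 0.1, 0.2),
    ("pg_trickle_hash(pk::text)", 0.2, 0.5),
    ("INSERT INTO changes_<oid> (heap)", 0.5, 1.0),
    ("B-tree index update", 0.3, 0.8),
    ("WAL write for buffer row", 0.3, 0.5),
    ("BIGSERIAL sequence increment", 0.1, 0.3),
];

/// Sum the low and high ends of every component range.
pub fn total_range(components: &[(&str, f64, f64)]) -> (f64, f64) {
    components
        .iter()
        .fold((0.0, 0.0), |(lo, hi), &(_, l, h)| (lo + l, hi + h))
}
```

The components sum to roughly 2.0–4.3 µs, which is consistent with the ~2–4 µs total quoted for narrow/INSERT.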

7. Actionable Insights This Benchmark Enables

7.1. Decision: When to Use UNLOGGED Change Buffers

If the overhead is > 5 µs/row for wide tables, an UNLOGGED change buffer would eliminate WAL generation for trigger writes, potentially halving the overhead. Trade-off: change buffer data is lost on crash (acceptable if refreshes re-initialize from a full scan).

Action: If measured WAL overhead > 30% of trigger cost → implement pg_trickle.change_buffer_unlogged GUC.

7.2. Decision: When to Migrate to Logical Replication

Per AGENTS.md and ADR-001/ADR-002 in plans/adrs/PLAN_ADRS.md, triggers are recommended for < 1,000 writes/sec and logical replication for > 5,000 writes/sec. This benchmark provides actual per-row cost to compute the crossover point:

  • If trigger overhead = 3 µs/row → the trigger alone caps single-writer throughput at ≈ 333K rows/sec (probably fine for most workloads)
  • If trigger overhead = 10 µs/row → the cap drops to ≈ 100K rows/sec (start considering alternatives at 50K+ rows/sec)
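The arithmetic behind these figures is a unit conversion, sketched below. The function names are illustrative. The ceiling assumes a single writer whose only per-row cost is the trigger, so real throughput will be lower once the base DML cost is included.

```rust
/// Upper bound on rows/sec for one writer if the trigger were the only per-row cost.
pub fn trigger_bound_rows_per_sec(overhead_us_per_row: f64) -> f64 {
    1_000_000.0 / overhead_us_per_row
}

/// Fraction of one writer's wall-clock time spent inside the trigger
/// at a given sustained write rate.
pub fn trigger_time_fraction(overhead_us_per_row: f64, writes_per_sec: f64) -> f64 {
    overhead_us_per_row * writes_per_sec / 1_000_000.0
}
```

For instance, at the 5,000 writes/sec threshold quoted above, a 3 µs/row trigger accounts for `trigger_time_fraction(3.0, 5_000.0)` = 1.5% of one writer's time, which is the kind of figure the measured data can feed into the crossover decision.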

Action: Update the recommendation thresholds in AGENTS.md with measured data.

7.3. Decision: Column Pruning for Change Buffers

If the column-count sweep shows strong scaling (e.g., 3x overhead at 20 cols vs 3 cols), then a future optimization could prune the change buffer to only capture columns actually referenced in the stream table’s defining query (via columns_used in pg_trickle.pgt_dependencies).

Action: If wide-table overhead > 3x narrow-table overhead → prioritize column pruning optimization.

7.4. Decision: Sequence Contention

The change_id BIGSERIAL increments a shared sequence under lock for every trigger fire. Under concurrent writers, this could become a bottleneck. While concurrent writers are out of scope for this initial benchmark, the single-writer results establish a baseline.

Action: If change_id sequence overhead is measurable → investigate replacing with ctid-based ordering or removing the column entirely.


8. Future Extensions (Not in Scope)

| Extension | Why Deferred |
| --- | --- |
| Concurrent writers (1/4/8 connections) | Requires pgbench-style parallel harness; adds complexity. Worth a follow-up once we see if single-writer overhead is concerning. |
| Batch size sweep (1/100/1000 rows per txn) | Tests per-txn vs per-row amortization. Deferred for simplicity: the 10K batch already represents bulk DML. |
| UNLOGGED change buffer variant | Requires code change to create_change_buffer_table() in src/cdc.rs. Benchmark first, optimize second. |
| Trigger vs. logical replication direct comparison | Requires wal_level=logical and replication slot setup. Separate benchmark once/if logical replication is implemented. |
| Multiple stream tables per source | Tests whether 2+ STs on the same source multiply trigger cost (they share a single trigger/buffer, so overhead should be constant). |

9. Relationship to Existing Performance Work

This benchmark fills a gap identified in PLAN_PERFORMANCE_PART_7.md §4.6 (“Change buffer write amplification”):

Each source table DML triggers a per-row AFTER trigger that inserts into the change buffer. At high write rates, the change buffer itself becomes a write bottleneck due to:

  • WAL generation for every change buffer INSERT
  • Index maintenance for 1 covering index (previously 2 indexes, before Session 5/AA1)
  • BIGSERIAL contention on the change_id sequence

Session 5 of Part 7 (already completed) reduced the index count from 2 to 1 (single covering index (lsn, pk_hash, change_id) INCLUDE (action)), which should yield ~20% trigger overhead reduction. This benchmark will provide the first concrete trigger overhead numbers for the single-index trigger; confirming the ~20% figure would additionally require a comparison run against the earlier two-index layout.


10. Files to Create/Modify

| Action | File | Description |
| --- | --- | --- |
| Create | tests/e2e_trigger_overhead_tests.rs | Benchmark test file with all test functions and helpers |
| Modify | justfile | Add bench-trigger recipe in the Benchmarks section |
| Modify | BENCHMARK.md | Add “Trigger Overhead” results section after running |

11. Running the Benchmark

# Prerequisites
./tests/build_e2e_image.sh

# Run just the canary test (~2 min)
cargo test --test e2e_trigger_overhead_tests bench_trigger_overhead_narrow_single_pk \
    -- --ignored --nocapture

# Run column-count sweep (~5 min)
cargo test --test e2e_trigger_overhead_tests bench_trigger_overhead_column_count_sweep \
    -- --ignored --nocapture

# Run PK-type sweep (~5 min)
cargo test --test e2e_trigger_overhead_tests bench_trigger_overhead_pk_type_sweep \
    -- --ignored --nocapture

# Run full matrix (~30 min, 36 combinations)
cargo test --test e2e_trigger_overhead_tests bench_trigger_overhead_full_matrix \
    -- --ignored --nocapture

# Or use the justfile shortcut (runs all)
just bench-trigger

12. Git Commit

After creating the implementation:

git add tests/e2e_trigger_overhead_tests.rs justfile PLAN_TRIGGERS_OVERHEAD.md
git commit -m "bench: add CDC trigger write-side overhead benchmarks

Measures per-row trigger cost across column count (3/8/20),
PK type (single/composite/none), and DML operation (I/U/D/mixed).
Reports both µs/row overhead and throughput ratio vs baseline.

New file: tests/e2e_trigger_overhead_tests.rs
New justfile target: bench-trigger"