---
layout: default
title: rowcompress
---

# rowcompress AM
{: .no_toc }

Row-oriented batch-compressed storage with parallel scan, DELETE/UPDATE support, and multiple compression codecs.
{: .fs-6 .fw-300 }

## Table of contents
{: .text-delta }

1. TOC
{:toc}


## How It Works

rowcompress stores rows in fixed-size batches (default 10,000 rows per batch). Each batch is serialized using the PostgreSQL heap tuple format and compressed as a single unit. Batch metadata (file offset, byte size, first row number, row count) is stored in engine.row_batch.

```
Table file
├── Batch 1: [header | row offsets | compressed heap tuple data]
├── Batch 2: [header | row offsets | compressed heap tuple data]
└── …
```
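The packing scheme can be sketched as follows. This is a toy model, not engine code: plain byte strings stand in for serialized heap tuples, zlib stands in for the configurable codec, and the names (`pack_batches`, the metadata keys) are illustrative, chosen to mirror the fields described for engine.row_batch.

```python
import io
import zlib

def pack_batches(rows, batch_size=10_000, fileobj=None):
    """Pack serialized rows into compressed batches.

    Each batch is compressed as a single unit, and its metadata
    (file offset, byte size, first row number, row count) is
    recorded, mirroring engine.row_batch.
    """
    fileobj = fileobj if fileobj is not None else io.BytesIO()
    metadata = []
    first_row = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        compressed = zlib.compress(b"".join(batch))
        offset = fileobj.tell()          # where this batch begins in the file
        fileobj.write(compressed)
        metadata.append({
            "file_offset": offset,
            "byte_size": len(compressed),
            "first_row_number": first_row,
            "row_count": len(batch),
        })
        first_row += len(batch)
    return fileobj, metadata
```

A scan needs only the metadata to locate and decompress any batch, which is what makes the atomic batch claiming described below possible.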

This AM suits append-heavy workloads where compression matters but column projection is not needed — event logs, audit trails, time-series with many columns always queried together.

## Comparison with colcompress

| Feature | rowcompress | colcompress |
|---|---|---|
| Orientation | Row-oriented | Column-oriented |
| Column projection | No (reads full rows) | Yes (reads only referenced columns) |
| Vectorized execution | No | Yes |
| Chunk-level min/max pruning | No | Yes |
| Write latency per row | Lower | Higher (columnar transposition) |
| GROUP BY / analytics | Slower | Much faster |
| Sequential append + range by insert order | Good | Good |
| Scatter reads (random FK lookup) | Very slow | Full scan (flat latency) |
| Typical storage savings | 2–10× | 3–15× |

## Parallel Scan

rowcompress implements the PostgreSQL parallel scan protocol via atomic batch claiming. Each parallel worker atomically increments a shared counter to claim the next unprocessed batch, decompresses it, and repeats. There is no coordinator or work-distribution step: workers self-schedule, and contention is limited to a single atomic fetch-add per batch claimed.
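The claiming protocol can be sketched like this. It is a simplified model: the real AM uses a lock-free atomic fetch-add on a counter in shared memory, emulated here with a small lock, and `parallel_scan` is an illustrative name, not an engine function.

```python
import threading

def parallel_scan(n_batches, n_workers, process):
    """Each worker claims batches by incrementing a shared counter."""
    counter = {"next": 0}
    lock = threading.Lock()

    def claim():
        # Stand-in for an atomic fetch-add in shared memory.
        with lock:
            batch = counter["next"]
            counter["next"] += 1
        return batch

    def worker():
        while True:
            batch = claim()
            if batch >= n_batches:
                return          # no unclaimed batches left; worker exits
            process(batch)      # decompress and scan the claimed batch

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because each batch number is handed out exactly once, no batch is scanned twice and none is skipped, regardless of how unevenly workers progress.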

```sql
-- Standard PostgreSQL parallel knobs apply
SET max_parallel_workers_per_gather = 4;
```

## DELETE and UPDATE

rowcompress supports DELETE and UPDATE via per-batch deleted-row bitmasks stored in engine.row_batch. Deleted rows are masked at read time without rewriting the batch. UPDATE is implemented as delete-then-insert.

```sql
DELETE FROM logs WHERE logged_at < now() - interval '90 days';
UPDATE logs SET level = 'WARN' WHERE level = 'WARNING';
```

Deleted rows are reclaimed during VACUUM (batch rewrite).
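A minimal sketch of read-time masking and the VACUUM rewrite, under the scheme described above. The `Batch` class and its methods are illustrative, not the engine's API; in this model an UPDATE would mark the old row deleted here and insert the new version elsewhere.

```python
class Batch:
    """Toy batch with a per-batch deleted-row bitmask
    (kept in engine.row_batch in the real AM)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.deleted = bytearray((len(rows) + 7) // 8)  # one bit per row

    def delete(self, i):
        self.deleted[i // 8] |= 1 << (i % 8)

    def is_deleted(self, i):
        return bool(self.deleted[i // 8] & (1 << (i % 8)))

    def scan(self):
        # Deleted rows are masked at read time; the batch is untouched.
        return [r for i, r in enumerate(self.rows) if not self.is_deleted(i)]

    def vacuum(self):
        # VACUUM rewrites the batch, physically dropping masked rows.
        self.rows = self.scan()
        self.deleted = bytearray((len(self.rows) + 7) // 8)
```

The bitmask makes DELETE cheap (one bit flipped, no decompression or rewrite), at the cost of carrying dead rows until the next VACUUM.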


## When to Use rowcompress

rowcompress is the right choice when:

- Writes are frequent and low latency matters — packing rows into batches is cheaper than columnar transposition
- Queries always select most or all columns — no benefit from column projection
- Access pattern is sequential — append order is the natural read order (e.g., process batch 1, then batch 2, etc.)
- Table is effectively write-once — logs, events, immutable records

Avoid rowcompress when:

- Queries filter by non-sequential keys (user_id, session_id, etc.) — scatter reads are catastrophic (K8: 1min17s for 20k rows vs colcompress 113ms)
- Heavy GROUP BY / aggregation — no vectorized execution
- Storage efficiency matters most — colcompress with zstd compresses better


## Per-Table Options

```sql
SELECT engine.alter_rowcompress_table_set(
    'logs'::regclass,
    batch_size        => 10000,   -- rows per compressed batch (default: 10000)
    compression       => 'zstd',
    compression_level => 5
);

-- Reset to defaults
SELECT engine.alter_rowcompress_table_reset('logs'::regclass, compression => true);

-- Rewrite all batches with current options (e.g. after changing compression)
SELECT engine.rowcompress_repack('logs');

-- Inspect options
SELECT * FROM engine.rowcompress_options WHERE table_name = 'logs';

-- Inspect batches
SELECT * FROM engine.rowcompress_batches WHERE table_name = 'logs' LIMIT 10;
```

## Example: Audit Log Table

```sql
CREATE TABLE audit_log (
    id         bigserial,
    logged_at  timestamptz NOT NULL DEFAULT now(),
    user_id    bigint,
    action     text NOT NULL,
    table_name text,
    old_data   jsonb,
    new_data   jsonb,
    ip_address inet
) USING rowcompress;

-- Tune for write-heavy audit workload
SELECT engine.alter_rowcompress_table_set(
    'audit_log'::regclass,
    batch_size        => 10000,
    compression       => 'zstd',
    compression_level => 3  -- low level = fast writes
);

-- Read recent logs (sequential scan, all batches)
SELECT action, COUNT(*)
FROM audit_log
WHERE logged_at > now() - interval '7 days'
GROUP BY action
ORDER BY COUNT(*) DESC;

-- Check storage efficiency
SELECT table_name, total_units, live_rows,
       pg_size_pretty(pg_total_relation_size('audit_log')) AS disk_size
FROM engine.storage_health
WHERE table_name = 'audit_log';
```

## Compression Options

| Codec | Notes |
|---|---|
| pglz | Always available |
| lz4 | Fast, good for write-heavy. Requires liblz4-dev |
| zstd | Best ratio. Requires libzstd-dev |
| deflate | Middle ground. Requires libdeflate-dev |
| none | No compression |
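The level/ratio trade-off behind `compression_level` can be seen with any levelled codec. zstd and lz4 are external libraries and are not shown here; the sketch below uses Python's stdlib zlib, which implements DEFLATE (the same algorithm as the `deflate` codec), purely to illustrate the trade-off, not to benchmark the engine.

```python
import zlib

# Repetitive payload, similar in spirit to log lines.
data = b"2024-01-01 INFO user=42 action=login status=ok\n" * 5_000

raw = len(data)
fast = len(zlib.compress(data, level=1))   # low level: cheapest CPU, for hot write paths
best = len(zlib.compress(data, level=9))   # high level: best ratio, for cold data

assert fast < raw     # even the fastest level shrinks repetitive rows a lot
assert best <= fast   # higher levels trade CPU for a smaller batch
```

This is why the audit-log example above picks `compression_level => 3`: write latency matters more there than squeezing out the last few percent of storage.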