PLAN: OFFSET Support (ORDER BY + LIMIT + OFFSET)

PLAN: OFFSET Support (ORDER BY + LIMIT + OFFSET)

Status: Complete — Core implementation done. Clippy clean. Upgrade SQL and CHANGELOG shipped in v0.2.2. Effort: ~4–8 hours (leverages existing TopK infrastructure)

Problem Statement

OFFSET is currently rejected in all defining-query contexts:

-- ❌ Rejected today
SELECT * FROM orders ORDER BY created_at DESC LIMIT 10 OFFSET 20

-- ❌ Also rejected
SELECT * FROM orders ORDER BY created_at DESC OFFSET 10

The original rationale (documented in the FAQ and PLAN_ORDER_BY_LIMIT_OFFSET) was that OFFSET “destroys boundary stability” — every source change can shift which rows fall inside the window. However, this argument applies to delta-based incremental maintenance, not to scoped recomputation.

Since TopK already uses scoped recomputation (re-execute the full query and MERGE the result), adding OFFSET to the query has zero additional complexity in the refresh path.

Analysis

Why OFFSET was originally rejected

The FAQ states:

TopK without OFFSET has a stable boundary: only rows that cross the N-th position threshold change membership. OFFSET destroys this stability.

This is true for delta-based IVM (Option D in PLAN_ORDER_BY_LIMIT_OFFSET — the overflow-buffer approach). But TopK was implemented as Option B — scoped recomputation — which doesn’t reason about boundaries at all. It simply:

Re-executes SELECT ... ORDER BY ... LIMIT N against source tables
MERGEs the result into the stream table

The MERGE is agnostic to what the query returns. Whether it’s LIMIT 10 or LIMIT 10 OFFSET 20, the flow is identical.

Impact on future optimizations

PLAN_ORDER_BY_LIMIT_OFFSET describes two future optimizations:

Optimization	Impact of OFFSET
Option C — Boundary tracking	With `LIMIT N` alone, you track one boundary (the N-th row’s sort key). With `OFFSET M LIMIT N`, you need two boundaries (the M-th and M+N-th). This doubles the boundary-check logic but doesn’t fundamentally break it.
Option D — Overflow buffer	More complex: the overflow buffer must cover positions before the OFFSET too. Would need rethinking. Unlikely to be implemented for OFFSET.

Neither optimization is implemented or planned near-term. The scoped recomputation path handles OFFSET correctly with no additional cost.

Three OFFSET patterns to consider

Pattern	Example	Bounded?	Useful?
ORDER BY + LIMIT + OFFSET	`ORDER BY score DESC LIMIT 10 OFFSET 20`	Yes (N rows)	Yes — “page” or “window” into ranked results
ORDER BY + OFFSET (no LIMIT)	`ORDER BY score DESC OFFSET 10`	No (all rows minus M)	Marginal — large, unbounded result set
OFFSET without ORDER BY	`OFFSET 10`	No	No — non-deterministic, already rejected

Recommendation

Support ORDER BY + LIMIT + OFFSET (the “Paged TopK” pattern). Reject the other two:

ORDER BY + OFFSET without LIMIT materializes a potentially huge result set (total rows minus M). This negates the TopK benefit of bounded storage and bounded refresh cost. Reject with a clear message suggesting adding LIMIT.
OFFSET without ORDER BY remains non-deterministic and is already rejected.

Use Cases

Use Case	Query Pattern
Paginated dashboards	`ORDER BY created_at DESC LIMIT 20 OFFSET 40` — “page 3”
Excluding outliers	`ORDER BY score DESC LIMIT 50 OFFSET 5` — top 50, skipping top 5
Windowed leaderboards	`ORDER BY points DESC LIMIT 10 OFFSET 10` — “second tier”
Cascading pages	Multiple stream tables with different OFFSETs for different pages

Caveat for users

When source data changes, the “page” shifts — a row on page 3 may move to page 2 or 4. This is semantically correct (the stream table always reflects the current state of the page) but may surprise users who expect row stability. This should be documented clearly.

Implementation Plan

Step 1: Parser — accept ORDER BY + LIMIT + OFFSET

File: src/dvm/parser.rs

Current behavior in detect_topk_pattern(): rust if !select.limitOffset.is_null() { return Err(PgTrickleError::UnsupportedOperator( "OFFSET is not supported with LIMIT in defining queries. ..." )); }

Change: When ORDER BY + LIMIT + OFFSET are all present, extract the OFFSET value (must be a constant non-negative integer, same as LIMIT):

let offset_value = if !select.limitOffset.is_null() {
    match unsafe { extract_const_int_from_node(select.limitOffset) } {
        Some(v) if v >= 0 => Some(v),
        Some(_) => return Err(PgTrickleError::UnsupportedOperator(
            "OFFSET must be a non-negative integer.".into()
        )),
        None => return Err(PgTrickleError::UnsupportedOperator(
            "OFFSET in defining queries must be a constant integer.".into()
        )),
    }
} else {
    None
};

Extend TopKInfo to include the offset:

pub struct TopKInfo {
    pub limit_value: i64,
    pub offset_value: Option<i64>,  // NEW
    pub full_query: String,
    pub base_query: String,
    pub order_by_sql: String,
}

reject_limit_offset() — currently rejects OFFSET unconditionally. Change to only reject OFFSET when ORDER BY + LIMIT are not present (the non-TopK path):

// OFFSET without ORDER BY + LIMIT is not supported.
// If ORDER BY + LIMIT + OFFSET → handled by detect_topk_pattern().
if !select.limitOffset.is_null() {
    // Only reject if this is not a TopK candidate.
    // TopK detection runs first and validates OFFSET there.
    return Err(PgTrickleError::UnsupportedOperator(
        "OFFSET is not supported in defining queries without ORDER BY + LIMIT. \
         Use ORDER BY + LIMIT + OFFSET for paged TopK stream tables."
            .into(),
    ));
}

Since detect_topk_pattern() runs before reject_limit_offset() in the API flow, and TopK tables bypass the DVM pipeline (which has its own reject_limit_offset in step 7), no additional changes are needed in the rejection flow.

Step 2: Catalog — store offset metadata

File: src/catalog.rs

Add a new nullable column to pgtrickle.pgt_stream_tables:

ALTER TABLE pgtrickle.pgt_stream_tables
  ADD COLUMN topk_offset INT;  -- NULL = no OFFSET (same as OFFSET 0)

Add field to StreamTableMeta: rust pub topk_offset: Option<i32>,

Update insert(), extract_meta(), and all SELECT queries to include the new column.

Migration SQL (for pg_trickle--{current}--{next}.sql): sql ALTER TABLE pgtrickle.pgt_stream_tables ADD COLUMN topk_offset INT;

Step 3: Refresh — include OFFSET in reconstructed query

File: src/refresh.rs — execute_topk_refresh()

Current query reconstruction: rust let topk_query = format!( "{} ORDER BY {} LIMIT {}", st.defining_query, topk_order_by, topk_limit );

Change to: rust let topk_query = if let Some(offset) = st.topk_offset { format!( "{} ORDER BY {} LIMIT {} OFFSET {}", st.defining_query, topk_order_by, topk_limit, offset ) } else { format!( "{} ORDER BY {} LIMIT {}", st.defining_query, topk_order_by, topk_limit ) };

No other refresh logic changes needed — the MERGE is agnostic to the source query shape.

Step 4: API — pass offset through

File: src/api.rs

When calling StreamTableMeta::insert(), pass topk_info.offset_value:

StreamTableMeta::insert(
    // ... existing args ...
    topk_info.as_ref().map(|t| t.limit_value as i32),
    topk_info.as_ref().map(|t| t.order_by_sql.as_str()),
    topk_info.as_ref().and_then(|t| t.offset_value.map(|v| v as i32)),  // NEW
    // ...
);

Step 5: `strip_order_by_and_limit` → also strip OFFSET

File: src/dvm/parser.rs

The existing strip_order_by_and_limit() function removes ORDER BY and LIMIT from the query to produce the base_query. It must also strip OFFSET. Rename to strip_order_by_limit_offset() or extend the existing implementation.

Step 6: Tests

E2E tests (add to tests/e2e_topk_tests.rs):

Test	Scenario
`test_topk_with_offset_create`	`ORDER BY score DESC LIMIT 10 OFFSET 5` — verify 10 rows stored, starting from position 6
`test_topk_with_offset_differential`	Insert/delete rows, verify page shifts correctly after refresh
`test_topk_with_offset_zero`	`OFFSET 0` — equivalent to no offset
`test_topk_offset_without_limit_rejected`	`ORDER BY x OFFSET 10` (no LIMIT) → error
`test_topk_offset_without_order_by_rejected`	`LIMIT 10 OFFSET 5` (no ORDER BY) → error
`test_topk_offset_non_constant_rejected`	`OFFSET (SELECT ...)` → error
`test_topk_offset_negative_rejected`	`OFFSET -1` → error
`test_topk_offset_with_aggregates`	`GROUP BY dept ORDER BY SUM(x) DESC LIMIT 5 OFFSET 2`
`test_topk_offset_catalog_metadata`	Verify `topk_offset` stored correctly in catalog
`test_topk_offset_fetch_syntax`	`FETCH NEXT 10 ROWS ONLY OFFSET 5`

Unit tests in parser.rs: - detect_topk_pattern with OFFSET returns correct offset_value - strip_order_by_limit_offset removes all three clauses

Step 7: Documentation updates

File	Change
`docs/FAQ.md`	Update “Why is OFFSET not supported with TopK?” → explain it’s now supported. Add caveat about page shifting.
`docs/SQL_REFERENCE.md`	Update create_stream_table docs to mention OFFSET support with TopK.
`docs/DVM_OPERATORS.md`	Update line 12: `LIMIT / OFFSET` → clarify OFFSET is allowed with ORDER BY + LIMIT.
`README.md`	Update the SQL support table row for LIMIT/OFFSET.
`CHANGELOG.md`	Add entry for OFFSET support.

Task Checklist

[x] Parser: accept OFFSET in detect_topk_pattern() (Step 1)
[x] Parser: update reject_limit_offset() flow (Step 1)
[x] Parser: extend strip_order_by_and_limit() to strip OFFSET (Step 5)
[x] Parser: extend TopKInfo with offset_value field (Step 1)
[x] Catalog: add topk_offset column to schema DDL in lib.rs (Step 2)
[x] Catalog: update StreamTableMeta struct + CRUD (Step 2)
[x] Refresh: include OFFSET in query reconstruction (Step 3)
[x] API: pass offset through to catalog insert (Step 4)
[x] E2E tests: 8 new tests + 2 updated rejection tests (Step 6)
[x] Documentation: FAQ, SQL_REFERENCE, DVM_OPERATORS, README (Step 7)
[x] cargo fmt -- --check clean
[x] just lint (clippy) — clean as of v0.2.2
[ ] just test-e2e — requires Docker E2E image build
[x] Upgrade migration SQL (sql/pg_trickle--0.2.1--0.2.2.sql) — topk_offset pre-provisioned in 0.2.1; 0.2.2 script handles function signature updates
[ ] Unit tests: parser tests for OFFSET handling in parser.rs #[cfg(test)]
[x] CHANGELOG entry — added in v0.2.2

What Remains (Prioritized)

#	Item	Priority	Notes
1	~~clippy validation~~	P0	✅ Done — clean in v0.2.2
2	E2E test run	P0	`just build-e2e-image && just test-e2e` to validate all 9 OFFSET tests
3	~~Upgrade migration SQL~~	P1	✅ Done — `topk_offset` column pre-provisioned in 0.2.1
4	~~CHANGELOG entry~~	P1	✅ Done — added in v0.2.2
5	Unit tests	P2	Parser-level tests for `detect_topk_pattern` with OFFSET; nice-to-have since E2E covers the full path

Complexity Assessment

Component	Effort	Risk
Parser changes	1–2 hours	Low — extends existing TopK detection
Catalog schema + migration	1 hour	Low — single nullable column
Refresh path	30 min	Very low — one conditional format string
API plumbing	30 min	Low — pass-through
Tests	2–3 hours	Low — follows existing TopK test patterns
Documentation	1 hour	Low
Total	~4–8 hours	Low

The low effort is because this is purely an extension of the existing TopK infrastructure. No new operators, no new refresh strategies, no query rewriting — just one more clause in the reconstructed query.

What We’re NOT Doing

ORDER BY + OFFSET without LIMIT — rejected because the result set is unbounded. Users should add LIMIT.
Delta-based OFFSET maintenance — not needed. Scoped recomputation handles it correctly.
Boundary tracking optimization (Option C) for OFFSET — deferred. Can be added later as a pure optimization. Would need to track two sort-key boundaries instead of one.
IMMEDIATE mode + OFFSET — rejected (same as TopK without OFFSET). Scoped recomputation is incompatible with within-transaction IVM.

PGXN

PostgreSQL Extension Network

Contents

PLAN: OFFSET Support (ORDER BY + LIMIT + OFFSET)

Problem Statement

Analysis

Why OFFSET was originally rejected

Impact on future optimizations

Three OFFSET patterns to consider

Recommendation

Use Cases

Caveat for users

Implementation Plan

Step 1: Parser — accept ORDER BY + LIMIT + OFFSET

Step 2: Catalog — store offset metadata

Step 3: Refresh — include OFFSET in reconstructed query

Step 4: API — pass offset through

Step 5: `strip_order_by_and_limit` → also strip OFFSET

Step 6: Tests

Step 7: Documentation updates

Task Checklist

What Remains (Prioritized)

Complexity Assessment

What We’re NOT Doing

Further Reading

PGXN

PostgreSQL Extension Network

Contents

PLAN: OFFSET Support (ORDER BY + LIMIT + OFFSET)

Problem Statement

Analysis

Why OFFSET was originally rejected

Impact on future optimizations

Three OFFSET patterns to consider

Recommendation

Use Cases

Caveat for users

Implementation Plan

Step 1: Parser — accept ORDER BY + LIMIT + OFFSET

Step 2: Catalog — store offset metadata

Step 3: Refresh — include OFFSET in reconstructed query

Step 4: API — pass offset through

Step 5: strip_order_by_and_limit → also strip OFFSET

Step 6: Tests

Step 7: Documentation updates

Task Checklist

What Remains (Prioritized)

Complexity Assessment

What We’re NOT Doing

Further Reading

Step 5: `strip_order_by_and_limit` → also strip OFFSET