Contents
- What Happens When You TRUNCATE a Table?
- Setup
- Case 1: TRUNCATE the Base Table (DIFFERENTIAL Mode)
- Case 2: Manual Refresh (Explicit Recovery)
- Case 3: TRUNCATE Then INSERT (Common ETL Pattern)
- Case 4: TRUNCATE a Dimension Table in a JOIN
- Case 5: FULL Mode Stream Tables Are Immune
- Why PostgreSQL Doesn’t Fire Row Triggers on TRUNCATE
- Alternative: DELETE FROM Instead of TRUNCATE
- Best Practices
- How TRUNCATE Compares to Other Operations
- What About IMMEDIATE Mode?
- Summary
- Next in This Series
What Happens When You TRUNCATE a Table?
This tutorial explains what happens when a TRUNCATE statement hits a base table that is referenced by a stream table. Unlike INSERT, UPDATE, and DELETE — which are fully tracked by the CDC trigger — TRUNCATE is a special case that bypasses row-level triggers entirely. Understanding this gap is essential for operating pg_trickle correctly.
Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the 7-phase lifecycle. This tutorial explains why TRUNCATE breaks that lifecycle and how to recover.
Setup
Same e-commerce example used throughout the series:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m'
);
-- Seed some data
INSERT INTO orders (customer, amount) VALUES
('alice', 50.00),
('alice', 30.00),
('bob', 75.00),
('bob', 25.00);
After the first refresh, the stream table contains:
customer | total | order_count
---------|--------|------------
alice | 80.00 | 2
bob | 100.00 | 2
Case 1: TRUNCATE the Base Table (DIFFERENTIAL Mode)
TRUNCATE orders;
All four rows are removed instantly.
What Happens at the Trigger Level: TRUNCATE Marker
Updated in v0.2.0: pg_trickle now installs a statement-level AFTER TRUNCATE trigger on tracked source tables. This trigger writes a single marker row to the change buffer with
action = 'T'.
Unlike the per-row DML triggers, the TRUNCATE trigger cannot capture individual row data (PostgreSQL’s TRUNCATE does not provide OLD records). Instead, it writes a sentinel:
pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┐
│ change_id │ lsn │ action │ new_* │ old_* │
├───────────┼─────────────┼────────┼──────────┼──────────┤
│ 5 │ 0/1A3F4000 │ T │ NULL │ NULL │
└───────────┴─────────────┴────────┴──────────┴──────────┘
The 'T' action marker tells the refresh engine: “a TRUNCATE happened — a full refresh is required.”
What Happens at the Scheduler: Automatic Full Refresh
On the next refresh cycle, the scheduler:
1. Checks the change buffer for rows in the LSN window
2. Finds the action = 'T' marker row
3. Falls back to a FULL refresh — regardless of the stream table’s configured refresh_mode
4. TRUNCATEs the stream table
5. Re-executes the defining query against the current base table state
6. Inserts all results
Since the orders table is now empty, the defining query returns zero rows:
-- After the next scheduled refresh:
SELECT * FROM customer_totals;
customer | total | order_count
----------|-------|------------
(0 rows) ← correct: orders is empty
No manual intervention required. The TRUNCATE marker ensures the stream table is automatically brought back into consistency on the next refresh cycle.
Note: In versions before v0.2.0, TRUNCATE was not captured at all — the change buffer stayed empty and the stream table became silently stale. If you’re running an older version, you still need to call
pgtrickle.refresh_stream_table()manually after a TRUNCATE.
Case 2: Manual Refresh (Explicit Recovery)
Although TRUNCATE is now automatically handled on the next refresh cycle, you can force an immediate recovery without waiting:
SELECT pgtrickle.refresh_stream_table('customer_totals');
This executes a full refresh regardless of the stream table’s configured refresh mode:
- TRUNCATE the stream table itself (clearing the stale data)
- Re-execute the defining query
- INSERT the results into the stream table
- Update the frontier so future differential refreshes start from the current LSN
This is useful when you can’t wait for the next scheduled refresh cycle and need the stream table consistent immediately.
Case 3: TRUNCATE Then INSERT (Common ETL Pattern)
A common data loading pattern is:
BEGIN;
TRUNCATE orders;
INSERT INTO orders (customer, amount) VALUES
('charlie', 100.00),
('charlie', 200.00),
('dave', 150.00);
COMMIT;
What the Change Buffer Sees
- TRUNCATE: 1 marker event (
action = 'T') — captured by the statement-level trigger - INSERT charlie 100.00: 1 event (captured)
- INSERT charlie 200.00: 1 event (captured)
- INSERT dave 150.00: 1 event (captured)
The change buffer has 4 rows — the TRUNCATE marker plus 3 INSERT events.
What the Scheduler Does
The scheduler sees the action = 'T' marker and triggers a full refresh, ignoring the individual INSERT events. The full refresh re-executes the defining query against the current state of orders, which now contains only charlie and dave:
-- After the next scheduled refresh:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
charlie | 300.00 | 2 ← correct
dave | 150.00 | 1 ← correct
The old data (alice, bob) is gone because the full refresh recomputed from scratch. This is correct — the TRUNCATE marker ensures consistency regardless of what other changes occurred in the same window.
Case 4: TRUNCATE a Dimension Table in a JOIN
Consider a stream table that joins two tables:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT NOT NULL DEFAULT 'standard'
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
amount NUMERIC(10,2)
);
SELECT pgtrickle.create_stream_table(
name => 'order_details',
query => $$
SELECT c.name, c.tier, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
$$,
schedule => '1m'
);
Now truncate the dimension table:
TRUNCATE customers CASCADE;
The CASCADE also truncates orders (due to the foreign key). Both tables have TRUNCATE triggers installed, so both write a 'T' marker to their respective change buffers.
On the next refresh cycle, the scheduler detects the TRUNCATE markers and performs a full refresh. The stream table is recomputed from the now-empty base tables:
-- After the next scheduled refresh:
SELECT * FROM order_details;
-- (0 rows) — correct
Case 5: FULL Mode Stream Tables Are Immune
If the stream table uses FULL refresh mode instead of DIFFERENTIAL:
SELECT pgtrickle.create_stream_table(
name => 'customer_totals_full',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m',
refresh_mode => 'FULL'
);
A FULL-mode stream table doesn’t use the change buffer at all. Every refresh cycle: 1. TRUNCATEs the stream table 2. Re-executes the defining query 3. Inserts all results
So after a TRUNCATE of the base table, the next scheduled refresh automatically picks up the correct state — no manual intervention needed. The trade-off is that every refresh recomputes from scratch, which is more expensive for large result sets.
Why PostgreSQL Doesn’t Fire Row Triggers on TRUNCATE
Understanding the PostgreSQL internals helps explain why per-row capture is impossible:
| Operation | Mechanism | Row triggers fired? | Statement triggers fired? |
|---|---|---|---|
DELETE FROM t |
Scans and removes rows one by one | Yes — AFTER DELETE per row | Yes |
TRUNCATE t |
Removes all heap files and reinitializes the table storage | No — no per-row processing | Yes — AFTER TRUNCATE |
DELETE FROM t WHERE true |
Same as DELETE FROM t (full scan) |
Yes — AFTER DELETE per row | Yes |
TRUNCATE is fundamentally different from DELETE. It’s an O(1) operation that replaces the table’s storage files, while DELETE is O(N) — scanning every row and recording each removal in WAL.
pg_trickle uses a statement-level AFTER TRUNCATE trigger to detect the event and write a 'T' marker to the change buffer. This marker does not contain per-row data (PostgreSQL’s TRUNCATE trigger doesn’t provide OLD records), but it’s sufficient to signal that a full refresh is needed.
Alternative: DELETE FROM Instead of TRUNCATE
For DIFFERENTIAL mode, TRUNCATE is now handled automatically (via the 'T' marker and full refresh fallback). However, using DELETE FROM instead of TRUNCATE has its own advantages:
-- Instead of: TRUNCATE orders;
DELETE FROM orders;
This fires the row-level DELETE trigger for every row. The change buffer captures all removals, and the next differential refresh correctly decrements all reference counts through the standard algebraic delta path — avoiding the need for a full refresh fallback.
| Approach | Speed | Stream table consistent? | Refresh type |
|---|---|---|---|
TRUNCATE orders |
O(1) — instant | Yes — automatic full refresh on next cycle | FULL (fallback) |
DELETE FROM orders |
O(N) — scans all rows | Yes — per-row triggers fire | DIFFERENTIAL |
TRUNCATE + manual refresh |
O(1) + O(query) | Yes — immediately | FULL (manual) |
For tables with millions of rows, DELETE FROM can be slow and generate significant WAL. TRUNCATE is generally the better choice — the automatic full refresh fallback makes it safe to use.
Best Practices
1. TRUNCATE Is Safe to Use
As of v0.2.0, TRUNCATE on tracked source tables is automatically detected and triggers a full refresh on the next scheduler cycle. No manual intervention is required for standard operation.
2. Use Manual Refresh for Immediate Consistency
If you need the stream table to be consistent immediately (not on the next cycle), call refresh explicitly:
TRUNCATE orders;
SELECT pgtrickle.refresh_stream_table('customer_totals');
3. Consider IMMEDIATE Mode for Real-Time Needs
For stream tables that need to reflect TRUNCATE instantly (within the same transaction), use IMMEDIATE mode. The TRUNCATE trigger automatically performs a full refresh synchronously.
4. Consider FULL Mode for ETL-Heavy Tables
If a table is routinely truncated and reloaded, FULL refresh mode may be simpler than DIFFERENTIAL — it naturally handles TRUNCATE because it recomputes from scratch every cycle.
5. Use trigger_inventory() to Verify Triggers
You can verify that both the DML trigger and the TRUNCATE trigger are installed and enabled:
SELECT * FROM pgtrickle.trigger_inventory();
This shows one row per (source table, trigger type) confirming both pg_trickle_cdc_<oid> (DML) and pg_trickle_cdc_truncate_<oid> (TRUNCATE) triggers are present.
How TRUNCATE Compares to Other Operations
| Aspect | INSERT | UPDATE | DELETE | TRUNCATE |
|---|---|---|---|---|
| Row trigger fires? | Yes (per row) | Yes (per row) | Yes (per row) | No |
| Statement trigger fires? | Yes | Yes | Yes | Yes (writes 'T' marker) |
| Change buffer | 1 row per INSERT | 1 row per UPDATE | 1 row per DELETE | 1 marker row (action='T') |
| Stream table updated? | Yes (next refresh) | Yes (next refresh) | Yes (next refresh) | Yes (full refresh on next cycle) |
| Recovery | Automatic (differential) | Automatic (differential) | Automatic (differential) | Automatic (full refresh fallback) |
| FULL mode affected? | N/A (recomputes) | N/A (recomputes) | N/A (recomputes) | N/A (recomputes) |
| IMMEDIATE mode? | Synchronous delta | Synchronous delta | Synchronous delta | Synchronous full refresh |
| Speed | O(1) per row | O(1) per row | O(1) per row | O(1) + O(query) for refresh |
What About IMMEDIATE Mode?
In IMMEDIATE mode, TRUNCATE is handled synchronously within the same transaction:
- The BEFORE TRUNCATE trigger acquires an advisory lock on the stream table
- The AFTER TRUNCATE trigger calls
pgt_ivm_handle_truncate(pgt_id) - This function TRUNCATEs the stream table and re-populates it by re-executing the defining query
- The stream table is immediately consistent — within the same transaction
SELECT pgtrickle.create_stream_table(
name => 'customer_totals_live',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
refresh_mode => 'IMMEDIATE'
);
BEGIN;
TRUNCATE orders;
-- customer_totals_live is already empty here!
SELECT * FROM customer_totals_live; -- (0 rows)
COMMIT;
No waiting for a scheduler cycle, no stale data — TRUNCATE is fully handled in real-time.
Summary
As of v0.2.0, TRUNCATE is fully tracked by pg_trickle across all three refresh modes. While it cannot be captured as per-row DELETE events (PostgreSQL’s TRUNCATE doesn’t process individual rows), pg_trickle uses a statement-level trigger to detect the event and respond appropriately.
The key takeaways:
- TRUNCATE is automatically handled — a statement-level AFTER TRUNCATE trigger writes a
'T'marker to the change buffer - DIFFERENTIAL mode: automatic full refresh — the scheduler detects the marker and falls back to a full refresh on the next cycle
- IMMEDIATE mode: synchronous full refresh — the stream table is rebuilt within the same transaction
- FULL mode: naturally immune — every refresh recomputes from scratch regardless
- Manual refresh for instant consistency — call
pgtrickle.refresh_stream_table()if you can’t wait for the next cycle DELETE FROMremains an alternative — fires per-row triggers, enabling incremental delta processing instead of full refresh fallback
Next in This Series
- What Happens When You INSERT a Row? — The full 7-phase lifecycle (start here if you haven’t already)
- What Happens When You UPDATE a Row? — D+I split, group key changes, net-effect for multiple UPDATEs
- What Happens When You DELETE a Row? — Reference counting, group deletion, INSERT+DELETE cancellation