Contents
Data-Modification Tracking
Beyond tracking the provenance of query results, ProvSQL can also track the provenance of the rows themselves -- recording every INSERT, UPDATE, and DELETE so that one can later answer "which DML statements is this row's current state due to?", roll back individual operations with undo, or query the database as it looked at a past point in time. See ../user/data-modification and ../user/temporal for the user-facing description.
This chapter covers how the feature is implemented: the GUC, the trigger machinery, the new update gate type, the update_provenance table, and how undo / time-travel are wired on top of the same provenance circuits used for query results.
Data-modification tracking is PostgreSQL 14+ only because the temporal-validity machinery (see Time-Travel Queries below) relies on the tstzmultirange type, which was introduced as part of the multirange family in PostgreSQL 14. The temporal validity of a row is computed by evaluating its provenance circuit over an m-semiring whose carrier is a set of timestamp intervals (plus = union, times = intersection, monus = difference); the result of those operations is in general non-contiguous -- two disjoint unioned intervals, a difference that punches a hole in the middle of an interval, and so on -- so the carrier cannot be a plain tstzrange. tstzmultirange is exactly the right type for this semiring, and without it the whole construction does not close. All the SQL pieces live in sql/provsql.14.sql and are not loaded on older PostgreSQL versions (see build-system).
The GUC and the Big Picture
The whole subsystem is gated on the provsql.update_provenance GUC, declared in provsql.c and registered in _PG_init. When it is off (the default), the AFTER triggers installed by add_provenance short-circuit immediately and DML statements behave normally. When it is on, every committed INSERT / UPDATE / DELETE on a provenance-tracked table:
- Allocates a new update gate representing the operation itself, with a fresh UUID.
- Records the operation in the update_provenance housekeeping table (statement text, user, timestamp, validity range).
- Combines that gate with the provsql token of every affected row using either provenance_times (for inserts) or provenance_monus (for deletes). An update is modelled as a delete followed by an insert.
The end result is that each row's provsql token is a circuit whose leaves include not only the original input gates from add_provenance but also one update leaf per DML statement that ever touched the row.
The gate_update Gate Type
A new value, gate_update, is added to the gate_type enum in provsql_utils.h. It is treated exactly like gate_input everywhere it matters:
- MMappedCircuit and GenericCircuit allow probabilities to be attached to it (via set_prob), because semantically a DML operation is "another input" to the circuit.
- GenericCircuit::evaluate registers update gates as members of its inputs set, so semiring evaluators see them as leaves the same way they see input gates.
- getBooleanCircuit translates gate_update to a Boolean variable like gate_input, so probability evaluation and Shapley computation handle them transparently.
The only thing that distinguishes an update gate from an input gate is its semantic role -- "this leaf was created by DML, not by the original add_provenance" -- and the fact that the SQL housekeeping records (update_provenance) tie its UUID back to a concrete statement.
The Triggers
add_provenance (in sql/provsql.14.sql) installs three AFTER statement-level triggers on the target table:
AFTER INSERT REFERENCING NEW TABLE AS NEW_TABLE
EXECUTE PROCEDURE provsql.insert_statement_trigger();
AFTER UPDATE REFERENCING OLD TABLE AS OLD_TABLE
NEW TABLE AS NEW_TABLE
EXECUTE PROCEDURE provsql.update_statement_trigger();
AFTER DELETE REFERENCING OLD TABLE AS OLD_TABLE
EXECUTE PROCEDURE provsql.delete_statement_trigger();
Each trigger:
- Returns immediately if provsql.update_provenance is off.
- Allocates a fresh UUID and creates a corresponding update gate via create_gate.
- Reads the current statement text from pg_stat_activity and inserts a row into update_provenance.
- Sets provsql.update_provenance = off on a local basis to suppress the recursive triggering caused by the next steps.
- Walks the affected rows (OLD_TABLE / NEW_TABLE) and rewrites each row's provsql token: insert multiplies the row token by the new operation gate, delete applies provenance_monus to remove it, and update does both (a monus on the old row plus a times on the new one).
- Re-enables tracking before returning.
The temporary disabling in step 4 is essential: rewriting the provsql column of a tracked table is itself an UPDATE, which would re-trigger the same machinery and recurse forever otherwise.
Because deletes need to keep the old row around for time-travel queries (otherwise there would be nothing left to attach the monus token to), the delete trigger does not physically delete the row -- it re-inserts a copy under a fresh provsql token whose circuit ends in a provenance_monus against the delete gate.
The update_provenance Table
update_provenance is a regular PostgreSQL table created by sql/provsql.14.sql. Its schema:
CREATE TABLE provsql.update_provenance (
provsql uuid, -- the update gate's UUID
query text, -- the SQL text
query_type query_type_enum, -- INSERT/UPDATE/DELETE/UNDO
username text, -- session_user
ts timestamp DEFAULT CURRENT_TIMESTAMP,
valid_time tstzmultirange DEFAULT
tstzmultirange(tstzrange(CURRENT_TIMESTAMP, NULL))
);
The provsql column is the primary key in spirit: it links each row to the corresponding update gate in the circuit store. Once you have that token you can hand it back to undo to roll the operation back, or query it via the temporal functions to find out which rows were affected.
The valid_time column is a tstzmultirange. At the moment an update gate is created, it is set to the half-infinite interval [ts, + ∞): from the operation's timestamp onward, and not before. Crucially, valid_time is never modified after this initial insertion -- not even by undo. Trimming the validity of a superseded operation is not a property of the row in this table; it emerges from evaluating the row's provenance circuit over the union-of-intervals m-semiring described in Time-Travel Queries below.
Undo
undo takes a token (typically read out of update_provenance.provsql) and rolls the corresponding operation back, without removing anything from the history. The implementation is pure PL/pgSQL, in sql/provsql.14.sql.
The trick is that the row tokens already contain the operation's gate as a leaf; we just need to cancel that leaf inside every circuit that references it. undo:
- Allocates a new update gate u and records a brand-new UNDO row in update_provenance whose valid_time is again [now, + ∞).
- Walks every provenance-tracked table in every schema, and for each row in each table calls replace_the_circuit(row_token, op_token, u) to rewrite the row's circuit.
replace_the_circuit (also in sql/provsql.14.sql) is a recursive PL/pgSQL function that walks a circuit and rebuilds it with one substitution: every reference to op_token is replaced by provenance_monus(op_token, u). Because the circuit is a DAG, the rewrite is structural -- it preserves sharing within a single rewrite call.
The undo is itself an operation: the new update gate u becomes a leaf of every rewritten row, and undoing the undo (yes, you can do that) follows exactly the same path.
Time-Travel Queries
All temporal queries reduce to evaluating a row's provenance circuit over the union-of-intervals m-semiring, whose carrier is tstzmultirange:
- plus (⊕) is the union of intervals, with neutral element ∅;
- times (⊗) is the pointwise intersection, with neutral element the universal singleton {( − ∞, + ∞)};
- monus (⊖) is the pointwise difference (A minus B is the union, over a ∈ A and b ∈ B, of a∩b).
This is exactly the m-semiring the PW'25 paper (DBLP:conf/pw/WidiaatmajaDDS25) describes. It is implemented in sql/provsql.14.sql by three aggregate functions:
- union_tstzintervals_plus -- ⊕ (union);
- union_tstzintervals_times -- ⊗ (intersection);
- union_tstzintervals_monus -- ⊖ (difference).
The interpretation of a gate in this semiring is simple:
- an input gate -- a plain base-table leaf -- has no associated time and is mapped to the ⊗-neutral {( − ∞, + ∞)}, meaning "valid at any time";
- an update gate is mapped to its valid_time from update_provenance, i.e. [ts, + ∞).
The mapping is provided by the auto-generated provenance mapping view time_validity_view over the update_provenance.valid_time column; get_valid_time calls union_tstzintervals(provenance(), 'time_validity_view') on the target row and returns the resulting multirange.
The reason the temporal result "just works" after an undo is that the semiring's ⊖ does the trimming arithmetically: after replace_the_circuit has rewritten c to c ⊖ u, the union-of-intervals evaluation of that subtree produces [tsc, + ∞)⊖[tsu, + ∞) = [tsc, tsu) -- exactly the finite interval during which the undone operation was in effect. No record in update_provenance is ever modified.
The remaining user-facing temporal functions are thin wrappers on top of get_valid_time and ordinary tstzmultirange operators:
- timetravel -- rows of a table valid at a given timestamp (tests @> containment);
- timeslice -- rows valid anywhere in a given interval (tests && overlap);
- history -- full history of rows matching a key.
All four functions live entirely in PL/pgSQL. They contain no C code: every operation reduces to a query against the target table plus a call to provenance_evaluate with the union-of-intervals semiring.