Contents
Adding Provenance to a Table
Before ProvSQL can track provenance, the extension must be loaded and provenance must be enabled on each relevant table. ProvSQL represents provenance as a circuit of gates following the model of DBLP:conf/icdt/DeutchMRT14.
Loading the Extension
In every database where you want provenance support:
CREATE EXTENSION provsql CASCADE;
The CASCADE option installs the required uuid-ossp dependency automatically.
To call ProvSQL functions without the provsql. prefix, add provsql to the search path at the start of each session:
SET search_path TO public, provsql;
To make this permanent for a specific database:
ALTER DATABASE mydb SET search_path TO public, provsql;
Most examples in this documentation omit the provsql. prefix, assuming provsql is in the search path.
Disabling Provenance Temporarily
Setting provsql.active to off makes ProvSQL silently drop all provenance annotations for the current session, as if the extension were not loaded:
SET provsql.active = off;
This is useful for running queries without provenance overhead while keeping the extension installed. See configuration for all configuration variables.
Enabling Provenance on a Table
Use add_provenance to add provenance tracking to an existing table:
SELECT provsql.add_provenance('mytable');
This adds a hidden provsql column of type uuid to the table. Each row receives a freshly generated UUID that identifies a leaf (input) gate in the provenance circuit.
After enabling provenance, every query that reads from mytable will automatically carry provenance annotations in its result set.
Note
add_provenance must be called on the base table, not on a view.
Accessing the Provenance Token
The provsql column is intentionally opaque – it is silently removed from WHERE or ORDER BY clauses. To refer to the current row's provenance token, use the provenance() function:
SELECT name, provenance() FROM mytable;
Within a query result, the provsql attribute carries a UUID value that represents the provenance circuit gate for that tuple.
Removing Provenance
To stop tracking provenance for a table (and drop the provsql column), use remove_provenance:
SELECT provsql.remove_provenance('mytable');
Provenance Mappings
A provenance mapping associates provenance tokens with values from a table column. Mappings are the bridge between abstract circuit tokens and domain-meaningful labels used by semiring evaluation functions. Use create_provenance_mapping to create one:
SELECT create_provenance_mapping('my_mapping', 'mytable', 'column_name');
The mapping is stored as an ordinary PostgreSQL table called my_mapping with two columns: token (uuid) and value (text or numeric, depending on the source column type).
Alternatively, create_provenance_mapping_view creates a view instead of a table. The view always reflects the current state of the source table, which is useful when the table is frequently updated:
SELECT create_provenance_mapping_view('my_mapping_view', 'mytable', 'column_name');
The view can be used anywhere a table-based mapping is expected (e.g., as the second argument to semiring evaluation functions).
Inspecting the Circuit
ProvSQL represents provenance as a circuit: a directed acyclic graph (DAG) of gates. By default, add_provenance associates an input gate to each existing tuple, and newly inserted tuples also receive input gates. However, tuples may carry more complex provenance -- for instance, rows created by INSERT ... SELECT or CREATE TABLE AS inherit the provenance expression of the source query.
As queries combine tuples, internal gates record the semiring operations that were applied:
- plus (⊕): alternative derivations (UNION, DISTINCT)
- times (⊗): combined use (JOIN, cross product)
- monus (⊖): difference (EXCEPT)
- delta (δ): aggregation boundary (GROUP BY)
- agg, semimod: aggregate provenance
- project, eq: where-provenance (column tracking, equijoin)
- cmp: HAVING comparisons
Two constant gates represent the semiring identity elements: gate_zero (additive identity, 𝟘) and gate_one (multiplicative identity, 𝟙).
The following functions let you navigate and inspect the circuit:
- get_gate_type -- returns the type of a gate.
- get_children -- returns the child tokens of a gate.
- identify_token -- given a provenance token, returns the source table and row it originates from.
- get_infos -- returns the integer metadata attached to a gate (e.g., aggregate function OID, comparison operator OID).
- get_extra -- returns the text metadata attached to a gate (e.g., aggregate value, column positions for where-provenance).
- get_nb_gates -- returns the total number of gates in the circuit, useful for diagnosing circuit size and performance.
SELECT provsql.get_gate_type(provenance()) FROM mytable; SELECT provsql.get_children(provenance()) FROM mytable;