Contents
Semiring Evaluation
ProvSQL represents query provenance as a circuit of gates. A semiring evaluation DBLP:conf/pods/GreenKT07 maps this circuit to a value in a chosen semiring by:
- Assigning a leaf value to each input gate via a provenance mapping.
- Propagating these values through plus and times gates (and monus gates for m-semirings DBLP:journals/japll/GeertsP10) according to the semiring operations.
The built-in evaluation functions all follow the same calling convention:
sr_<name>(provenance(), 'mapping_name')
where provenance() returns the token for the current output row and mapping_name is the name of a provenance mapping table. Semirings are also extended with a monus operation allowing to represent the output of non-monotone queries.
Boolean Semiring
sr_boolean evaluates the provenance in the Boolean semiring ({true, false}, ∨, ∧, false, true), answering the question "was this result derivable at all, depending on the Boolean mapping of input provenance tokens?":
SELECT name, sr_boolean(provenance(), 'my_mapping') FROM mytable;
Boolean-Expression Semiring
sr_boolexpr evaluates the provenance in the Boolean-expression semiring, returning a human-readable propositional formula:
SELECT name, sr_boolexpr(provenance(), 'my_mapping') FROM mytable;
This is used as the basis for probability computation.
Symbolic Representation (as a Formula)
sr_formula returns a symbolic representation of the provenance as a human-readable formula using symbols ⊕ and ⊗:
SELECT name, sr_formula(provenance(), 'witness_mapping') FROM suspects;
For example, a result supported by tuples labelled a and b through a join would show as a⊗b; one that could come from either a or b would show as a⊕b.
Counting Semiring (m-semiring)
sr_counting evaluates the provenance in the counting m-semiring, counting the number of distinct supporting inputs:
SELECT name, sr_counting(provenance(), 'count_mapping') FROM suspects;
The mapping should assign integer values (typically 1) to leaf tokens.
Why-Provenance
sr_why returns the why-provenance of a result – the set of minimal witnesses (sets of input tuples) that support the result:
SELECT name, sr_why(provenance(), 'my_mapping') FROM mytable;
Security Semiring
The security semiring assigns security-level labels to tuples and propagates them through queries according to a lattice. It is implemented using provenance_evaluate with custom aggregates for the semiring plus and times operations. For example, given a type classification_level (an enum ordered from unclassified to top_secret):
-- Define the semiring operations
CREATE FUNCTION security_plus_state(state classification_level,
level classification_level)
RETURNS classification_level LANGUAGE SQL IMMUTABLE AS $$
SELECT LEAST(state, level)
$$;
CREATE FUNCTION security_times_state(state classification_level,
level classification_level)
RETURNS classification_level LANGUAGE SQL IMMUTABLE AS $$
SELECT GREATEST(state, level)
$$;
CREATE AGGREGATE security_plus(classification_level) (
sfunc = security_plus_state, stype = classification_level,
initcond = 'top_secret'
);
CREATE AGGREGATE security_times(classification_level) (
sfunc = security_times_state, stype = classification_level,
initcond = 'unclassified'
);
-- Evaluate the security level of a query result
SELECT create_provenance_mapping('personnel_level', 'personnel', 'classification');
SELECT city, provenance_evaluate(provenance(), 'personnel_level',
'unclassified'::classification_level,
'security_plus', 'security_times')
FROM (SELECT DISTINCT city FROM personnel) t;
Custom Semirings with provenance_evaluate
Advanced users can define custom semirings in SQL and evaluate them using provenance_evaluate. The function takes a provenance token, a mapping table, a zero element, and the names of aggregate functions implementing the semiring operations:
provenance_evaluate(
token UUID, -- provenance token to evaluate
token2value regclass, -- mapping table (token → value)
element_one, -- identity element (type determines the semiring value type)
plus_function, -- name of the ⊕ aggregate
times_function, -- name of the ⊗ aggregate
monus_function, -- name of the ⊖ function (optional, for m-semirings)
delta_function -- name of the δ function (optional, for δ-semirings)
)
The plus and times operations must be defined as PostgreSQL aggregate functions with a two-argument state transition function. The monus and delta operations are plain functions. See the security semiring example above for a complete illustration. Additional examples can be found in the test suite: test/sql/security.sql <../../../test/sql/security.sql> (security semiring) and test/sql/formula.sql <../../../test/sql/formula.sql> (symbolic representation as a formula).
For queries involving aggregation (GROUP BY), use aggregation_evaluate instead, which additionally takes the names of an aggregate finalization function and a semimodule operation:
aggregation_evaluate(
token, -- aggregate result (agg_token)
token2value, -- mapping table
agg_final_function, -- finalization function for the aggregate
agg_function, -- aggregate function for group values
semimod_function, -- semimodule scalar multiplication
element_one, -- identity element
plus_function,
times_function,
monus_function, -- optional
delta_function -- optional
)
See test/sql/aggregation.sql <../../../test/sql/aggregation.sql> for a complete example of aggregation_evaluate usage.
Note
provenance_evaluate and aggregation_evaluate are PL/pgSQL functions that traverse the provenance circuit recursively. They do not support cmp gates introduced by HAVING clauses; queries with HAVING will produce an error. The built-in compiled semirings (sr_formula, sr_counting, etc.) are implemented in C, support all gate types including HAVING, and are significantly faster. Prefer compiled semirings when available; use provenance_evaluate for semirings not covered by the built-in set.
Provenance Mappings
All semiring functions take a mapping name as their second argument. A mapping is created with create_provenance_mapping:
SELECT create_provenance_mapping('mapping_name', 'table_name', 'column_name');
The mapping table has columns token (uuid) and value. You can also populate it manually for custom scenarios:
INSERT INTO mapping_name(token, value) SELECT provenance(), my_label FROM mytable;