Semiring Evaluation

ProvSQL represents query provenance as a circuit of gates. A semiring evaluation DBLP:conf/pods/GreenKT07 maps this circuit to a value in a chosen semiring by:

  1. Assigning a leaf value to each input gate via a provenance mapping.
  2. Propagating these values through plus and times gates (and monus gates for m-semirings DBLP:journals/japll/GeertsP10) according to the semiring operations.

The built-in evaluation functions all follow the same calling convention:

sr_<name>(provenance(), 'mapping_name')

where provenance() returns the token for the current output row and mapping_name is the name of a provenance mapping table. Semirings are also extended with a monus operation allowing to represent the output of non-monotone queries.

Boolean Semiring

sr_boolean evaluates the provenance in the Boolean semiring ({true, false}, ∨, ∧, false, true), answering the question "was this result derivable at all, depending on the Boolean mapping of input provenance tokens?":

SELECT name, sr_boolean(provenance(), 'my_mapping')
FROM mytable;

Boolean-Expression Semiring

sr_boolexpr evaluates the provenance in the Boolean-expression semiring, returning a human-readable propositional formula:

SELECT name, sr_boolexpr(provenance(), 'my_mapping')
FROM mytable;

This is used as the basis for probability computation.

Symbolic Representation (as a Formula)

sr_formula returns a symbolic representation of the provenance as a human-readable formula using symbols and :

SELECT name, sr_formula(provenance(), 'witness_mapping')
FROM suspects;

For example, a result supported by tuples labelled a and b through a join would show as a⊗b; one that could come from either a or b would show as a⊕b.

Counting Semiring (m-semiring)

sr_counting evaluates the provenance in the counting m-semiring, counting the number of distinct supporting inputs:

SELECT name, sr_counting(provenance(), 'count_mapping')
FROM suspects;

The mapping should assign integer values (typically 1) to leaf tokens.

Why-Provenance

sr_why returns the why-provenance of a result – the set of minimal witnesses (sets of input tuples) that support the result:

SELECT name, sr_why(provenance(), 'my_mapping')
FROM mytable;

Security Semiring

The security semiring assigns security-level labels to tuples and propagates them through queries according to a lattice. It is implemented using provenance_evaluate with custom aggregates for the semiring plus and times operations. For example, given a type classification_level (an enum ordered from unclassified to top_secret):

-- Define the semiring operations
CREATE FUNCTION security_plus_state(state classification_level,
                                    level classification_level)
  RETURNS classification_level LANGUAGE SQL IMMUTABLE AS $$
    SELECT LEAST(state, level)
$$;

CREATE FUNCTION security_times_state(state classification_level,
                                     level classification_level)
  RETURNS classification_level LANGUAGE SQL IMMUTABLE AS $$
    SELECT GREATEST(state, level)
$$;

CREATE AGGREGATE security_plus(classification_level) (
  sfunc = security_plus_state, stype = classification_level,
  initcond = 'top_secret'
);
CREATE AGGREGATE security_times(classification_level) (
  sfunc = security_times_state, stype = classification_level,
  initcond = 'unclassified'
);

-- Evaluate the security level of a query result
SELECT create_provenance_mapping('personnel_level', 'personnel', 'classification');

SELECT city, provenance_evaluate(provenance(), 'personnel_level',
                                 'unclassified'::classification_level,
                                 'security_plus', 'security_times')
FROM (SELECT DISTINCT city FROM personnel) t;

Custom Semirings with provenance_evaluate

Advanced users can define custom semirings in SQL and evaluate them using provenance_evaluate. The function takes a provenance token, a mapping table, a zero element, and the names of aggregate functions implementing the semiring operations:

provenance_evaluate(
    token UUID,           -- provenance token to evaluate
    token2value regclass, -- mapping table (token → value)
    element_one,          -- identity element (type determines the semiring value type)
    plus_function,        -- name of the ⊕ aggregate
    times_function,       -- name of the ⊗ aggregate
    monus_function,       -- name of the ⊖ function (optional, for m-semirings)
    delta_function        -- name of the δ function (optional, for δ-semirings)
)

The plus and times operations must be defined as PostgreSQL aggregate functions with a two-argument state transition function. The monus and delta operations are plain functions. See the security semiring example above for a complete illustration. Additional examples can be found in the test suite: test/sql/security.sql <../../../test/sql/security.sql> (security semiring) and test/sql/formula.sql <../../../test/sql/formula.sql> (symbolic representation as a formula).

For queries involving aggregation (GROUP BY), use aggregation_evaluate instead, which additionally takes the names of an aggregate finalization function and a semimodule operation:

aggregation_evaluate(
    token,                -- aggregate result (agg_token)
    token2value,          -- mapping table
    agg_final_function,   -- finalization function for the aggregate
    agg_function,         -- aggregate function for group values
    semimod_function,     -- semimodule scalar multiplication
    element_one,          -- identity element
    plus_function,
    times_function,
    monus_function,       -- optional
    delta_function        -- optional
)

See test/sql/aggregation.sql <../../../test/sql/aggregation.sql> for a complete example of aggregation_evaluate usage.

Note

provenance_evaluate and aggregation_evaluate are PL/pgSQL functions that traverse the provenance circuit recursively. They do not support cmp gates introduced by HAVING clauses; queries with HAVING will produce an error. The built-in compiled semirings (sr_formula, sr_counting, etc.) are implemented in C, support all gate types including HAVING, and are significantly faster. Prefer compiled semirings when available; use provenance_evaluate for semirings not covered by the built-in set.

Provenance Mappings

All semiring functions take a mapping name as their second argument. A mapping is created with create_provenance_mapping:

SELECT create_provenance_mapping('mapping_name', 'table_name', 'column_name');

The mapping table has columns token (uuid) and value. You can also populate it manually for custom scenarios:

INSERT INTO mapping_name(token, value)
SELECT provenance(), my_label FROM mytable;