rdf_fdw

This Release
rdf_fdw 1.1.0
Date
Status
Stable
Other Releases
Abstract
RDF Triplestore Foreign Data Wrapper for PostgreSQL
Description
A PostgreSQL Foreign Data Wrapper to easily access RDF triplestores via SPARQL endpoints, including pushdown of several SQL Query clauses.
Released By
jim
License
MIT
Resources
Special Files
Tags

Extensions

rdf_fdw 1.1.0
RDF Triplestore Foreign Data Wrapper for PostgreSQL

Documentation

CHANGELOG
Release Notes

README


RDF Triplestore Foreign Data Wrapper for PostgreSQL (rdf_fdw)

The rdf_fdw is a PostgreSQL Foreign Data Wrapper to easily access RDF triplestores via SPARQL endpoints, including pushdown of several SQL Query clauses.

CI

Index

Requirements

In an Ubuntu environment you can install all dependencies with the following command:

apt install -y make gcc postgresql-server-dev-16 libxml2-dev libcurl4-openssl-dev

[!NOTE]
postgresql-server-dev-16 only installs the libraries for PostgreSQL 16. Change it if you’re using another PostgreSQL version.

Build and Install

To compile the source code you need to ensure the pg_config executable is properly set when you run make - this executable is typically in your PostgreSQL installation’s bin directory. After that, just run make in the root directory:

$ cd rdf_fdw
$ make

After compilation, just run make install to install the Foreign Data Wrapper:

$ make install

After building and installing the extension you’re ready to create the extension in a PostgreSQL database with CREATE EXTENSION:

CREATE EXTENSION rdf_fdw;

To install an specific version add the full version number in the WITH VERSION clause

CREATE EXTENSION rdf_fdw WITH VERSION '1.0';

To run the predefined regression tests run make installcheck with the user postgres:

$ make PGUSER=postgres installcheck

[!NOTE]
In order for rdf_fdw to convert the data retrieved from the RDF triplestore into a format that can be interpreted by PostgreSQL, it must first load all the retrieved data into memory. Therefore, if you expect to retrieve large volumes of data with a single query, make sure PostgreSQL has access to sufficient memory. Another option is to retrieve the data in chuncks by using rdf_fdw_clone_table or a customised script.

Update

To update the extension’s version you must first build and install the binaries and then run ALTER EXTENSION:

ALTER EXTENSION rdf_fdw UPDATE;

To update to an specific version use UPDATE TO and the full version number

ALTER EXTENSION rdf_fdw UPDATE TO '1.1';

Usage

To use the rdf_fdw you must first create a SERVER to connect to a SPARQL endpoint. After that, you have to create the FOREIGN TABLEs, which will contain the SPARQL instructions of what to retrieve from the endpoint.

CREATE SERVER

The SQL command CREATE SERVER defines a new foreign server. The user who defines the server becomes its owner. A SERVER requires an endpoint, so that rdf_fdw knows where to sent the SPARQL queries.

The following example creates a SERVER that connects to the DBpedia SPARQL Endpoint:

  CREATE SERVER dbpedia
  FOREIGN DATA WRAPPER rdf_fdw 
  OPTIONS (endpoint 'https://dbpedia.org/sparql');

Server Options

| Server Option | Type | Description | |—————|–––––––––––|––––––––––––––––––––––––––––––––––––––––––––––––––––––––––| | endpoint | required | URL address of the SPARQL Endpoint. | enable_pushdown | optional | Globally enables or disables pushdown of SQL clauses into SPARQL (default true) | format | optional | The rdf_fdw expects the result sets encoded in the SPARQL Query Results XML Format, which can be normally enforced by setting the MIME type application/sparql-results+xml in the Accept HTTP request header. However, there are some products that deviate from the standard and expect a differently value, e.g. xml, rdf-xml. In case the expected parameter differs from the official MIME type, it should be set in this parameter (default application/sparql-results+xml). | http_proxy | optional | Proxy for HTTP requests. | proxy_user | optional | User for proxy server authentication. | proxy_user_password | optional | Password for proxy server authentication. | connect_timeout | optional | Connection timeout for HTTP requests in seconds (default 300 seconds). | connect_retry | optional | Number of attempts to retry a request in case of failure (default 3 times). | request_redirect | optional | Enables URL redirect issued by the server (default false). | request_max_redirect | optional | Limit of how many times the URL redirection may occur. If that many redirections have been followed, the next redirect will cause an error. Not setting this parameter or setting it to 0 will allow an infinite number of redirects. | custom | optional | One or more parameters expected by the configured RDF triplestore. Multiple parameters separated by &, e.g. signal_void=on&signal_unconnected=on. Custom parameters are appended to the request URL. | query_param | optional | The request parameter where the SPARQL endpoint expects the query in a HTTP request. Most SPARQL endpoints expects the query to be in the parameter query - and this is the rdf_fdw default value. So, chances are you’ll never need to touch this server option (default query)

CREATE USER MAPPING

Availability: 1.1.0

CREATE USER MAPPING defines a mapping of a PostgreSQL user to an user in the target triplestore. For instance, to map the PostgreSQL user postgres to the user admin in the SERVER named graphdb:

CREATE SERVER graphdb
FOREIGN DATA WRAPPER rdf_fdw
OPTIONS (endpoint 'http://192.168.178.27:7200/repositories/myrepo');

CREATE USER MAPPING FOR postgres
SERVER graphdb OPTIONS (user 'admin', password 'secret');

| Option | Type | Description | |—|—|—| | user | required | name of the user for authentication | | password | optional | password of the user set in the option user |

The rdf_fdw will try to authenticate the given user using HTTP Basic Authentication - no other authentication method is currently supported. This feature can be ignored if the triplestore does not require user authentication.

CREATE FOREIGN TABLE

Foreign Tables from the rdf_fdw work as a proxy between PostgreSQL clients and RDF Triplestores. Each FOREIGN TABLE column must be mapped to a SPARQL variable, so that PostgreSQL knows where to display each node retrieved from the SPARQL queries. Optionally, it is possible to add an expression to the column, so that function calls can be used to retrieve or format the data.

Table Options

| Option | Type | Description | |—————|———––|––––––––––––––––––––––––––––––––––––––––––––––––––––––––––| | sparql | required | The raw SPARQL query to be executed | | log_sparql | optional | Logs the exact SPARQL query executed. This option is useful to check for any modification to the configured SPARQL query due to pushdown | | enable_pushdown | optional | Enables or disables pushdown of SQL clauses into SPARQL for a specific foreign table. This overrides the SERVER option enable_pushdown |

Column Options

| Option | Type | Description | |—————|———––|––––––––––––––––––––––––––––––––––––––––––––––––––––––––––| | variable | required |This option maps the table column to a SPARQL variable used in the table option sparql. A variable must start with either ? or $ (? or $ are not part of the variable name!), and the name must be a string with the following characters: [a-z], [A-Z],[0-9] | | expression | optional | Similar to variable, but instead of a SPARQL variable it can handle expressions, such as function calls. All expressions supported by the data source can be used in this option. | | language | optional | RDF language tag, e.g. en,de,pt,es,pl, etc. This option is necessary for the pushdown feature to properly set the literal language tag in FILTER expressions. Set it to * to make FILTER espressions ignore language tags when comparing literals. |
| literaltype | optional | Data type for typed literals , e.g. xsd:string, xsd:date, xsd:boolean. This option is necessary for the pushdown feature to properly set the literal type of expressions from SQL WHERE conditions. Set it to * to make FILTER espressions ignore data types when comparing literals. | | nodetype | optional | Type of the RDF node. Expected values are literal or iri. This option helps the query planner to optimize SPARQL FILTER expressions when the WHERE conditions are pushed down (default literal) |

The following example creates a FOREIGN TABLE connected to the server dbpedia. SELECT queries executed against this table will execute the SPARQL query set in the OPTION sparql, and its result sets are mapped to each column of the table via the column OPTION variable.

CREATE FOREIGN TABLE film (
  film_id text    OPTIONS (variable '?film',     nodetype 'iri'),
  name text       OPTIONS (variable '?name',     nodetype 'literal', literaltype 'xsd:string'),
  released date   OPTIONS (variable '?released', nodetype 'literal', literaltype 'xsd:date'),
  runtime int     OPTIONS (variable '?runtime',  nodetype 'literal', literaltype 'xsd:integer'),
  abstract text   OPTIONS (variable '?abstract', nodetype 'literal', literaltype 'xsd:string')
)
SERVER dbpedia OPTIONS (
  sparql '
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbp: <http://dbpedia.org/property/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT DISTINCT ?film ?name ?released ?runtime ?abstract
    WHERE
    {
      ?film a dbo:Film ;
            rdfs:comment ?abstract ;
            dbp:name ?name ;
            dbp:released ?released ;
            dbp:runtime ?runtime .
      FILTER (LANG ( ?abstract ) = "en")
      FILTER (datatype(?released) = xsd:date)
      FILTER (datatype(?runtime) = xsd:integer)
     }
'); 

ALTER FOREIGN TABLE and ALTER SERVER

All options and parameters set to a FOREIGN TABLE or SERVER can be changed, dropped, and new ones can be added using the ALTER FOREIGN TABLE and ALTER SERVER commands.

Adding options

ALTER FOREIGN TABLE film OPTIONS (ADD enable_pushdown 'false',
                                  ADD log_sparql 'true');

ALTER SERVER dbpedia OPTIONS (ADD enable_pushdown 'false');

Changing previously configured options

ALTER FOREIGN TABLE film OPTIONS (SET enable_pushdown 'false');

ALTER SERVER dbpedia OPTIONS (SET enable_pushdown 'true');

Dropping options

ALTER FOREIGN TABLE film OPTIONS (DROP enable_pushdown,
                                  DROP log_sparql);

ALTER SERVER dbpedia OPTIONS (DROP enable_pushdown);

version

Synopsis

text rdf_fdw_version();

Availability: 1.0.0

Description

Shows the version of the installed rdf_fdw and its main libraries.


Usage

SELECT * FROM rdf_fdw_version();
                                                                                             rdf_fdw_version
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 rdf_fdw = 1.0.0, libxml/2.9.14 libcurl/7.88.1 OpenSSL/3.0.11 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 libpsl/0.21.2 (+libidn2/2.3.3) libssh2/1.10.0 nghttp2/1.52.0 librtmp/2.3 OpenLDAP/2.5.13
(1 row)

rdf_fdw_clone_table

Synopsis

void rdf_fdw_clone_table(
  foreign_table text,
    target_table text,
    begin_offset int,
    fetch_size int,
    max_records int,
    orderby_column text,
    sort_order text,
    create_table boolean,
    verbose boolean,
    commit_page boolean
)

Availability: 1.0.0

Description

This procedure is designed to copy data from a FOREIGN TABLE to an ordinary TABLE. It provides the possibility to retrieve the data set in batches, so that issues related to triplestore limits and client’s memory don’t bother too much. Parameters

foreign_table (required): FOREIGN TABLE from where the data has to be copied.

target_table (required): heap TABLE where the data from the FOREIGN TABLE is copied to.

begin_offset: starting point in the SPARQL query pagination. Default 0.

fetch_size: size of the page fetched from the triplestore. Default is the value set at fetch_size in either SERVER or FOREIGN TABLE. In case SERVER and FOREIGN TABLE do not set fetch_size, the default will be set to 100.

max_records: maximum number of records that should be retrieved from the FOREIGN TABLE. Default 0, which means no limit.

orderby_column: ordering column used for the pagination - just like in SQL ORDER BY. Default '', which means that the function will chose a column to use in the ORDER BY clause on its own. That is, the procedure will try to set the first column with the option nodetype set to iri. If the table has no iri typed nodetype, the first column will be chosen as ordering column. If you do not wish to have an ORDER BY clause at al, set this parameter to NULL.

sort_order: ASC or DESC to sort the data returned in ascending or descending order, respectivelly. Default ASC.

create_table: creates the table set in target_table before harvesting the FOREIGN TABLE. Default false.

verbose: prints debugging messages in the standard output. Default false.

commit_page: commits the inserted records immediatelly or only after the transaction is finished. Useful for those who want records to be discarded in case of an error - following the principle of everyhing or nothing. Default true, which means that all inserts will me committed immediatelly.


Usage Example

CREATE SERVER dbpedia
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (endpoint 'https://dbpedia.org/sparql');

CREATE FOREIGN TABLE public.dbpedia_cities (
  uri text           OPTIONS (variable '?city', nodetype 'iri'),
  city_name text     OPTIONS (variable '?name', nodetype 'literal', literaltype 'xsd:string'),
  elevation numeric  OPTIONS (variable '?elevation', nodetype 'literal', literaltype 'xsd:integer')
)
SERVER dbpedia OPTIONS (
  sparql '
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dbr:  <http://dbpedia.org/resource/>
    SELECT *
    {?city a dbo:City ;
      foaf:name ?name ;
      dbo:federalState dbr:North_Rhine-Westphalia ;
      dbo:elevation ?elevation
    }
');

/*
 * Materilize all records from the FOREIGN TABLE 'public.dbpedia_cities' in 
 * the table 'public.t1_local'. 
 */ 
CALL rdf_fdw_clone_table(
      foreign_table => 'dbpedia_cities',
      target_table  => 't1_local',
      create_table => true);

SELECT * FROM t1_local;
                            uri                            |      city_name      | elevation 
-----------------------------------------------------------+---------------------+-----------
 http://dbpedia.org/resource/Aachen                        | Aachen              |     173.0
 http://dbpedia.org/resource/Bielefeld                     | Bielefeld           |     118.0
 http://dbpedia.org/resource/Dortmund                      | Dortmund            |      86.0
 http://dbpedia.org/resource/Düsseldorf                    | Düsseldorf          |      38.0
 http://dbpedia.org/resource/Gelsenkirchen                 | Gelsenkirchen       |      60.0
 http://dbpedia.org/resource/Hagen                         | Hagen               |     106.0
 http://dbpedia.org/resource/Hamm                          | Hamm                |      37.7
 http://dbpedia.org/resource/Herne,_North_Rhine-Westphalia | Herne               |      65.0
 http://dbpedia.org/resource/Krefeld                       | Krefeld             |      39.0
 http://dbpedia.org/resource/Mönchengladbach               | Mönchengladbach     |      70.0
 http://dbpedia.org/resource/Mülheim                       | Mülheim an der Ruhr |      26.0
 http://dbpedia.org/resource/Münster                       | Münster             |      60.0
 http://dbpedia.org/resource/Remscheid                     | Remscheid           |     365.0
(13 rows)

Pushdown

A pushdown is the ability to translate SQL queries in such a way that operations are performed in the data source rather than in PostgreSQL, e.g. sorting, formatting, filtering. This feature can significantly reduce the number of records retrieved from the data source. For instance, a SQL LIMIT not pushed down means that the target system will perform a full scan in the data source, prepare everything for data transfer, return it to PostgreSQL via network, and then PostgreSQL will only locally discard the unnecessary data, which depending on the total number of records can be extremely inefficient. The rdf_fdw tries to translate SQL into SPARQL queries, which due to their conceptual differences is not always an easy task, so it is often worth considering to keep SQL queries involving foreign tables as simple as possible - or just to stick to the features, data types and operators described in this section.

LIMIT

LIMIT clauses are pushed down only if the SQL query does not contain aggregates and when all conditions in the WHERE clause can be translated to SPARQL.

| SQL | SPARQL| | – | — | | LIMIT x| LIMIT x | FETCH FIRST x ROWS | LIMIT x | | FETCH FIRST ROW ONLY | LIMIT 1 |

Pushdown of OFFSET clauses is not supported, which means that OFFSET filters will be applied locally in the client. Take a look at rdf_fdw_clone_table if you wish to retrieve records in batches.

ORDER BY

ORDER BY can be pushed down if the data types can be translated into SPARQL.

| SQL | SPARQL| | – | — | | ORDER BY x ASC, ORDER BY x | ORDER BY ASC(x)| | ORDER BY x DESC |ORDER BY DESC(x) |

DISTINCT

DISTINCT is pushed down to the SPARQL SELECT statement just like in SQL. In case that the configured SPARQL query already contains a DISTINCT or a REDUCED modifier, the SQL DISTINCT won’t be pushed down. There is no equivalent for DISTINCT ON, so it cannot be pushed down either.

WHERE

The rdf_fdw will attempt to translate RDF literals to the data type of the mapped column, and this can be quite tricky! RDF literals can be pretty much everything, as often they have no explicit data type declarations, e.g. "wwu" and "wwu"^^xsd:string are equivalent. The contents of literals are often also not validated by the RDF triplestores, but PostgreSQL will validate them in query time. So, if a retrieved literal cannot be translated to declared column data type, the query will be interrupted. SQL WHERE conditions are translated into SPARQL FILTER expressions, as long as the involved columns data types and operators are supported.

Supported Data Types and Operators

| Data type | Operators | |————————————————————|—————————————| | text, char, varchar, name | =, <>, !=, ~~, !~~, ~~*, !~~* | | date, timestamp, timestamp with time zone | =, <>, !=, >, >=, <, <= | | smallint, int, bigint, numeric, double precision | =, <>, !=, >, >=, <, <= | | boolean | IS, IS NOT |

IN and ANY constructs

SQL IN and ANY constructs are translated into the SPARQL IN operator, which will be placed in a FILTER evaluation.

Pattern matching operators LIKE and ILIKE

Expressions using LIKE and ILIKE - or their equivalent operators ~~ and ~~* - are converted to REGEX filters in SPARQL. It is important to notice that pattern matching operations using LIKE/ILIKE only support the wildcards % and _, and therefore only these characters will be translated to their REGEX equivalents. Any other character that might be potentially used as a wildcard in REGEX, such as ^, | or $, will be escaped.

[!NOTE]
These pattern matching operators do not support nondeterministic collations. If required, apply a different collation to the expression to work around this limitation.

Pushdown Examples

Foreign table columns with the option literaltype

| PostgreSQL Type | Literal Type | SQL WHERE Condition | SPARQL (pushdown) | |——————|––––––––|—————————————————––|————————————————————————————————| | text | xsd:string | name = 'foo' | FILTER(?s = "foo"^^xsd:string) | | text | * | name <> 'foo' | FILTER(STR(?s) != "foo") | | text | * | name ILIKE '%Jon_s', name ~~* '%Jon_s' | FILTER(REGEX(?name,".*Jon.s$","i")) | | text | * | name LIKE '%foo%', name ~~ '%foo%'; | FILTER(REGEX(?name,".*foo.*")) | | int | xsd:integer | runtime > 42 | FILTER(?runtime > 42) | | int | xsd:integer | runtime > 40+2 | FILTER(?runtime > 42) | | numeric | - | val >= 42.73 | FILTER(?val >= 42.73) | | date | xsd:date | released BETWEEN '2021-04-01' AND '2021-04-30' | FILTER(?released >= "2021-04-01"^^xsd:date) FILTER(?released <= "2021-04-30"^^xsd:date) | | timestamp | xsd:dateTime | modified > '2021-04-06 14:07:00.26' | FILTER(?modified > "2021-04-06T14:07:00.260000"^^xsd:dateTime) | | timestamp | xsd:dateTime | modified < '2021-04-06 14:07:00.26' | FILTER(?modified < "2021-04-06T14:07:00.260000"^^xsd:dateTime) | | text | xsd:string | country IN ('Germany','France','Portugal') | FILTER(?country IN ("Germany"^^xsd:string, "France"^^xsd:string, "Portugal"^^xsd:string)) | | varchar | - | country NOT IN ('Germany','France','Portugal') | FILTER(?country NOT IN ("Germany", "France", "Portugal")) | | name | - | country = ANY(ARRAY['Germany','France','Portugal']) | FILTER(?country IN ("Germany", "France", "Portugal")) | | boolean | xsd:boolean | bnode IS TRUE | FILTER(?node = "true"^^xsd:boolean) | | boolean | xsd:boolean | bnode IS NOT TRUE | FILTER(?node != "true"^^xsd:boolean) | | boolean | xsd:boolean | bnode IS FALSE | FILTER(?node = "false"^^xsd:boolean) | | boolean | xsd:boolean | bnode IS NOT FALSE | FILTER(?node != "false"^^xsd:boolean) |

Foreign table columns with the option language

| PostgreSQL Type | Language Tag | SQL WHERE Condition | SPARQL (pushdown) | |——————|––––––––|—————————————————––|————————————————————————————————| | text | en | name = 'foo' | FILTER(?s = "foo"@en) | | name | de | name <> 'foo' | FILTER(?s != "foo"@de) | | varchar | en | country NOT IN ('Germany','France','Portugal') | FILTER(?country NOT IN ("Germany"@en, "France"@en, "Portugal"@en)) | | text | * | name = 'foo' | FILTER(STR(?s) = "foo") |

Foreign table columns with the option expression

| PostgreSQL Type | Expression | Literal Type | SQL WHERE Condition | SPARQL (pushdown) | |——————|———————————–|–––––––|———————|————————————————————————————————| | boolean | STRSTARTS(STR(?country),"https")| xsd:boolean|bnode IS TRUE | FILTER(STRSTARTS(STR(?country),"http") = "true"^^xsd:boolean) | | int | STRLEN(?variable) | xsd:integer|strlen > 10 | FILTER(STRLEN(?variable) > 10) | | text | UCASE(?variable) | - |uname = 'FOO' | FILTER(UCASE(?variable) = "FOO") |

Examples

These and other examples can be downloaded here

DBpedia

Create a SERVER and FOREIGN TABLE to query the DBpedia SPARQL Endpoint (Politicians):

CREATE SERVER dbpedia
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (endpoint 'https://dbpedia.org/sparql');

CREATE FOREIGN TABLE politicians (
  uri text        OPTIONS (variable '?person',     nodetype 'iri'),
  name text       OPTIONS (variable '?personname', nodetype 'literal', literaltype 'xsd:string'),
  birthdate date  OPTIONS (variable '?birthdate',  nodetype 'literal', literaltype 'xsd:date'),
  party text      OPTIONS (variable '?partyname',  nodetype 'literal', literaltype 'xsd:string'),
  country text    OPTIONS (variable '?country',    nodetype 'literal', language 'en')
)
SERVER dbpedia OPTIONS (
  log_sparql 'true',
  sparql '
    PREFIX dbp: <http://dbpedia.org/property/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT *
    WHERE {
      ?person 
          a dbo:Politician;
          dbo:birthDate ?birthdate;
          dbp:name ?personname;
          dbo:party ?party .       
        ?party 
          dbp:country ?country;
          rdfs:label ?partyname .
        FILTER NOT EXISTS {?person dbo:deathDate ?died}
        FILTER(LANG(?partyname) = "de")
      } 
');

In the following SQL query we can observe that:

  • the executed SPARQL query was logged.
  • the SPARQL SELECT was modified to retireve only the columns used in the SQL SELECT and WHERE clauses.
  • the conditions in the SQL WHERE clause were pushed down as SPARQL FILTER conditions.
  • the SQL ORDER BY clause was pushed down as SPARQL ORDER BY.
  • the FETCH FIRST ... ROWS ONLY was pushed down as SPARQL LIMIT
  • the column country has a language option, and its value is used as a language tag in the SPARQL expression: FILTER(?country IN ("Germany"@en, "France"@en))
SELECT name, birthdate, party
FROM politicians
WHERE 
  country IN ('Germany','France') AND 
  birthdate > '1995-12-31' AND
  party <> ''
ORDER BY birthdate DESC, party ASC
FETCH FIRST 5 ROWS ONLY;
NOTICE:  SPARQL query sent to 'https://dbpedia.org/sparql':

 PREFIX dbp: <http://dbpedia.org/property/>
 PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?personname ?birthdate ?partyname ?country 
{
      ?person 
          a dbo:Politician;
          dbo:birthDate ?birthdate;
          dbp:name ?personname;
          dbo:party ?party .       
        ?party 
          dbp:country ?country;
          rdfs:label ?partyname .
        FILTER NOT EXISTS {?person dbo:deathDate ?died}
        FILTER(LANG(?partyname) = "de")
       FILTER(?country IN ("Germany"@en, "France"@en))
 FILTER(?birthdate > "1995-12-31"^^xsd:date)
 FILTER(?partyname != ""^^xsd:string)
}
ORDER BY  DESC (?birthdate)  ASC (?partyname)
LIMIT 5

        name        | birthdate  |                  party                  
--------------------+------------+-----------------------------------------
 Louis Boyard       | 2000-08-26 | La France insoumise
 Klara Schedlich    | 2000-01-04 | Bündnis 90/Die Grünen
 Pierrick Berteloot | 1999-01-11 | Rassemblement National
 Niklas Wagener     | 1998-04-16 | Bündnis 90/Die Grünen
 Jakob Blankenburg  | 1997-08-05 | Sozialdemokratische Partei Deutschlands
(5 rows)

Getty Thesaurus

Create a SERVER and FOREIGN TABLE to query the Getty Thesaurus SPARQL endpoint Non-Italians Who Worked in Italy:

Find non-Italians who worked in Italy and lived during a given time range

  • Having event that took place in tgn:1000080 Italy or any of its descendants
  • Birth date between 1250 and 1780
  • Just for variety, we look for artists as descendants of facets ulan:500000003 “Corporate bodies” or ulan:500000002 “Persons, Artists”, rather than having type “artist” as we did in previous queries. In the previous query we used values{..} but we here use filter(in(..)).
  • Not having nationality aat:300111198 Italian or any of its descendants
CREATE SERVER getty
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (
  endpoint 'http://vocab.getty.edu/sparql.xml',
  format 'application/sparql-results+xml'
);

CREATE FOREIGN TABLE getty_non_italians (
  uri text   OPTIONS (variable '?x'),
  name text  OPTIONS (variable '?name'),
  bio text   OPTIONS (variable '?bio'),
  birth int  OPTIONS (variable '?birth')
)
SERVER getty OPTIONS (
  log_sparql 'true',
  sparql '
  PREFIX ontogeo: <http://www.ontotext.com/owlim/geo#>
  PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
  PREFIX gvp: <http://vocab.getty.edu/ontology#>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  PREFIX schema: <http://schema.org/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

   SELECT ?x ?name ?bio ?birth {
  {SELECT DISTINCT ?x
    {?x foaf:focus/bio:event/(schema:location|(schema:location/gvp:broaderExtended)) tgn:1000080-place}}
 	 ?x gvp:prefLabelGVP/xl:literalForm ?name;
	    foaf:focus/gvp:biographyPreferred [
	    schema:description ?bio;
       	gvp:estStart ?birth].

  FILTER ("1250"^^xsd:gYear <= ?birth && ?birth <= "1780"^^xsd:gYear)
  FILTER EXISTS {?x gvp:broaderExtended ?facet.
  FILTER(?facet in (ulan:500000003, ulan:500000002))}
  FILTER NOT EXISTS {?x foaf:focus/(schema:nationality|(schema:nationality/gvp:broaderExtended)) aat:300111198}}
  '); 

In the following SQL query we can observe that:

  • the executed SPARQL query was logged.
  • all conditions were applied locally (rdf_fdw currently does not support sub selects).
  • the columns of the FOREIGN TABLE have only the required option variable, as the other options are only necessary for the pushdown feature.
SELECT name, bio, birth
FROM getty_non_italians
WHERE bio ~~* '%artist%'
ORDER BY birth 
LIMIT 10;

NOTICE:  SPARQL query sent to 'http://vocab.getty.edu/sparql.xml':

  PREFIX ontogeo: <http://www.ontotext.com/owlim/geo#>
  PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
  PREFIX gvp: <http://vocab.getty.edu/ontology#>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  PREFIX schema: <http://schema.org/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

   SELECT ?x ?name ?bio ?birth {
  {SELECT DISTINCT ?x
    {?x foaf:focus/bio:event/(schema:location|(schema:location/gvp:broaderExtended)) tgn:1000080-place}}
 	 ?x gvp:prefLabelGVP/xl:literalForm ?name;
	    foaf:focus/gvp:biographyPreferred [
	    schema:description ?bio;
       	gvp:estStart ?birth].

  FILTER ("1250"^^xsd:gYear <= ?birth && ?birth <= "1780"^^xsd:gYear)
  FILTER EXISTS {?x gvp:broaderExtended ?facet.
  FILTER(?facet in (ulan:500000003, ulan:500000002))}
  FILTER NOT EXISTS {?x foaf:focus/(schema:nationality|(schema:nationality/gvp:broaderExtended)) aat:300111198}}
  

                name                 |                                  bio                                  | birth 
-------------------------------------+-----------------------------------------------------------------------+-------
 Juán de España                      | Spanish artist and goldsmith, active 1455                             |  1415
 Coecke van Aelst, Pieter, the elder | Flemish artist, architect, and author, 1502-1550                      |  1502
 Worst, Jan                          | Dutch artist, active ca. 1645-1655                                    |  1605
 Mander, Karel van, III              | Dutch portraitist and decorative artist, 1608-1670, active in Denmark |  1608
 Ulft, Jacob van der                 | Dutch artist, 1627-1689                                               |  1627
 Fiammingo, Giacomo                  | Flemish artist, fl. 1655                                              |  1635
 Marotte, Charles                    | French artist, fl. ca.1719-1743                                       |  1699
 Troll, Johann Heinrich              | Swiss artist, 1756-1824                                               |  1756
 Beys, G.                            | French artist, fl. ca.1786-1800                                       |  1766
 Vaucher, Gabriel Constant           | Swiss artist, 1768-1814                                               |  1768
(10 rows)

BBC Programmes and Music

Create a SERVER and FOREIGN TABLE to query the BBC Programmes and Music SPARQL endpoint (authors and their work)

CREATE SERVER bbc
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (
  endpoint 'https://lod.openlinksw.com/sparql',
  format 'application/sparql-results+xml'
);


CREATE FOREIGN TABLE artists (
  id text          OPTIONS (variable '?person',  nodetype 'iri'),
  name text        OPTIONS (variable '?name',    nodetype 'literal'),
  itemid text      OPTIONS (variable '?created', nodetype 'iri'),
  title text       OPTIONS (variable '?title',   nodetype 'literal'),
  description text OPTIONS (variable '?descr',   nodetype 'literal')
)
SERVER bbc OPTIONS (
  log_sparql 'true',
  sparql '
  PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
  PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
  PREFIX dcterms: <http://purl.org/dc/terms/>
  PREFIX bibo:    <http://purl.org/ontology/bibo/>
  PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

  SELECT *
  {
    ?person a foaf:Person ;
      foaf:name ?name ;
      blterms:hasCreated ?created .
    ?created a bibo:Book ;
      dcterms:title ?title ;
    dcterms:description ?descr
  } 
'); 

In the following SQL query we can observe that:

  • the executed SPARQL query was logged.
  • the SPARQL SELECT clause was reduced to the columns used in the SQL query
  • DISTINCT and WHERE and clauses were pushed down.
  • an ORDER BY was automatically pushded down due the use of DISTINCT
SELECT DISTINCT title, description 
FROM artists
WHERE name = 'John Lennon';

NOTICE:  SPARQL query sent to 'https://lod.openlinksw.com/sparql':

 PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
 PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
 PREFIX dcterms: <http://purl.org/dc/terms/>
 PREFIX bibo:    <http://purl.org/ontology/bibo/>
 PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?name ?title ?descr 
{
    ?person a foaf:Person ;
      foaf:name ?name ;
      blterms:hasCreated ?created .
    ?created a bibo:Book ;
      dcterms:title ?title ;
    dcterms:description ?descr
   FILTER(?name = "John Lennon")
}
ORDER BY  ASC (?title)  ASC (?descr)

                            title                             |                                                                      description                                
                                      
--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------
--------------------------------------
 Sometime in New York City                                    | Limited ed. of 3500 copies.
 All you need is love                                         | Originally published: 2019.
 The John Lennon letters                                      | Originally published: London: Weidenfeld &amp; Nicolson, 2012.
 Imagine John Yoko                                            | In slip case housed in box (37 x 29 x 8 cm).
 Imagine                                                      | Originally published: 2017.
 More Beatles hits arranged for ukulele                       | Words and music by John Lennon and Paul McCartney except Something, words and music by George Harrison.
 The Lennon play : In his own write the Lennon play           | 'Adapted from John Lennon's best-selling books "In his own write" and "A Spaniard in the works".' - Book jacket.
 More Beatles hits arranged for ukulele                       | Publishers no.: NO91245.
 The John Lennon letters                                      | Originally published: 2012.
 Imagine John Yoko                                            | Includes numbered officially stamped giclée print in clothbound portfolio case.
 Last interview : all we are saying, John Lennon and Yoko Ono | Previous ed.: published as The Playboy interviews with John Lennon and Yoko Ono. New York: Playboy Press, 1981; 
Sevenoaks: New English Library, 1982.
 John Lennon : drawings, performances, films                  | Published in conjunction with the exhibition "The art of John Lennon: drawings, performances, films", Kunsthalle
 Bremen, 21 May to 13 August 1995.
 John Lennon in his own write                                 | Originally published in Great Britain in1964 by Johnathan Cape.
 Sometime in New York City                                    | In box.
 Imagine John Yoko                                            | "This edition is limited to 2,000 copies worldwide, numbered 1-2,000, plus 10 copies retained by the artist, ins
cribed i-x"--Container.
 Last interview : all we are saying, John Lennon and Yoko Ono | This ed. originally published: London: Sidgwick &amp; Jackson, 2000.
 Imagine John Yoko                                            | Includes index.
 Sometime in New York City                                    | Includes index.
 A Spaniard in the works                                      | Originally published: Great Britain : Johnathan Cape, 1965.
 All you need is love                                         | Board book.
 The Playboy interviews with John Lennon and Yoko Ono         | Originally published: New York : Playboy Press, c1981.
 Skywriting by word of mouth                                  | Originally published: New York: HarperCollins; London: Jonathan Cape, 1986.
 John Lennon in his own write ; and a Spaniard in the works   | Originally published: 1964.
 John Lennon in his own write ; and a Spaniard in the works   | Originally published: 1965.
 Lennon on Lennon : conversations with John Lennon            | "This edition published by arrangement with Chicago Review Press"--Title page verso.
(25 rows)

Wikidata

Create a SERVER and FOREIGN TABLE to query the Wikidata SPARQL endpoint (Places that are below 10 meters above sea level and their geo coordinates)

CREATE SERVER wikidata
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (
  endpoint 'https://query.wikidata.org/sparql');

CREATE FOREIGN TABLE places_below_sea_level (
  wikidata_id text         OPTIONS (variable '?place'),
  label text               OPTIONS (variable '?label'),
  wkt geometry(point,4326) OPTIONS (variable '?location'),
  elevation numeric        OPTIONS (variable '?elev')
)
SERVER wikidata OPTIONS (
  log_sparql 'true',
  sparql '
  SELECT *
  WHERE
  {
    ?place rdfs:label ?label .
    ?place p:P2044/psv:P2044 ?placeElev.
    ?placeElev wikibase:quantityAmount ?elev.
    ?placeElev wikibase:quantityUnit ?unit.
    bind(0.01 as ?km).
    FILTER( (?elev < ?km*1000 && ?unit = wd:Q11573)
        || (?elev < ?km*3281 && ?unit = wd:Q3710)
        || (?elev < ?km      && ?unit = wd:Q828224) ).
    ?place wdt:P625 ?location.    
    FILTER(LANG(?label)="en")
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  }
');   

In the following SQL query we can observe that:

  • the executed SPARQL query was logged.
  • the SPARQL SELECT clause was reduced to the columns used in the SQL query
  • the FETCH FIRST 10 ROWS ONLY was pushded down in a SPARQL LIMIT
SELECT wikidata_id, label, wkt
FROM places_below_sea_level
FETCH FIRST 10 ROWS ONLY;

INFO:  SPARQL query sent to 'https://query.wikidata.org/sparql':

SELECT ?place ?label ?location 
{
    ?place rdfs:label ?label .
    ?place p:P2044/psv:P2044 ?placeElev.
    ?placeElev wikibase:quantityAmount ?elev.
    ?placeElev wikibase:quantityUnit ?unit.
    bind(0.01 as ?km).
    FILTER( (?elev < ?km*1000 && ?unit = wd:Q11573)
        || (?elev < ?km*3281 && ?unit = wd:Q3710)
        || (?elev < ?km      && ?unit = wd:Q828224) ).
    ?place wdt:P625 ?location.    
    FILTER(LANG(?label)="en")
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  }
LIMIT 10

               wikidata_id                |      label      |                    wkt                     
------------------------------------------+-----------------+--------------------------------------------
 http://www.wikidata.org/entity/Q61308849 | Tuktoyaktuk A   | 0101000000295C8FC2F5A060C0EC51B81E855B5140
 http://www.wikidata.org/entity/Q403083   | Ahyi            | 010100000041E3101111216240A9CDA7AAAA6A3440
 http://www.wikidata.org/entity/Q31796625 | Ad Duyūk        | 01010000003AE97DE36BB7414065A54929E8DE3F40
 http://www.wikidata.org/entity/Q54888910 | Lydd Library    | 010100000074B7EBA52902ED3F0C3B8C497F794940
 http://www.wikidata.org/entity/Q27745421 | Écluse de Malon | 0101000000E8F9D346757AFDBFB9FB1C1F2DE64740
 http://www.wikidata.org/entity/Q14204611 | Bilad el-Rum    | 0101000000D578E9263168394021C059B2793A3D40
 http://www.wikidata.org/entity/Q2888647  | Petza'el        | 0101000000F886DEBC9AB841408D05D940A7054040
 http://www.wikidata.org/entity/Q2888816  | Gilgal          | 0101000000272E0948E2B84140539C1F56EAFF3F40
 http://www.wikidata.org/entity/Q4518111  | Chupícuaro      | 0101000000F10DBD79356559C05C30283B4CAD3340
 http://www.wikidata.org/entity/Q2889475  | Na'aran         | 010100000069A1D4C627BA41409EE61CF084F73F40
(10 rows)

Import data into QGIS

Create a SERVER and a FOREIGN TABLE to query the DBpedia SPARQL Geographic Information Systems

The rdf_fdw can also be used as a bridge between GIS (Geographic Information Systems) and RDF Triplestores. This example shows how to retrieve geographic coordinates of all German public universities from DBpedia, create a WKT (Well Known Text) representation of them, and finally import this data into QGIS to plot a map.

[!NOTE]
This example requires the extension PostGIS.

CREATE SERVER dbpedia
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (endpoint 'https://dbpedia.org/sparql');

CREATE FOREIGN TABLE german_public_universities (
  id text                   OPTIONS (variable '?uri', nodetype 'iri'),
  name text                 OPTIONS (variable '?name',nodetype 'literal'),
  lon numeric               OPTIONS (variable '?lon', nodetype 'literal'),
  lat numeric               OPTIONS (variable '?lat', nodetype 'literal'),
  geom geometry(point,4326) OPTIONS (variable '?wkt', nodetype 'literal',
                                    expression 'CONCAT("POINT(",?lon," ",?lat,")") AS ?wkt')
) SERVER dbpedia OPTIONS (
  log_sparql 'true',
  sparql '
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX dbp: <http://dbpedia.org/property/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr:  <http://dbpedia.org/resource/>
    SELECT ?uri ?name ?lon ?lat
    WHERE {
      ?uri dbo:type dbr:Public_university ;
        dbp:name ?name;
        geo:lat ?lat; 
        geo:long ?lon; 
        dbp:country dbr:Germany
      }
  ');

Now that we have our FOREIGN TABLE in place, we just need to create a New PostGIS Connection in QGIS and go to Database > DB Manager …, select the table we just created and query the data using SQL:

SELECT id, name, geom
FROM german_public_universities

unis

Finally give the layer a name, select the geometry column and press Load.

unis

Deploy with Docker

To deploy rdf_fdw with docker just pick one of the supported PostgreSQL versions, install the requirements and compile the source code. For instance, a rdf_fdw Dockerfile for PostgreSQL 15 should look like this (minimal example):

FROM postgres:15

RUN apt-get update && \
    apt-get install -y make gcc postgresql-server-dev-15 libxml2-dev libcurl4-openssl-dev

RUN mkdir /extensions
COPY ./rdf_fdw-1.0.0.tar.gz /extensions/
WORKDIR /extensions

RUN tar xvzf rdf_fdw-1.0.0.tar.gz && \
    cd rdf_fdw-1.0.0 && \
    make -j && \
    make install

To build the image save it in a Dockerfile and run the following command in the root directory - this will create an image called rdf_fdw_image.:

 $ docker build -t rdf_fdw_image .

After successfully building the image you’re ready to run or create the container ..

$ docker run --name my_container -e POSTGRES_HOST_AUTH_METHOD=trust rdf_fdw_image

.. and then finally you’re able to create and use the extension!

$ docker exec -u postgres my_container psql -d mydatabase -c "CREATE EXTENSION rdf_fdw;"

For testers and developers

If you’re cool enough, try out the latest commits:

Dockerfile

FROM postgres:15

RUN apt-get update && \
    apt-get install -y git make gcc postgresql-server-dev-15 libxml2-dev libcurl4-openssl-dev

WORKDIR /

RUN git clone https://github.com/jimjonesbr/rdf_fdw.git && \
    cd rdf_fdw && \
    make -j && \
    make install

Deployment

 $ docker build -t rdf_fdw_image .
 $ docker run --name my_container -e POSTGRES_HOST_AUTH_METHOD=trust rdf_fdw_image
 $ docker exec -u postgres my_container psql -d mydatabase -c "CREATE EXTENSION rdf_fdw;"