Contents
Purpose
Neo4jPg is a foreign data wrapper for Postgresql. It can be used to access data stored into Neo4j from Postgresql. Moreover, you can directly write a cypher query in your select.
How to install
Requirements
There is some Postgresql & Python requirements to use this project :
-
Postgresql >= 9.1
-
postgresql-plpython
-
Postgresql development packages
-
Python development packages
-
python 2.7 or >= python 3.3 as your default python with pip
On a debian system, you can directly type this command :
Installing multicorn
This project is based on Multicorn, so you have to install it.
Note
|
If you are using PG 9.1, you should use version 0.9.X version (git checkout v0.9.3 )
NOTE: If you are using PG 9.2, you should use version 1.0.X version (git checkout v01.0.1 ) |
When it’s done, you have to enable the extension in your PostgreSQL database.
Neo4j FDW
You can find The Neo4j foreign data wrapper here : https://github.com/sim51/neo4j-fwd
Clone the repository
Install the project
At this point, everything is done to use Neo4j in Postgresql !
Foreign Data Wrapper
Create a neo4j server
First step, you have to create a foreign server in Postgres :
Connection options are
-
url
The bolt url for Neo4j (default is bolt://localhost:7687) -
user
User for Neo4j -
password
password of the Neo4j user
Create a foreign table
Now you can create a foreign table that match a cypher query.
Important
|
Your cypher query must return a collection, annd you have to give an alias on each return variable. |
Filtering the data
quals
are pushed to the remote database when it’s possible. This include simple operators like :
-
equality, inequality (=, <>, >, <, ⇐, >=)
-
like, ilike and their negations
If you have defined your foreign table with this query MATCH (n:Movie) RETURN n.title as movie
,
this FDW will push all your WHERE clause directly to Neo4j by generating a cypher query that looks like to this : MATCH (n:Movie) WITH n.title as movie WHERE ... RETURN movie
;
In fact it replaces the RETURN
part of your query by a WITH ... WHERE ... RETURN
.
It works, but it’s not optimised …
To optimise the WHERE
clause in the generated cypher query, you can define a WHERE placeholder in the cypher definition of your foreign table
Example :
CREATE FOREIGN TABLE actedIn (
actor varchar NOT NULL,
born smallint,
movie varchar NOT NULL
) SERVER multicorn_neo4j OPTIONS (
cypher 'MATCH (p:Person) /*WHERE{"actor":"p.name", "born":"p.born"}*/ WITH p MATCH (p)-[:ACTED_IN]->(m:Movie) /*WHERE{"movie":"m.title"}*/ RETURN p.name AS actor, p.born AS born, m.title AS movie'
);
In this example you can see two where placeholders : /\*WHERE{"actor":"p.name", "born":"p.born"}*/
& /\*WHERE{"movie":"m.title"}*/
A placeholder is defined by /\*WHERE{ ... }*/
(please respect the cast, it’s a strict match).
Then inside, you have to define the cypher field name of the SQL field.
With those information, the plugin know how to put the where clause in your cypher query.
So this SQL query :
Will generate this cypher query :
Make cypher query into a sql select
This project also define a cool postgres function cypher
, that allow you to write a cypher query into a select.
Example : SELECT * FROM cypher('MATCH (n)-[r]->(m) RETURN n,r,m LIMIT 10')
The cypher
function returns a postgres JSON type.
Create the function into your database
You have to declare those functions into your database, before to use it.
This define three functions :
-
cypher(query, params, server)
: make a cypher query on the foreign server specify (server is the name of the foreign server. Examplemulticorn_neo4j
) :SELECT * FROM cypher('MATCH (n)-[r]->(m) RETURN n,r,m LIMIT 10', '{}', 'multicorn_neo4j')
-
cypher(query, params)
: make a cypher query on the first foreign server defined, with neo4j query parameter :SELECT * FROM cypher('MATCH (n:Movie) WHERE n.title CONTAINS $name RETURN n.title AS title LIMIT 10', '{"name":"Matrix"}');
-
cypher(query)
: make a cypher query on the first foreign server defined :SELECT * FROM cypher('MATCH (n)-[r]->(m) RETURN n,r,m LIMIT 10')
How to use it
The JSON produced follow your cypher return statement : the key of the first json level correspond to you the name of yours returns, and the value to json serialisation fo the object.
If the return object is a Node, it’s serialize as a JSON object like this : { id:X, labels : [], properties: { object } }
Example :
{"n":{"labels": ["Location"],"properties": {"y": 1906520.0, "x": 1158953.0, "name": "025XX W AUGUSTA BLVD"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1842294.0, "x": 1175702.0, "name": "094XX S HARVARD AVE"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1931163.0, "x": 1152905.0, "name": "047XX N KIMBALL AVE"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1887355.0, "x": 1149049.0, "name": "041XX W 24TH PL"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1869892.0, "x": 1176061.0, "name": "001XX W 53RD ST"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1862782.0, "x": 1180056.0, "name": "063XX S DR MARTIN LUTHER KING JR DR"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1908312.0, "x": 1175281.0, "name": "001XX W DIVISION ST"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1899998.0, "x": 1139456.0, "name": "0000X N PINE AVE"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1908407.0, "x": 1176113.0, "name": "012XX N STATE PKWY"}}}
{"n":{"labels": ["Location"],"properties": {"y": 1888098.0, "x": 1148713.0, "name": "023XX S KEELER AVE"}}}
(10 lignes)
If the return object is a relation, it's serialize as a JSON object like this :` { type : "MY_TYPE", properties: { object } }`
Example :
[source, sql]
mydatabase=# SELECT cypher FROM cypher(MATCH (n)-[r]→(m) RETURN r AS relation LIMIT 10); cypher
{"relation":{"type": "IS_TYPE_OF","properties": {}}}
{"relation":{"type": "IS_TYPE_OF","properties": {}}}
{"relation":{"type": "IS_LOCALIZED_AT","properties": {}}}
{"relation":{"type": "HAS_ARREST","properties": {}}}
{"relation":{"type": "IS_DOMESTIC","properties": {}}}
{"relation":{"type": "IN_YEAR","properties": {}}}
{"relation":{"type": "IS_IN_CATEGORY","properties": {}}}
{"relation":{"type": "IS_TYPE_OF","properties": {}}}
{"relation":{"type": "IS_TYPE_OF","properties": {}}}
{"relation":{"type": "IS_TYPE_OF","properties": {}}}
(10 lignes)
Of course, for primitive type are also supported, and you can mix all of this : SELECT cypher FROM cypher(MATCH (y:Year)-[r]→(m) RETURN y.value AS year, r, m LIMIT 10);
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10016718"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017521"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10018383"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10087834"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017190"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017379"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017246"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017248"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017208"}}}
{"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017211"}}}
(10 lignes)
=== The power of PG & JSON
PG 9.4 have a function name `json_to_record`, that convert our json into a collection of typed tuple !
[source, sql]
mydatabase=# SELECT year, id FROM cypher(MATCH (y:Year)←[r]-(m) RETURN y.value AS year, m.id AS id LIMIT 10) , json_to_record(cypher) as x(year int, id varchar) year | id ------+---------- 2015 | 10016718 2015 | 10017521 2015 | 10018383 2015 | 10087834 2015 | 10017190 2015 | 10017379 2015 | 10017246 2015 | 10017248 2015 | 10017208 2015 | 10017211 (10 lignes)
== Run test
You need to have **docker compose** installed.
Then you just have to run the `./scripts/tests.sh` script.
== More Examples
If you want to see more examples, just take a look in folder `test/sql`
== kb
* To enable log in postgres : `SET client_min_messages = DEBUG`
* To enable query log in Neo4j : `CALL dbms.setConfigValue("dbms.logs.query.enabled", "true")`
* To open an `psql` session on the database `neo4j` with debug messages : `env PGOPTIONS='-c client_min_messages=DEBUG' psql neo4j`
* Alter an option of foreign table (replace ADD by SET or DROP): `ALTER FOREIGN TABLE actedin OPTIONS ( ADD estimated_rows '-1');`
* Display the detail of a table : `\d+ person`