Contents
PL/Proxy tutorial
This section explains how to use PL/Proxy to proxy queries across a set of remote databases. For purposes of this intro we assume each remote database has a "user" table that contains a username and an email column.
We also assume that the data is partitioned across remote databases by taking a hash of the username and assigning it to one of 2 servers. Real applications should use a partitioning scheme that is appropriate to their requirements.
For the purposes of this example assume that the partition databases part00 and part01 both contain a table resembling
CREATE TABLE users (
username text,
email text
);
Installation
Download PL/Proxy from https://plproxy.github.com and extract.
Build PL/Proxy by running
make
andmake install
inside of the plproxy directory. If your having problems make sure that pg_config from the postgresql bin directory is in your path.To install PL/Proxy in a database execute the commands in the
plproxy.sql
file. For examplepsql -f $SHAREDIR/contrib/plproxy.sql mydatabase
Steps 1 and 2 can be skipped if your installed pl/proxy from a packaging system such as RPM.
Simple remote function call
Here we will create a plproxy function that will run on the proxy database which will connect to a remote database named 'part00' and return a users email address.
This example uses plproxy in CONNECT mode, it will
connect to dbname=part00
and run following SQL there:
CREATE FUNCTION get_user_email(i_username text)
RETURNS SETOF text AS $$
CONNECT 'dbname=part00';
SELECT email FROM users WHERE username = $1;
$$ LANGUAGE plproxy;
SELECT * from get_user_email($1);
The above example uses plproxy to proxy the query to the remote database but doesn't handle partitioning of data. It assumes that the entire users table is in the remote users database. The next few steps will describe how to partition data with PL/Proxy.
Create configuration functions
Using PL/Proxy for partitioning requires setting up some configuration functions. Alternatively, if you are running PostgreSQL 8.4 or above you can take advantage of the SQL/MED connection management facilities. See below.
When a query needs to be forwarded to a remote database the function
plproxy.get_cluster_partitions(cluster)
is invoked by plproxy to get the
connection string to use for each partition.
The following is an example
CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
RETURNS SETOF text AS $$
BEGIN
IF cluster_name = 'usercluster' THEN
RETURN NEXT 'dbname=part00 host=127.0.0.1';
RETURN NEXT 'dbname=part01 host=127.0.0.1';
RETURN;
END IF;
RAISE EXCEPTION 'Unknown cluster';
END;
$$ LANGUAGE plpgsql;
A production application might query some configuration tables to return the
connstrings. The number of partitions must be a power of 2 unless modular_mapping
is set.
Next define a plproxy.get_cluster_version(cluster_name)
function. This is
called on each request and determines if the output from a cached
result from plproxy.get_cluster_partitions()
can be reused.
CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text)
RETURNS int4 AS $$
BEGIN
IF cluster_name = 'usercluster' THEN
RETURN 1;
END IF;
RAISE EXCEPTION 'Unknown cluster';
END;
$$ LANGUAGE plpgsql;
We also need to provide a plproxy.get_cluster_config()
function, ours will provide
a value for the connection lifetime. See the configuration section for details
on what this function can do.
CREATE OR REPLACE FUNCTION plproxy.get_cluster_config(
in cluster_name text,
out key text,
out val text)
RETURNS SETOF record AS $$
BEGIN
-- lets use same config for all clusters
key := 'connection_lifetime';
val := 30*60; -- 30m
RETURN NEXT;
RETURN;
END;
$$ LANGUAGE plpgsql;
The config section contains more information on all of these functions.
Configuring Pl/Proxy clusters with SQL/MED
First we need a foreign data wrapper. This is mostly a placeholder, but can be extended with a validator function to verify the cluster definition. See CREATE FOREIGN DATA WRAPPER documentation for additional details of how to manage the SQL/MED catalog.
CREATE FOREIGN DATA WRAPPER plproxy;
Then the actual cluster with its configuration options and partitions:
CREATE SERVER usercluster FOREIGN DATA WRAPPER plproxy
OPTIONS (connection_lifetime '1800',
p0 'dbname=part00 host=127.0.0.1',
p1 'dbname=part01 host=127.0.0.1' );
We also need a user mapping that maps local PostgreSQL users to remote partitions. It is possible to create PUBLIC mapping that applies for all users in the local system:
CREATE USER MAPPING FOR PUBLIC SERVER usercluster;
Or a private mapping that can only be used by specific users:
CREATE USER MAPPING FOR bob SERVER usercluster
OPTIONS (user 'plproxy', password 'salakala');
Finally we need to grant USAGE on the cluster to specific users:
GRANT USAGE ON SERVER usercluster TO bob;
Partitioned remote call
Here we assume that the user table is spread over several databases based
on a hash of the username. The connection string for the partitioned databases
are contained in the get_cluster_partitions()
function described above.
Below is a get_user_email()
function that is executed on the proxy server,which
will make a remote connection to the appropriate partitioned database. The
user's email address will be returned.
This function should be created in the proxy database.
CREATE OR REPLACE FUNCTION get_user_email(i_username text)
RETURNS SETOF text AS $$
CLUSTER 'usercluster';
RUN ON hashtext(i_username) ;
SELECT email FROM users WHERE username = i_username;
$$ LANGUAGE plproxy;
Inserting into the proper partition
Next we provide a simple INSERT function.
Inserting data through plproxy requires functions to be defined on the proxy databases that will perform the insert.
We define this function on both part00 and part01
CREATE OR REPLACE FUNCTION insert_user(i_username text, i_emailaddress text)
RETURNS integer AS $$
INSERT INTO users (username, email) VALUES ($1,$2);
SELECT 1;
$$ LANGUAGE SQL;
Now we define a proxy function inside the proxy database to send the INSERT's to the appropriate target.
CREATE OR REPLACE FUNCTION insert_user(i_username text, i_emailaddress text)
RETURNS integer AS $$
CLUSTER 'usercluster';
RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
Putting it all together
Connect to the proxy database (The one we installed plproxy and the plproxy functions on).
SELECT insert_user('Sven','sven@somewhere.com');
SELECT insert_user('Marko', 'marko@somewhere.com');
SELECT insert_user('Steve','steve@somewhere.cm');
Now connect to the plproxy_1
and plproxy_2
databases. Sven and Marko should be
in plproxy_2
, and Steve should be in plproxy_1
.
When connected to the proxy user you can obtain data by doing
SELECT get_user_email('Sven');
SELECT get_user_email('Marko');
SELECT get_user_email('Steve');
Connection pooling
When used in partitioned setup, PL/Proxy somewhat wastes connections as it opens connection to each partition from each backend process. So it's good idea to use a pooler that can take queries from several connections and funnel them via smaller number of connections to actual database. We use and recommend PgBouncer for that.
More resources
Kristo Kaiv has his own take on tutorial here: https://kaiv.wordpress.com/category/plproxy