narugn 0.2.0

This Release: narugn 0.2.0
Status: Unstable
Latest Unstable: narugn 0.3.0
Abstract: A lightweight distributed computer
Description: Narugn is a lightweight distributed computer, composed of one or more locally connected cells.
Released By: gciolli
License: GPL 3

Narugn

Narugn is a lightweight distributed computer, composed of one or more locally connected cells. It requires PostgreSQL with the PL/Proxy extension.

Overview

The Narugn cluster is composed of cells. Each cell is a PostgreSQL database. Cells are hosted on one or more PostgreSQL database servers.

The Narugn cluster is distributed, in the sense that there is no hierarchy between cells: the user interacts via any cell.

The Narugn server is locally connected: each server talks only with neighbouring servers. Therefore a Narugn cluster can have a large number of cells, connected by a large number of small networks.

See "Cluster, Server, Cell" below for more details on how cells are arranged.

Usage

The Narugn cluster is accessed via a standard PostgreSQL connection, so it can be used from the command line, as well as by an application.
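
As a minimal sketch of such a session, assume a client is already connected to a cell database, for instance with psql. The database name narugn_cell_2_2 is taken from the example later in this document; any other cell would do. The core "ping" cell function can then be launched on the whole cluster:

```sql
-- Connected to any cell, e.g. with: psql narugn_cell_2_2
-- Runs cell_ping on every known cell; each cell is expected to answer 'OK'.
SELECT * FROM execute_sync('ping');
```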

Extension, Logic and State

There are three categories of database objects in a Narugn cell.

"Extension" denotes objects that are created with the Narugn extension; they constitute a small core which is not supposed to change frequently.

Extension code includes primitives to implement connectivity inside the Narugn cluster, as well as the ability to load additional database objects and distribute them among all the cells in the cluster. We call such additional objects "Logic".

Logic can be updated frequently; new Logic entirely replaces any existing Logic. We provide a simple Logic management script which manages dependencies between different Logic fragments and combines them in a single self-contained unit, effectively implementing a basic package system.

Some tables or sequences which are part of Logic can be marked as "State", meaning that their contents will be preserved when old Logic is replaced with new Logic.

Three kinds of functions

There are three groups of functions in Narugn:

  • Internal functions

    Not for user interaction. All part of Extension.

  • API functions

    For user interaction with the Narugn cluster. All part of Extension.

  • Cell functions

    To be run on cells via API functions. Extension contains five cell functions; additional ones can be added as Logic. Cell functions are prefixed with "cell_" and all have the same input and output parameters.

API functions

  • execute_sync, execute_sync_abs

    Two ways to start a distributed computation. They are very similar and differ only in the output format.

  • configure_cell (2 variants)

    Completes the configuration of a newly created Narugn cell. The first variant creates a cell from scratch; the second variant copies its configuration from an existing cell on the same server.

  • api_connect

    Connects to a cell in a neighbouring Narugn server, attempting to merge the two Narugn clusters.

  • state_table, state_sequence

    Mark tables and sequences in Logic as carrying "state", which is preserved on upgrades.

Cell functions

The following five "core" cell functions are implemented by the Narugn extension. Additional cell functions can be added as Logic.

  • cell_ping

    This is the simplest possible distributed computation; each cell just returns 'OK'.

  • cell_version

    Each cell returns version strings for PostgreSQL, the local OS, and the local Narugn extension.

  • cell_rescan

    Refreshes the notion of neighbours of each reachable cell, and does so for each discoverable cell.

  • cell_logic

    Replaces the existing logic with the one given as an argument.

  • cell_new_server

    Used internally by the api_connect API function.
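
Cell functions are launched through the API functions rather than called directly. As a hedged sketch, the query below uses cell_ping as a basic health check by counting the cells that answered. The cast of c to text is an assumption, made only to avoid relying on comparison operators for the cds type, which this document does not describe.

```sql
-- Count the distinct cells that replied 'OK' to a cluster-wide ping.
SELECT count(DISTINCT c::text) AS answering_cells
FROM execute_sync('ping')
WHERE output = 'OK';
```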

Distributed execution details

The function

execute_sync
( cell_function IN text
, payload VARIADIC text[] DEFAULT '{}'
, c OUT cds
, z OUT bigint
, dt OUT interval
, output OUT text
) RETURNS SETOF RECORD

crawls the currently known cells, executes the given cell function on each cell, and then returns the results. An optional variadic payload can be specified, and will be transmitted to the cell function. An absolute version "execute_sync_abs" is available, where dt is replaced by a timestamp "t".

The function named in "cell_function" must be a Narugn cell function; the "cell_" prefix is automatically added to its name. For example, specifying cell_function := 'ping' launches the following function on every cell:

CREATE OR REPLACE FUNCTION cell_ping
( payload IN text[]
, walked IN cdt[]
, z OUT bigint
, t OUT timestamp with time zone
, output OUT text
) RETURNS SETOF RECORD

The second input parameter "walked" is the path that has been walked from the starting cell to reach that cell. It is provided to the cell function, should it be needed. The "cdt" data type contains a pair of cell coordinates and a timestamp with time zone.
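
To illustrate the common signature, here is a sketch of an additional cell function. The name cell_echo and its behaviour are hypothetical and not part of Narugn. It assumes the cdt type is already available (that is, the Narugn extension is installed) and it would have to be distributed to the cells as Logic, a step that is not shown here.

```sql
-- Hypothetical cell function: returns each payload element as one row.
CREATE OR REPLACE FUNCTION cell_echo
( payload IN text[]
, walked IN cdt[]
, z OUT bigint
, t OUT timestamp with time zone
, output OUT text
) RETURNS SETOF RECORD
LANGUAGE plpgsql AS $$
BEGIN
    z := 0;
    -- COALESCE guards against a NULL payload, which FOREACH rejects.
    FOREACH output IN ARRAY COALESCE(payload, '{}'::text[]) LOOP
        z := z + 1;              -- order of this row within the cell
        t := clock_timestamp();  -- when this row was produced
        RETURN NEXT;
    END LOOP;
END;
$$;

-- Once available on every cell, it would be launched without the prefix,
-- passing the payload as variadic arguments:
SELECT * FROM execute_sync('echo', 'hello', 'world');
```

The walked argument is accepted but ignored, since this function does not need the path taken to reach the cell.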

The output of execute_sync is a set of rows, obtained as the union of all the sets of rows produced by each cell. The first column denotes which cell that row comes from, while the other columns are passed directly as produced by the cell function:

  • c : cds = coordinates of the cell that produced this row
  • z : bigint = order of this row among rows produced by this cell
  • dt : interval = when the row was produced, relative to the timestamp when the command was issued on the originating cell
  • output : text = contents of the row

Example:

narugn_cell_2_2=# select * from execute_sync('version'); 
   c   | z |       dt        |                                         output                                         
-------+---+-----------------+----------------------------------------------------------------------------------------
 (2,2) | 1 | 00:00:00.177874 | PostgreSQL 9.3rc1 on i686-pc-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 32-bit
 (2,2) | 2 | 00:00:00.177976 | Narugn 0.2.0
 (3,2) | 1 | 00:00:00.27331  | PostgreSQL 9.3rc1 on i686-pc-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 32-bit
 (3,2) | 2 | 00:00:00.273416 | Narugn 0.2.0
(4 rows)
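
Because execute_sync returns an ordinary set of rows, its result can be filtered like any other query. The sketch below keeps only the Narugn version reported by each cell, relying on the 'Narugn ' prefix visible in the output above. The second query assumes that execute_sync_abs accepts the same arguments as execute_sync.

```sql
-- One row per cell: the Narugn extension version it reports.
SELECT c, output AS narugn_version
FROM execute_sync('version')
WHERE output LIKE 'Narugn %';

-- Same computation with absolute timestamps: column "t" replaces "dt".
SELECT c, z, t, output
FROM execute_sync_abs('version');
```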

Security

At present there are no privileges or user profiles. This is acceptable since Narugn is still a prototype, suitable for running experiments. The distributed nature of the Narugn cluster makes privileges complicated and requires a separate analysis in order to introduce them without increasing complexity too much.

Cluster, Server, Cell

A Narugn cell is a PostgreSQL database satisfying certain conditions. A cell has global coordinates, a pair of integers; cells are placed on a square grid.

There is an adjacency notion between cells, which depends on coordinates: cell (x,y) is adjacent to cell (x',y') exactly when |x-x'| + |y-y'| = 1. Consequently, each cell can have up to four neighbours.
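
The adjacency rule can be checked with plain SQL, without Narugn installed. This sketch lists the cells of a 3x3 grid that are adjacent to (2,2); the grid bounds and column names are chosen only for illustration.

```sql
-- Enumerate a 3x3 grid and keep the cells at distance 1 from (2,2).
SELECT x, y
FROM generate_series(1, 3) AS x
CROSS JOIN generate_series(1, 3) AS y
WHERE abs(x - 2) + abs(y - 2) = 1
ORDER BY x, y;
```

It returns the four cells (1,2), (2,1), (2,3) and (3,2); diagonal cells such as (1,1) are excluded because their distance is 2.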

Each cell belongs to a Narugn server. A Narugn server is a PostgreSQL database server, hosting zero or more Narugn cells. For each server S there is a polygon p(S). If a server hosts a cell (x,y), then (x,y) is contained inside p(S). Each server belongs to a cluster. Two servers S1, S2 are said to be adjacent if there are cells (x1,y1) in p(S1) and (x2,y2) in p(S2) such that (x1,y1) is adjacent to (x2,y2).
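
This document does not say how Narugn stores p(S). Purely as an illustration, if p(S) were represented with PostgreSQL's built-in polygon type, the containment condition for a hosted cell could be tested with the standard geometric operator @>. The polygon and the cell coordinates below are made up.

```sql
-- Is cell (2,2) inside a square p(S) with corners (0,0) and (4,4)?
SELECT polygon '((0,0),(0,4),(4,4),(4,0))' @> point '(2,2)' AS cell_inside_p_s;
```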

A Narugn cluster is a collection C={S_1,...,S_k} of one or more Narugn servers, satisfying two conditions:

  1. (connected) for each i there is j such that S_i is adjacent to S_j;

  2. (disjoint) for each distinct i and j, S_i does not overlap with S_j.

It is possible to merge two existing clusters, provided that the servers do not overlap; this is implemented via the "api_connect" API function.

See also

doc/QUICKSTART.md for a guided tour.

Author

Gianni Ciolli, 2ndQuadrant Italia gianni.ciolli@2ndQuadrant.it

Copyright and License

Copyright (C) 2012, 2013 Gianni Ciolli gianni.ciolli@2ndQuadrant.it.

Narugn is distributed under the terms of the GNU General Public License version 3 or later, which is available both in the enclosed COPYING file and at http://www.gnu.org/copyleft/gpl.html .