graph_component

This Release
graph_component 1.0.1
Date
Status
Stable
Other Releases
Abstract
Compute graph components efficiently
Description
A PostgreSQL extension to compute graph components on PostgreSQL
Released By
ArturFormella
License
MIT
Special Files
Tags

Extensions

graph_component 1.0.0
Compute graph components

README

MOTIVATION

It is very hard to compute graph components on pure PostgreSQL. With this extension you can do this very efficiently. Extension does this on pointers using a minimal amount of RAM. This allows you to build components of hundreds of thousands of vertices based on many millions of pairs in seconds.

API

graph_components(integer[]) RETURNS internal

get_component(internal) RETURNS SETOF integer[]

get_component_id(internal) RETURNS TABLE(id int4, component_id int4)

Typical usage: get_component(graph_components(array_with_connections)) array_with_connections can contain a list of integers (id of vertex).

EXAMPLE

Basic example:

SELECT
  get_component(graph_components(array[1,2,3,4,5]))

Returns:

{1,2,3,4,5}
SELECT
  get_component(graph_components(connection))
FROM (values (array[1,2,3,4,5]),
      (array[4,10]),
      (array[10,11]),
      (array[12,14]),
      (array[192]),
      (array[10000,192])
) k(connection)

Returns:

{12,14}
{1,2,3,4,5,10,11}
{192,10000}

Return components:

SELECT
  get_component(graph_components(c.connections_array))
FROM public.connections c

Return vertex and component:

SELECT
    w.vertex,
    x.component
FROM (
  SELECT
    get_component(graph_components(c.connections_array))
  FROM public.connections c
) as x(component), unnest (component) as w(vertex)
SELECT
    w.vertex,
    x.list
FROM (
SELECT
  get_component(graph_components(connection))
FROM (values (array[1,2,3,4,5]),
      (array[4,10]),
      (array[10,11]),
      (array[12,14]),
      (array[192]),
      (array[10000,192])
) k(connection)
) as x(list), unnest (list) as w(vertex)

Returns:

vertex	component
12	{12,14}
14	{12,14}
1	{1,2,3,4,5,10,11}
2	{1,2,3,4,5,10,11}
3	{1,2,3,4,5,10,11}
4	{1,2,3,4,5,10,11}
5	{1,2,3,4,5,10,11}
10	{1,2,3,4,5,10,11}
11	{1,2,3,4,5,10,11}
192	{192,10000}
10000	{192,10000}

get_component_id() - returns the vertex id along with the lowest vertex id from the component.

SELECT
  (get_component_id(graph_components(connection))).*
FROM (values (array[1,2,3,4,5]),
      (array[4,10]),
      (array[10,11]),
      (array[12,14]),
      (array[192]),
      (array[10000,192])
) k(connection)

Returns:

id	component_id
12	12
1	1
10	1
11	1
192	192
3	1
14	12
4	1
10000	192

Example execution plan with 1.5M connections, 1078322 vertices and 239933 graph components

QUERY PLAN
Nested Loop  (cost=33952.01..34157.02 rows=10000 width=36) (actual time=16875.675..17365.712 rows=1078322 loops=1)
  ->  ProjectSet  (cost=33952.00..33957.02 rows=1000 width=32) (actual time=16875.636..17050.650 rows=239933 loops=1)
        ->  Aggregate  (cost=33952.00..33952.01 rows=1 width=32) (actual time=16875.607..16875.608 rows=1 loops=1)
              ->  Seq Scan on connections_array  (cost=0.00..30066.60 rows=1554160 width=29) (actual time=0.003..113.243 rows=1554160 loops=1)
  ->  Function Scan on unnest w  (cost=0.00..0.10 rows=10 width=4) (actual time=0.001..0.001 rows=4 loops=239933)
Planning Time: 0.428 ms
Execution Time: 17393.171 ms

Installation:

  pgxn install --sudo -- graph_component

Context:

CREATE TABLE IF NOT EXISTS public.connections_array(
    connection integer[]
);