MOTIVATION

It is very hard to compute graph components on pure PostgreSQL. With this extension you can do this very efficiently. Extension does this on pointers using a minimal amount of RAM. This allows you to build components of hundreds of thousands of vertices based on many millions of pairs in seconds.

API

graph_components(integer[]) RETURNS internal

get_component(internal) RETURNS SETOF integer[]

get_component_id(internal) RETURNS TABLE(id int4, component_id int4)

Typical usage: get_component(graph_components(array_with_connections)) array_with_connections can contain a list of integers (id of vertex).

EXAMPLE

Basic example:

SELECT
  get_component(graph_components(array[1,2,3,4,5]))

Returns:

{1,2,3,4,5}
SELECT
  get_component(graph_components(connection))
FROM (values (array[1,2,3,4,5]),
      (array[4,10]),
      (array[10,11]),
      (array[12,14]),
      (array[192]),
      (array[10000,192])
) k(connection)

Returns:

{12,14}
{1,2,3,4,5,10,11}
{192,10000}

Return components:

SELECT
  get_component(graph_components(c.connections_array))
FROM public.connections c

Return vertex and component:

SELECT
    w.vertex,
    x.component
FROM (
  SELECT
    get_component(graph_components(c.connections_array))
  FROM public.connections c
) as x(component), unnest (component) as w(vertex)
SELECT
    w.vertex,
    x.list
FROM (
SELECT
  get_component(graph_components(connection))
FROM (values (array[1,2,3,4,5]),
      (array[4,10]),
      (array[10,11]),
      (array[12,14]),
      (array[192]),
      (array[10000,192])
) k(connection)
) as x(list), unnest (list) as w(vertex)

Returns:

vertex	component
12	{12,14}
14	{12,14}
1	{1,2,3,4,5,10,11}
2	{1,2,3,4,5,10,11}
3	{1,2,3,4,5,10,11}
4	{1,2,3,4,5,10,11}
5	{1,2,3,4,5,10,11}
10	{1,2,3,4,5,10,11}
11	{1,2,3,4,5,10,11}
192	{192,10000}
10000	{192,10000}

get_component_id() - returns the vertex id along with the lowest vertex id from the component.

SELECT
  (get_component_id(graph_components(connection))).*
FROM (values (array[1,2,3,4,5]),
      (array[4,10]),
      (array[10,11]),
      (array[12,14]),
      (array[192]),
      (array[10000,192])
) k(connection)

Returns:

id	component_id
12	12
1	1
10	1
11	1
192	192
3	1
14	12
4	1
10000	192

Example execution plan with 1.5M connections, 1078322 vertices and 239933 graph components

QUERY PLAN
Nested Loop  (cost=33952.01..34157.02 rows=10000 width=36) (actual time=16875.675..17365.712 rows=1078322 loops=1)
  ->  ProjectSet  (cost=33952.00..33957.02 rows=1000 width=32) (actual time=16875.636..17050.650 rows=239933 loops=1)
        ->  Aggregate  (cost=33952.00..33952.01 rows=1 width=32) (actual time=16875.607..16875.608 rows=1 loops=1)
              ->  Seq Scan on connections_array  (cost=0.00..30066.60 rows=1554160 width=29) (actual time=0.003..113.243 rows=1554160 loops=1)
  ->  Function Scan on unnest w  (cost=0.00..0.10 rows=10 width=4) (actual time=0.001..0.001 rows=4 loops=239933)
Planning Time: 0.428 ms
Execution Time: 17393.171 ms

Installation:

  pgxn install --sudo -- graph_component

Context:

CREATE TABLE IF NOT EXISTS public.connections_array(
    connection integer[]
);