Japanese

Welcome to the pg_bulkload Project Home Page

pg_bulkload is a high speed data loading utility for PostgreSQL.

Documentation

High speed data loader: pg_bulkload documentation
- pg_bulkload provides high-speed data loading capability to PostgreSQL users.
Optional tool : pg_timestamp documentation
- pg_timestamp ia an optional tool to skip parsing overhead of timestamp string.

Performance Results

Here is a comparison of the performance of PostgreSQL's COPY and pg_bulkload. Performance was measured with basic-tuned PostgreSQL server.

Table definition: Customer table in DBT-2 benchmark, that is an implementation of TPC-C.
Index definition: There are 2 indexes. The first one is a primary key with one ascending integer column. The second one is a non-unique index with one random integer column.

There are the following measurement patterns.

Initial data loading to an empty table for 4GB of data
Appended data loading to a table with 4GB of data for 1GB of new data
Performance efficiencies by maintenance_work_mem and FILTER features

Result 1: Initial data loading

Pg_bulkload on WRITER = PARALLEL mode can load data with an almost half time against COPY. In PARALLEL mode, performance will be improved on multi-CPU server because reading an input file and writing rows to a table are done with two processes.

COPY also can be more fast by loading into table without indexes and create indexes after it.

Initial Load (4GB)
Item	Duration	Duration comparison
COPY with indexes	500 sec	-
COPY without indexes + CREATE INDEX	333 sec (229 sec + 51 sec+ 53 sec)	66.7 %
pg_bulkload (DIRECT) with indexes	334 sec	66.8 %
pg_bulkload (PARALLEL) with indexes	221 sec	44.2 %

Result 2: Appended data loading

Pg_bulkload on WRITER = PARALLEL mode can load data with an almost half time against COPY in this case also. COPY without indexes is not faster than COPY with indexes. Because it has to create indexes for total records of the table from initial.

Appended Load (1GB)
Item	Duration	Duration comparison
COPY with indexes	140 sec	-
COPY without indexes + CREATE INDEX	187 sec (62 sec + 60 sec + 65 sec)	133.6 %
pg_bulkload (DIRECT) with indexes	93 sec	66.4 %
pg_bulkload (PARALLEL) with indexes	70 sec	50.0 %

Result 3: Influence from parameters and features

The maintenece_work_mem, PostgreSQL parameter, affects the performance of pg_bulkload. The duration becomes almost 15 % shorter if this parameter is changed from 64 MB to 1 GB.

FILTER feature transforms input data in various operations, but it's not free. The actual measurement shows the loading time is increased to almost 240 % with SQL functions and almost 140 % with C functions.

pg_bulkload features
項目		時間
項目		Initial (4GB)	Appended (1GB)
MWM = 64 MB	DIRECT	397 sec	109 sec
MWM = 1 GB
	DIRECT	334 sec	93 sec
	DIRECT with SQL FILTER	801 sec	216 sec
	DIRECT with C FILTER	456 sec	126 sec

Conditions

Item	Value
Server	Dell PowerEdge R410
CPU	Intel Xeon E5645 (2.4GHz) 12 core * 2
Memory	32GB
Disks	SAS 10000rpm 2TB * 4
OS	CentOS 6.2 (64bit)
PostgreSQL version	9.3.4
pg_bulkload version	3.1.6
shared_buffers	3210MB
checkpoint_segments	300
checkpoint_timeout	15min
work_mem	1MB
maintenance_work_mem	1GB
Table definition	DBT-2 customer table
Indexed columns	c_id (PRIMARY KEY)
Indexed columns	c_d_id (non-unique B-Tree)
Constraints	NOT NULL for all columns
Input file format	CSV

PGXN

PostgreSQL Extension Network

Contents