# Changelog

All notable changes to pgclone are documented in this file.
## [4.3.0]

### Added

- Consistent-snapshot clones — every clone now reads the source under a `BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY` transaction, so all per-table COPY commands within a single clone observe the same point-in-time view of the source database. This eliminates the foreign-key violations and partial-state anomalies that v4.2.x and earlier could produce when cloning a live OLTP source. Enabled by default for `pgclone.table()`, `pgclone.schema()`, `pgclone.database()`, `pgclone.table_async()`, and `pgclone.schema_async()` (sequential and parallel pool modes).
- Cross-connection snapshot sharing for multi-connection paths — schema and database clones, and parallel pool mode, use `pg_export_snapshot()` / `SET TRANSACTION SNAPSHOT` so every libpq connection involved in a single clone (per-table sub-calls, FK retry, views, matviews, functions, triggers, and every pool worker) binds to one shared snapshot. Same correctness model as `pg_dump -j`.
- Snapshot-coordinator background worker (`pgclone_pool_coordinator_main`) — a dedicated bgworker spawned by `pgclone.schema_async(... '{"parallel": N}')` that opens its own source connection, exports the snapshot, publishes the ID into shared memory for pool workers to import, and sits idle in transaction until every importer has bound, then COMMITs.
- Opt-out option — `'{"consistent": false}'` in any options JSON disables the wrapping for callers who want lower source-side lock pressure or prefer the v4.2.x behaviour.
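The snapshot sharing above is the standard PostgreSQL export/import handshake. A minimal sketch of the statement sequence, with a stub connection class that records SQL instead of talking to a server (the snapshot ID literal is made up):

```python
# Conceptual sketch of the snapshot-sharing handshake. Conn is a stand-in
# that records SQL; a real implementation would use libpq connections.
class Conn:
    def __init__(self, log):
        self.log = log

    def execute(self, sql):
        self.log.append(sql)


def share_snapshot(exporter, importers, snapshot_id):
    # Exporter opens the snapshot transaction and exports its snapshot.
    exporter.execute("BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY")
    exporter.execute("SELECT pg_export_snapshot()")  # server returns snapshot_id
    # Each worker binds to the exported snapshot before issuing any COPY.
    for conn in importers:
        conn.execute("BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY")
        conn.execute(f"SET TRANSACTION SNAPSHOT '{snapshot_id}'")
    # The exporter must stay in its transaction until every importer has bound.


log = []
share_snapshot(Conn(log), [Conn(log), Conn(log)], "00000003-000001BC-1")
```

This mirrors the `pg_dump -j` model: the exporting transaction pins the snapshot, and every importing connection sees the exact same point in time.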
### Changed

- `pgclone.table()` source connection now runs at REPEATABLE READ READ ONLY for the entire clone (was: no transaction, except for an inner READ ONLY wrap when `WHERE` was set). The old per-table inner `BEGIN TRANSACTION READ ONLY` is now a no-op when an outer snapshot transaction is already open, preserving SQL-injection containment in both call paths.
- `pgclone.schema()` keeps its initial source connection open across all sub-phases as a snapshot keeper. Previously it closed the connection after the table-list read, and each sub-phase opened independent connections that could see different points in time.
- `pgclone.database()` follows the same keeper pattern at one level higher; the snapshot ID is propagated to every per-schema sub-call and on through to per-table sub-calls.
- `pgclone.schema_async(... '{"parallel": N}')` now launches N + 1 background workers (one coordinator plus N pool workers) when in consistent mode. The coordinator job appears in `pgclone.jobs_view` like any other job for full progress visibility.
### Tradeoffs documented

- Long-running clones now hold an open transaction on the source for the entire clone duration, which delays VACUUM cleanup of dead tuples and WAL recycling proportional to clone time. For very long clones on busy sources this may cause table/index bloat and WAL accumulation; opt out with `'{"consistent": false}'` if that is more important than cross-table consistency.
- Hot-standby sources are supported on PostgreSQL ≥ 10 (where `pg_export_snapshot()` works on a hot standby).
### Internal

- New `CloneOptions.consistent` (bool) and `CloneOptions.snapshot_id[64]` fields, plus parser support for `"consistent": false` and `"snapshot_id": "..."` in options JSON.
- New helpers `pgclone_begin_repeatable_read()`, `pgclone_commit_source()`, `pgclone_export_snapshot()`, `pgclone_begin_with_imported_snapshot()`, `pgclone_setup_source_txn()`, and `pgclone_setup_source_txn_done()` in `src/pgclone.c`. Mirror helpers `bgw_begin_repeatable_read()`, `bgw_commit_source()`, `bgw_export_snapshot()`, and `bgw_begin_with_imported_snapshot()` in `src/pgclone_bgw.c`.
- New `PgclonePoolQueue` shared-memory fields: `consistent`, `snapshot_ready`, `snapshot_failed`, `snapshot_imported_count`, `snapshot_expected_workers`, `launch_complete`, `snapshot_id[64]`, `coordinator_job_id`.
- New `PgcloneJob.consistent` field so the single-worker async path knows whether to wrap in REPEATABLE READ READ ONLY.
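A rough sketch of the coordinator/worker handshake those fields support, with Python threads and events standing in for bgworkers, shared memory, and latches (the synchronization details are illustrative, not pgclone's actual C code):

```python
import threading


# Illustrative stand-in for the PgclonePoolQueue fields listed above;
# real bgworkers coordinate through shared memory and latches.
class PoolQueue:
    def __init__(self, expected_workers):
        self.snapshot_id = None
        self.snapshot_ready = threading.Event()
        self.snapshot_expected_workers = expected_workers
        self.snapshot_imported_count = 0
        self.all_imported = threading.Event()
        self.lock = threading.Lock()


def coordinator(q, exported_id):
    # Export the snapshot, publish its ID, then idle until all workers bound.
    q.snapshot_id = exported_id
    q.snapshot_ready.set()
    q.all_imported.wait()  # only now is it safe to COMMIT the keeper txn


def pool_worker(q, bound_ids):
    q.snapshot_ready.wait()  # wait for the coordinator to publish
    with q.lock:
        bound_ids.append(q.snapshot_id)  # SET TRANSACTION SNAPSHOT here
        q.snapshot_imported_count += 1
        if q.snapshot_imported_count == q.snapshot_expected_workers:
            q.all_imported.set()


q = PoolQueue(expected_workers=3)
bound = []
workers = [threading.Thread(target=pool_worker, args=(q, bound)) for _ in range(3)]
coord = threading.Thread(target=coordinator, args=(q, "00000003-000001BC-1"))
for t in workers:
    t.start()
coord.start()
for t in [*workers, coord]:
    t.join()
```

The key invariant is the same in both worlds: the coordinator's transaction must outlive every import, or the snapshot disappears before a worker binds to it.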
## [4.2.0]

### Added

- Pre-flight validator (`pgclone.preflight(source_conninfo, schema_name)`) — read-only sanity check that surfaces issues that would otherwise fail mid-clone. Returns a JSON document with three role-based summary arrays (errors/warnings/info) and a per-check object covering: source/target connection, PostgreSQL versions and major-version compatibility, schema existence on both sides, USAGE/SELECT on source and CREATE on target, estimated source size, current target database size, object counts (tables/views/sequences/indexes), name conflicts on the target schema, extensions installed on source but missing on target, owner/grantee roles missing on target, and non-default tablespaces missing on target. `ready` is `true` only when zero errors are recorded.
- Loopback test coverage — new preflight assertions in `test/test_loopback.sh` (function registration, JSON shape, `ready` boolean, fabricated name-conflict surfacing, missing-source-schema error path, STRICT NULL handling, read-only catalog invariant).
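A minimal sketch of how a report with that shape could be assembled; the check names and exact JSON layout here are illustrative, not pgclone's actual output:

```python
# Illustrative assembly of a preflight-style report: each check records a
# role (error/warning/info); "ready" is true only when no errors exist.
def build_report(checks):
    by_role = {"error": [], "warning": [], "info": []}
    for name, (role, message) in checks.items():
        by_role[role].append(f"{name}: {message}")
    return {
        "ready": not by_role["error"],
        "errors": by_role["error"],
        "warnings": by_role["warning"],
        "info": by_role["info"],
        "checks": checks,
    }


report = build_report({
    "source_connection": ("info", "connected"),
    "target_name_conflicts": ("error", "table 'users' already exists"),
})
```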
### Changed

- CI runs `test/test_loopback.sh` directly instead of an inline subset hand-rolled in `.github/workflows/ci.yml`. Going forward, every assertion added to the loopback script is exercised in CI automatically — closing the gap that hid the v4.1.0 schema-diff tests from the matrix.
### Internal

- New isolated translation unit `src/pgclone_preflight.c`. Like `src/pgclone_diff.c`, this file does not share helpers with `src/pgclone.c` — the feature is fully additive and trivially auditable.
- Both source and target connections run inside `BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY`. The function never issues DDL or DML on either side.
## [4.1.0]

### Added

- Schema diff (`pgclone.diff(source_conninfo, schema_name)`) — read-only DDL drift detection between source and the local target. Returns a JSON document with summary counts plus per-category arrays of `only_in_source` / `only_in_target` / `modified` for tables (with per-column `type` / `not_null` / `default` drift), indexes (excluding those backing constraints), constraints, user-defined triggers, views and materialized views, and sequences. Both source and local connections run inside `BEGIN ... READ ONLY` transactions; the function never executes DDL or DML on either side.
### Internal

- New isolated translation unit `src/pgclone_diff.c`. The diff feature does not share helpers with `src/pgclone.c`, keeping the surface area additive and trivially auditable.
- Catalog queries explicitly use `ORDER BY ... COLLATE "C"` to guarantee identical sort order on both sides regardless of the local `lc_collate` setting, so the merge-walk comparison is deterministic.
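The deterministic merge-walk that the C-collated ordering enables can be sketched generically; the input lists of (name, definition) pairs are assumed pre-sorted in the same byte order:

```python
# Generic merge-walk over two identically sorted lists of (name, definition)
# pairs: classify each name as only_in_source / only_in_target / modified.
def merge_walk(source, target):
    out = {"only_in_source": [], "only_in_target": [], "modified": []}
    i = j = 0
    while i < len(source) or j < len(target):
        if j == len(target) or (i < len(source) and source[i][0] < target[j][0]):
            out["only_in_source"].append(source[i][0]); i += 1
        elif i == len(source) or target[j][0] < source[i][0]:
            out["only_in_target"].append(target[j][0]); j += 1
        else:
            if source[i][1] != target[j][1]:  # same name, definitions differ
                out["modified"].append(source[i][0])
            i += 1; j += 1
    return out


diff = merge_walk(
    [("a_tbl", "v1"), ("b_tbl", "v1"), ("c_tbl", "v1")],
    [("b_tbl", "v2"), ("c_tbl", "v1"), ("d_tbl", "v1")],
)
```

If the two sides sorted under different collations, the `<` comparisons would disagree with the server's ordering and the walk would misclassify entries — which is exactly why the catalog queries pin `COLLATE "C"`.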
## [4.0.1]

### Fixed

- Schema clone dependency ordering (issue #3): `pgclone.schema()` now creates objects in dependency-respecting order — sequences → tables (no triggers) → FK retry → views → matviews → functions → triggers — instead of the previous order, which cloned functions before tables. SQL-language functions whose body references a table in the same schema no longer fail with `relation "schema.table" does not exist` at `CREATE FUNCTION` time.
- Unqualified relation references in extracted DDL (issue #3): all source libpq connections now `SET search_path = pg_catalog` immediately after connect. This forces `pg_get_triggerdef()`, `pg_get_expr()` (column DEFAULTs), and `pg_get_indexdef()` to emit fully schema-qualified relation names. Previously, source DBs with an application schema on their default `search_path` (a common production pattern) produced DDL like `CREATE TRIGGER ... ON city_street ...` and `DEFAULT nextval('documents_to_resend_id_seq')` that failed when replayed on the target loopback connection. Affects sync paths and both bgworker async paths (sequential and pool).
### Internal

- New `pgclone_normalize_session()` helper in `src/pgclone.c`, invoked from `pgclone_connect()`; equivalent inline `SET search_path` calls added at the two `PQconnectdb(job->source_conninfo)` sites in `src/pgclone_bgw.c`.
- `pgclone_schema()` now passes `triggers=false` through to each per-table `pgclone_table()` call and runs a single trigger pass at the end, after functions are cloned.
## [4.0.0] — BREAKING

### Changed

- Schema namespace: All pgclone functions now live under the `pgclone` schema, created automatically by the extension:
  - `pgclone_table(...)` → `pgclone.table(...)`
  - `pgclone_schema(...)` → `pgclone.schema(...)`
  - `pgclone_database(...)` → `pgclone.database(...)`
  - `pgclone_database_create(...)` → `pgclone.database_create(...)`
  - `pgclone_table_async(...)` → `pgclone.table_async(...)`
  - `pgclone_schema_async(...)` → `pgclone.schema_async(...)`
  - `pgclone_progress(...)` → `pgclone.progress(...)`
  - `pgclone_cancel(...)` → `pgclone.cancel(...)`
  - `pgclone_resume(...)` → `pgclone.resume(...)`
  - `pgclone_jobs()` → `pgclone.jobs()`
  - `pgclone_clear_jobs()` → `pgclone.clear_jobs()`
  - `pgclone_progress_detail()` → `pgclone.progress_detail()`
  - `pgclone_jobs_view` → `pgclone.jobs_view`
  - `pgclone_discover_sensitive(...)` → `pgclone.discover_sensitive(...)`
  - `pgclone_mask_in_place(...)` → `pgclone.mask_in_place(...)`
  - `pgclone_create_masking_policy(...)` → `pgclone.create_masking_policy(...)`
  - `pgclone_drop_masking_policy(...)` → `pgclone.drop_masking_policy(...)`
  - `pgclone_clone_roles(...)` → `pgclone.clone_roles(...)`
  - `pgclone_verify(...)` → `pgclone.verify(...)`
  - `pgclone_masking_report(...)` → `pgclone.masking_report(...)`
  - `pgclone_version()` → `pgclone.version()`
  - `pgclone_table_ex(...)` → `pgclone.table_ex(...)`
  - `pgclone_schema_ex(...)` → `pgclone.schema_ex(...)`
  - `pgclone_functions(...)` → `pgclone.functions(...)`
- Extension control file now specifies `schema = pgclone`
- Upgrade path: This is a breaking change. Users must `DROP EXTENSION pgclone; CREATE EXTENSION pgclone;` to upgrade from v3.x. All application queries must be updated to use the new `pgclone.` prefix.
## [3.6.0]

### Added

- GDPR/Compliance Masking Report: `pgclone_masking_report(schema)` generates an audit report listing all sensitive columns, their masking status, and recommendations
  - Detects sensitive columns using shared `sensitivity_rules` (~40 patterns across 10 categories)
  - Checks whether a masked view (`table_masked`) exists for each table
  - `mask_status`: `MASKED (view)` or `UNMASKED`
  - `recommendation`: either “OK - masked via view” or “Apply mask strategy: X”
  - Returns SET OF (schema_name, table_name, column_name, sensitivity, mask_status, recommendation)
  - Useful for GDPR, HIPAA, SOX compliance audits — “prove no PII exists unmasked”
### Changed

- Refactored sensitivity rules into a shared `SensitivityRule` struct and `pgclone_match_sensitivity()` helper — used by both `pgclone_discover_sensitive` and `pgclone_masking_report`
- Version bumped to 3.6.0
## [3.5.0]

### Added

- Clone Verification: `pgclone_verify()` compares row counts between source and target databases, table by table, returning a side-by-side comparison
  - Two overloads: `pgclone_verify(conninfo)` for all schemas, `pgclone_verify(conninfo, schema)` for a specific schema
  - Returns SET OF (schema_name, table_name, source_rows, target_rows, match)
  - Match indicators: `✓` (equal), `✗` (different), `✗ (missing)` (table not on target)
  - Uses `pg_class.reltuples` for fast approximate counts without full table scans
  - Works with regular and partitioned tables
- 5 new pgTAP tests (84 total): verify function runs, row counts match after clone, correct column count in result
### Changed

- Version bumped to 3.5.0
## [3.4.0]

### Added

- Clone Roles with Permissions and Passwords: `pgclone_clone_roles()` clones database roles from source to local, including encrypted passwords, role attributes, and all privilege grants
  - `pgclone_clone_roles(conninfo)` — clone all non-system roles
  - `pgclone_clone_roles(conninfo, 'role1,role2')` — clone specific roles (comma-separated)
  - Clones role attributes: LOGIN, SUPERUSER, CREATEDB, CREATEROLE, REPLICATION, INHERIT, CONNECTION LIMIT, VALID UNTIL
  - Copies encrypted passwords from `pg_authid` (requires superuser on both source and target)
  - If a target role already exists: updates its password and attributes, and applies permissions additively
  - Clones role memberships (GRANT role TO role)
  - Clones schema-level privileges (USAGE, CREATE) via `aclexplode()`
  - Clones table-level privileges (SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER)
  - Clones sequence privileges (USAGE, SELECT, UPDATE)
  - Clones function/procedure EXECUTE privileges
- `test_reader`, `test_writer`, `test_admin` test fixture roles with graduated permissions
- 6 new pgTAP tests (79 total): role creation, LOGIN attribute, CREATEDB attribute verification
### Changed

- Version bumped to 3.4.0
## [3.3.0]

### Added

- Dynamic Data Masking via Views: role-based masking policies that preserve original data while presenting masked views to unprivileged users
  - `pgclone_create_masking_policy(schema, table, mask_json, privileged_role)`: creates a masked view (`table_masked`), revokes base-table SELECT from PUBLIC, grants view access to PUBLIC, and grants direct table access to the privileged role
  - `pgclone_drop_masking_policy(schema, table)`: drops the masked view and restores base-table access to PUBLIC
  - Uses the same `pgclone_build_mask_expr()` engine as clone-time and static masking — all 8 strategies work
  - Queries local `pg_attribute` for column names and applies mask expressions to matching columns
- 7 new pgTAP tests (73 total): policy creation, view existence, masked data verification, privileged role bypass, policy drop, view cleanup
### Changed

- Version bumped to 3.3.0
## [3.2.0]

### Added

- Static Data Masking on Local Tables: `pgclone_mask_in_place(schema, table, mask_json)` applies masking to already-cloned tables via UPDATE statements — no source connection needed
  - Uses the same mask JSON format as clone-time masking: `{"email": "email", "name": "name", "ssn": "null"}`
  - Reuses `pgclone_build_mask_expr()` for consistent masking between clone-time and post-clone paths
  - Wraps user-provided JSON into the clone options format for unified parsing
  - Returns a summary: `OK: masked N rows in schema.table (M columns)`
- 7 new pgTAP tests (66 total): clone-then-mask workflow, verify original data removed, names/SSNs masked, row count preserved
### Changed

- Version bumped to 3.2.0
## [3.1.0]

### Added

- Auto-Discovery of Sensitive Data: `pgclone_discover_sensitive(conninfo, schema_name)` scans the source catalog for columns matching ~40 sensitive-data patterns and returns suggested mask rules as JSON
  - Pattern categories: email, name, phone, SSN/national ID, financial (salary/income), secrets/credentials (password/token/api_key), address, date of birth, credit card, IP address
  - Case-insensitive matching with LIKE-style wildcards against column names
  - Output is a JSON object grouped by table, ready to paste into the `"mask"` option: `{"employees": {"email": "email", "salary": "random_int", "ssn": "null"}}`
  - Scans regular tables and partitioned tables (relkind `r` and `p`)
- 6 new pgTAP tests (59 total): discovery function runs, detects email/full_name/phone/salary/ssn columns
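The case-insensitive, LIKE-style matching described above can be sketched with Python's `fnmatch`, mapping SQL LIKE's `%` wildcard to `*`; the pattern table here is a tiny illustrative subset, not pgclone's actual ~40 rules:

```python
from fnmatch import fnmatch

# Tiny illustrative subset of sensitivity patterns (LIKE's % becomes *).
PATTERNS = {
    "*email*": "email",
    "*ssn*": "null",
    "*salary*": "random_int",
    "*phone*": "phone",
}


def suggest_masks(columns_by_table):
    """Build suggested mask rules grouped by table, as a JSON-shaped dict."""
    out = {}
    for table, columns in columns_by_table.items():
        for col in columns:
            for pattern, strategy in PATTERNS.items():
                if fnmatch(col.lower(), pattern):
                    out.setdefault(table, {})[col] = strategy
                    break  # first matching pattern wins
    return out


rules = suggest_masks({"employees": ["id", "email", "salary", "ssn"]})
```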
### Changed

- Version bumped to 3.1.0
## [3.0.0]

### Added

- Data Masking / Anonymization: column-level data masking applied during cloning via the `"mask"` JSON option
  - 8 masking strategies: `email`, `name`, `phone`, `partial`, `hash`, `null`, `random_int`, `constant`
    - `email`: preserves the domain, masks the local part (`alice@example.com` → `a***@example.com`)
    - `name`: replaces with `XXXX`
    - `phone`: keeps the last 4 digits (`+1-555-123-4567` → `***-4567`)
    - `partial`: configurable prefix/suffix retention (`Johnson` → `Jo***on` with prefix=2, suffix=2)
    - `hash`: deterministic MD5 hash — preserves referential integrity across tables
    - `null`: replaces with NULL
    - `random_int`: random integer in a configurable `[min, max]` range
    - `constant`: fixed replacement value
    - All strategies are NULL-safe (NULL inputs produce NULL outputs, except `constant` and `random_int`)
  - Masking is applied server-side as SQL expressions inside `COPY (SELECT ...) TO STDOUT` — no row-by-row processing overhead
  - Fully composable with the existing `columns`, `where`, `indexes`, `constraints`, `triggers` options
  - `MaskRule` struct and `pgclone_build_mask_expr()` internal functions for an extensible mask-strategy architecture
- 15 new pgTAP tests (53 total): email masking, name masking, null masking, hash masking, constant masking, combined masks with WHERE filter
- `test_schema.employees` test fixture table with realistic sensitive data (names, emails, phones, salaries, SSNs)
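Illustrative Python equivalents of three of the strategies above; pgclone itself emits SQL expressions server-side, so these are reference sketches only:

```python
import hashlib


def mask_email(value):
    """email strategy: keep first char of local part, preserve the domain."""
    if value is None:
        return None  # NULL-safe
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}"


def mask_partial(value, prefix=2, suffix=2):
    """partial strategy: retain a configurable prefix and suffix."""
    if value is None:
        return None
    return f"{value[:prefix]}***{value[-suffix:]}"


def mask_hash(value):
    """hash strategy: deterministic, so equal inputs stay joinable."""
    if value is None:
        return None
    return hashlib.md5(value.encode()).hexdigest()
```

Because `mask_hash` is deterministic, a foreign-key value masked in two different tables still matches, which is what preserves referential integrity.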
Changed
CloneOptionsstruct extended withmasks[]array andnum_maskscounterpgclone_parse_options()handles"mask"JSON key with both simple string values ("email") and object values ({"type":"partial", "prefix":2, "suffix":3})pgclone_copy_data()queries source catalog for column names when masks are active (without explicit column list) to apply per-column mask expressions- Version bumped to 3.0.0
## [2.2.1]

### Added

- WHERE clause SQL injection protection: two-layer defense against malicious input in the `"where"` JSON option
  - Layer 1: keyword validation rejects DDL/DML keywords (`DROP`, `INSERT`, `UPDATE`, `DELETE`, `CREATE`, `ALTER`, `TRUNCATE`, etc.) and semicolons before the query is sent
  - Layer 2: the source connection runs inside `BEGIN TRANSACTION READ ONLY` when a WHERE clause is present — PostgreSQL itself blocks any write operations even if validation is bypassed
  - Word-boundary-aware matching avoids false positives on column names like `created_at`, `update_count`, `drop_rate`
- 4 new pgTAP tests (37 total): semicolon rejection, DROP keyword rejection, INSERT keyword rejection, false-positive safety with a `created_at` column
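A sketch of the Layer 1 word-boundary check in Python regex terms; pgclone's actual validator is C code and may differ in detail:

```python
import re

# Illustrative subset of the rejected keywords.
FORBIDDEN = ["DROP", "INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "TRUNCATE"]


def where_clause_ok(clause):
    """Reject semicolons and standalone DDL/DML keywords in a WHERE clause."""
    if ";" in clause:
        return False
    for kw in FORBIDDEN:
        # \b boundaries keep identifiers like created_at or drop_rate safe,
        # since '_' counts as a word character and suppresses the boundary.
        if re.search(rf"\b{kw}\b", clause, re.IGNORECASE):
            return False
    return True
```

Note the second layer still matters: a keyword filter alone can never be exhaustive, which is why the READ ONLY transaction backstops it.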
## [2.2.0]

### Changed

- Worker Pool Architecture: parallel cloning (`"parallel": N`) now uses a fixed-size worker pool instead of spawning one background worker per table
  - Exactly N workers are launched, each pulling tasks from a shared queue
  - Dynamic load balancing: faster workers automatically handle more tables
  - Resource usage reduced from O(tables) to O(N) for bgworkers and DB connections
  - No longer risks exhausting `max_worker_processes` on large schemas

### Added

- `PgclonePoolQueue` struct in shared memory for task queue management
- `pgclone_pool_worker_main()` background worker entry point
- Pool worker test in `test/test_async.sh` (TEST 8)
- `PGCLONE_MAX_POOL_TASKS` limit (512 tables per pool operation)
- Guard against concurrent pool operations
### Removed

- Per-table background worker launch in parallel mode (replaced by the pool)
- Per-table job slot allocation in parallel mode (pool workers share fewer slots)
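The fixed-size pool model above can be sketched with a shared task queue, with threads standing in for background workers:

```python
import queue
import threading


def run_pool(tables, n_workers):
    """N workers drain one shared queue: O(N) resources, dynamic balancing."""
    tasks = queue.Queue()
    for t in tables:
        tasks.put(t)
    done = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                table = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            with lock:
                done.append(table)  # a real pool worker would COPY the table

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


cloned = run_pool([f"table_{i}" for i in range(10)], n_workers=3)
```

Whichever worker finishes a table first simply pulls the next one, so slow tables never leave other workers idle.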
## [2.1.4]

### Changed

- Local loopback connections now use Unix domain sockets (from the `unix_socket_directories` GUC) instead of TCP `127.0.0.1`
- A `pg_hba.conf` `trust` entry for `127.0.0.1` is no longer required for async operations — the default `local all all peer` is sufficient
- Falls back to TCP `127.0.0.1` automatically if Unix sockets are unavailable
### Fixed

- Security: removed the unnecessary `trust` authentication requirement for background worker connections
## [2.1.3]

### Fixed

- Async bgworker: COPY pipeline error handling — failures are now logged with `PQerrorMessage`, and the source COPY result is consumed on the error path to prevent a connection leak
- Async bgworker: `WaitForBackgroundWorkerStartup` added to all async functions — jobs no longer get stuck in the 'pending' state
- `pgclone_schema_async` parallel mode: fixed a hardcoded `jobs[0]` write that corrupted slot 0 in shared memory
- `pgclone_schema_async` parallel mode: the parent job now correctly transitions to COMPLETED after child workers finish
### Added

- `pgclone_clear_jobs()` function to free completed/cancelled job slots from shared memory
- Async test suite (`test/test_async.sh`) covering `pgclone_table_async`, `pgclone_schema_async`, `pgclone_progress`, `pgclone_jobs_view`, `pgclone_clear_jobs`
- `CONTRIBUTING.md` — development setup, code guidelines, PR process
- `SECURITY.md` — vulnerability reporting, security considerations
- Documentation restructured: `docs/USAGE.md`, `docs/ASYNC.md`, `docs/TESTING.md`, `docs/ARCHITECTURE.md`, `CHANGELOG.md`
## [2.1.2]

### Added

- Elapsed-time column in `pgclone_jobs_view`
- `elapsed_time` field in progress detail output
## [2.1.1]

### Changed

- Visual progress bar in `pgclone_jobs_view` replaces verbose NOTICE messages
- Per-table/per-row NOTICE messages moved to DEBUG1 level
## [2.1.0]

### Added

- `pgclone_jobs_view` — query async job progress as a standard PostgreSQL view
- `pgclone_progress_detail()` — table-returning function for detailed progress
## [2.0.1]

### Added

- `pgclone_database_create()` — create a new database and clone into it
- Automatic pgclone extension installation in the target database
- Idempotent behavior: clones into the existing database if it already exists
## [2.0.0]

### Added

- True multi-worker parallel cloning with the `"parallel": N` option
- Each table gets its own background worker
- The parent worker monitors child workers via shared memory

### Changed

- Shared memory layout expanded for parallel job tracking
## [1.2.0]

### Added

- Materialized view cloning during schema clone (with a `"matviews": false` opt-out)
- Exclusion constraint support (cloned alongside PK, UNIQUE, CHECK, FK)
- Materialized view indexes and data are preserved
## [1.1.0]

### Added

- Selective column cloning with the `"columns": [...]` JSON option
- Data filtering with the `"where": "..."` JSON option
- Automatic filtering of constraints/indexes referencing excluded columns
## [1.0.0]

### Added

- Async clone operations via background workers (`pgclone_table_async`, `pgclone_schema_async`)
- Job progress tracking (`pgclone_progress`, `pgclone_jobs`)
- Job cancellation (`pgclone_cancel`)
- Job resume from checkpoint (`pgclone_resume`)
- Job cleanup (`pgclone_clear_jobs`)
- Conflict resolution strategies: error, skip, replace, rename
## [0.3.0]

### Added

- Background worker infrastructure
- Shared memory allocation for job state
- `_PG_init` with shmem hooks
## [0.2.0]

### Added

- Index cloning (including expression and partial indexes)
- Constraint cloning (PRIMARY KEY, UNIQUE, CHECK, FOREIGN KEY)
- Trigger cloning with trigger functions
- JSON options format for controlling indexes/constraints/triggers
- Boolean parameter variants (`pgclone_table_ex`, `pgclone_schema_ex`)
## [0.1.0]

### Added

- Initial release
- `pgclone_table()` — clone a single table, with or without data
- `pgclone_schema()` — clone an entire schema
- `pgclone_functions()` — clone functions only
- `pgclone_database()` — clone all user schemas
- COPY protocol for fast data transfer
- `pgclone_version()` — version string
- Support for PostgreSQL 14–18
- pgTAP test suite (33 tests)
- Docker Compose multi-version test infrastructure
- GitHub Actions CI pipeline