PostgreSQL_Anonymizer 0.1.1

This Release

Date

2018-09-18

Status

Stable

Latest Stable

PostgreSQL_Anonymizer 1.3.2 — 2024-04-02

Other Releases

Abstract

Data Anonymization for Postgres

Description

This extension is a suite of database functions that replace sensitive data with plausible values.

Released By

daamien

License

PostgreSQL

Resources

Special Files

Tags

Data Anonymization, Masking

Extensions

anon 0.1.1: Data Anonymization for Postgres

Documentation

LICENSE: The PostgreSQL License
CHANGELOG: CHANGELOG
CONTRIBUTING: How To Contribute
README: Data Sources
NOTES: Development Notes

README

Data Anonymizer Extension for PostgreSQL

postgresql_anonymizer is a set of SQL functions that remove personally identifiable values from a PostgreSQL table and replace them with random-but-plausible values. The goal is to avoid any identification from the data record while remaining suitable for testing, data analysis and data processing.

This is projet is at an early stage of development and should used carefully.

Example

```sql =# CREATE EXTENSION IF NOT EXISTS anon CASCADE;

=# UPDATE customer -# SET -# full_name=anon.random_first_name() || ' ' || anon.random_last_name(), -# birth=anon.random_date_between('01/01/1920'::DATE,now()), -# employer=anon.random_company(), -# zipcode=anon.random_zip() -# ;

Requirements

This extension is officially supported on PostgreSQL 9.6 and later versions. It should also work on PostgreSQL 9.5 with a bit of hacking.

It requires an extension named tsm_system_rows, which is delivered by the postgresql-contrib package of the main linux distributions.

Install

With PGXN :

console apt install pgxnclient (or pip install pgxn) pgxn install postgresql_anonymizer

From source :

console make make install

How To Use

Load the extension in your database like this:

sql CREATE EXTENSION IF NOT EXISTS anon CASCADE; SELECT anon.load();

The load() function will charge a default dataset of random data ( lists names, cities, etc. ). If you want to user your own dataset, you can load custom CSV files like this :

sql SELECT anon.load('/path/to/custom_cvs_files/');

You now have access to the following functions :

Generic data

anon.random_date() returns a date
anon.random_date_between(d1,d2) returns a date between d1 and d2
anon.random_int_between(i1,i2) returns an integer between i1 and i2
anon.random_string(n) returns a TEXT value containing n letters

Personal data

anon.random_first_name() returns a generic first name
anon.random_last_name() returns a generic last name
anon.random_email() returns a valid email address
anon.random_zip() returns a 5-digit code
anon.random_city() returns an existing city
anon.random_city_in_country(c) returns a city in country c
anon.random_region() returns an existing region
anon.random_region_in_country(c) returns a region in country c
anon.random_country() returns a country
anon.random_phone(p) return a 8-digit phone with p as a prefix

Company data

anon.random_company() returns a generic company name
anon.random_iban() : returns a valid IBAN
anon.random_siret() : returns a valid SIRET
anon.random_siren() : returns a valid SIREN

Upgrade

Currently there's no way to upgrade easily from a version to another. The operation ALTER EXTENSION ... UPDATE ... is not supported.

You need to drop and recreate the extension after every upgrade.

Performance

So far, we've done very few performance tests. Depending on the size of your data set and number of columns your need to anonymize, you might end up with a very slow process.

Here's some ideas :

Sampling

If your need to anonymize data for testing purpose, chances are that a smaller subset of your database will be enough. In that case, you can easily speed up the anonymization by downsizing the volume of data. There are mulitple way to extract a sample of database :

Dynamic Masking

If you need to limit sensitive data exposure to non-privileged users, you can build Dynamic Masking Views that will automatically replace personal data with anonymized values.

For instance:

SQL CREATE VIEW masked_customer AS SELECT id, anon.random_last_name() AS name, anon.random_date_between('01/01/1920'::DATE,now()) AS birth, fk_last_order, store_id FROM customer;

In certain use cases, [Materialized Views] can be usefull here.

Materialized Views

PGXN

PostgreSQL Extension Network

PostgreSQL_Anonymizer 0.1.1

Extensions

Documentation

README

Contents

Data Anonymizer Extension for PostgreSQL

Example

Requirements

Install

With PGXN :

From source :

How To Use

Generic data

Personal data

Company data

Upgrade

Performance

Sampling

Dynamic Masking

Feedback