PostgreSQL_Anonymizer 0.0.3

This Release

Date

2018-08-31

Status

Stable

Latest Stable

PostgreSQL_Anonymizer 1.3.2 — 2024-04-02

Other Releases

Abstract

Data Anonymization for Postgres

Description

This extension is a suite of database functions that replace sensitive data with plausible values.

Released By

daamien

License

PostgreSQL

Resources

Special Files

Tags

Data Anonymization, Masking

Extensions

anon 0.0.3: Data Anonymization for Postgres

Documentation

LICENSE: The PostgreSQL License
README: Develoopment Notes
CHANGELOG: CHANGELOG
README: Data Sources

README

Data Anonymizer Extension for PostgreSQL

postgresql_anonymizer is a set of SQL functions that remove personally identifiable values from a PostgreSQL table and replace them with random-but-plausible values. The goal is to avoid any identification from the data record while remaining suitable for testing, data analysis and data processing.

This is projet is at an early stage of development and should used carefully.

Example

```sql =# CREATE EXTENSION IF NOT EXISTS tsm_system_rows; =# CREATE EXTENSION IF NOT EXISTS anon;

=# UPDATE customer -# SET -# full_name=anon.random_first_name() || ' ' || anon.random_last_name(), -# birth=anon.random_date_between('01/01/1920'::DATE,now()), -# employer=anon.random_company(), -# zipcode=anon.random_zip() -# ;

Requirements

This extension will work with PostgreSQL 9.5 and later versions.

It requires an extension named tsm_system_rows, which is delivered by the postgresql-contrib package of the main linux distributions.

Install

With PGXN :

console apt install pgxnclient (or pip install pgxn) pgxn install postgresql_anonymizer

From source :

console make make install

How To Use

Load the extension in your database like this:

sql CREATE EXTENSION IF NOT EXISTS tsm_system_rows; CREATE EXTENSION IF NOT EXISTS anon;

You now have access to the following functions :

Generic data

anon.random_date() returns a date
anon.random_date_between(d1,d2) returns a date between d1 and d2
anon.random_int_between(i1,i2) returns an integer between i1 and i2
anon.random_string(n) returns a TEXT value containing n letters

Personal data

anon.random_first_name() returns a generic first name
anon.random_last_name() returns a generic last name
anon.random_email() returns a valid email address
anon.random_zip() returns a 5-digit code
anon.random_city() returns an existing city
anon.random_city_in_country(c) returns a city in country c
anon.random_region() returns an existing region
anon.random_region_in_country(c) returns a region in country c
anon.random_country() returns a country
anon.random_phone(p) return a 8-digit phone with p as a prefix

Company data

anon.random_company() returns a generic company name
anon.random_iban() : returns a valid IBAN
anon.random_siret() : returns a valid SIRET
anon.random_siren() : returns a valid SIREN

Performance

So far, we've done very few performance tests. Depending on the size of your data set and number of columns your need to anonymize, you might end up with a very slow process.

Here's some ideas :

Sampling

If your need to anonymize data for testing purpose, chances are that a smaller subset of your database will be enough. In that case, you can easily speed up the anonymization by downsizing the volume of data. There are mulitple way to extract a sample of database :

Dynamic Masking

If you need to limit sensitive data exposure to non-privileged users, you can build Dynamic Masking Views that will automatically replace personal data with anonymized values.

For instance:

SQL CREATE VIEW masked_customer AS SELECT id, anon.random_last_name() AS name, anon.random_date_between('01/01/1920'::DATE,now()) AS birth, fk_last_order, store_id FROM customer;

In certain use cases, Materialized View can be usefull here.

PGXN

PostgreSQL Extension Network

PostgreSQL_Anonymizer 0.0.3

Extensions

Documentation

README

Contents

Data Anonymizer Extension for PostgreSQL

Example

Requirements

Install

With PGXN :

From source :

How To Use

Generic data

Personal data

Company data

Performance

Sampling

Dynamic Masking

Feedback