How To Contribute

This is an open project. Comments and ideas are more than welcome.

Here are a few tips to get started if you want to get involved.

Where to start?

If you want to help, here are a few ideas:

1- Testing: You can install the master branch of the project and run extensive tests based on your use case. This is very useful for improving the stability of the code. If you are able to publish your test cases, please add them to the /tests/sql directory or to demo. I have recently implemented "anonymous dumps" and I need feedback!

2- Documentation: You can write documentation and examples to help new users. I have created a docs folder where you can put documentation on how to install and use the extension...

3- Benchmark: You can run tests on various setups and measure the impact of the extension on performance.

4- Junior Jobs: I have flagged a few issues as "Junior Jobs" on the project issue board. If you want to give it a try, simply fork the git repository and start coding!

5- Spread the Word: If you like this extension, let other people know! You can publish a blog post about it, a YouTube video, or whatever format you feel comfortable with!

In any case, let us know how we can help you move forward.

Forking, Mirroring and Rebasing

To contribute code to this project, you can simply create your own fork.

Over time, the main repository (let's call it upstream) will evolve and your own repository (let's call it origin) will miss the latest commits. Here are a few hints on how to handle this.

Connect your repo to the upstream

Add a new remote to your local repo:

```bash
git remote add upstream https://gitlab.com/dalibo/postgresql_anonymizer.git
```

Keep your master branch up to date

At any time, you can mirror your personal repo like this:

```bash
# switch to the master branch
git checkout master

# download the latest commit from the main repo
git fetch upstream

# apply the latest commits
git rebase upstream/master

# push the changes to your personal repo
git push origin
```

Rebase a branch

When working on a Merge Request (MR) that takes a long time, your local branch (let's call it foo) may fall out of sync with upstream. Here's how you can apply the latest commits:

```bash
# switch to your working branch
git checkout foo

# download the latest commit from the main repo
git fetch upstream

# apply the latest commits
git rebase upstream/master

# push the changes to your personal repo
# (a rebase rewrites history, so a plain push would be rejected)
git push origin --force-with-lease
```
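If you would like to rehearse this workflow before touching your real fork, the sketch below replays it entirely on local throwaway repositories. Everything in it is an assumption made for illustration: the temporary directory, the bare repositories `upstream.git` and `origin.git` standing in for the real remotes, the `seed` repository used to simulate upstream activity, the `g` helper and the demo identity.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: all repositories live in a temporary directory.
work="$(mktemp -d)"
cd "$work"

# Hypothetical helper: run git with a throwaway identity so commits work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Create a fake main repository with one commit on master.
g init -q seed
cd seed
g symbolic-ref HEAD refs/heads/master
echo "v1" > README
g add README
g commit -q -m "initial commit"
cd "$work"
g clone -q --bare seed upstream.git

# "Fork" it, then clone the fork as your local repo and connect it to upstream.
g clone -q --bare upstream.git origin.git
g clone -q origin.git local
cd local
g remote add upstream "$work/upstream.git"

# Start the working branch foo and publish it to the fork (-u sets up tracking).
g checkout -q -b foo
echo "feature" > feature.txt
g add feature.txt
g commit -q -m "work on foo"
g push -q -u origin foo

# Meanwhile, upstream receives a new commit.
cd "$work/seed"
echo "v2" > README
g commit -q -am "upstream change"
g push -q "$work/upstream.git" master

# The steps described above: fetch, rebase, then push with a lease.
cd "$work/local"
g checkout -q foo
g fetch -q upstream
g rebase -q upstream/master
g push -q origin --force-with-lease

# Check the result: foo now sits on top of upstream/master and the fork matches it.
g merge-base --is-ancestor upstream/master foo && echo "foo is based on upstream/master"
local_tip="$(g rev-parse foo)"
fork_tip="$(g --git-dir="$work/origin.git" rev-parse foo)"
echo "local foo: $local_tip"
echo "fork foo:  $fork_tip"
```

Two details are worth noting. With git's default push settings, `git push origin` without a branch name relies on foo tracking a branch of the same name on origin, which the sketch sets up with `push -u`. And `--force-with-lease` only overwrites the remote branch if it still points to the commit your local repo last saw, so it will not silently discard commits that someone else pushed to the same branch in the meantime.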

Adding new functions

The set of functions is based on pragmatic experience and feedback. We try to cover the most common personal data types. If you need an additional function, let us know!

If you want to add new functions, please define the following attributes:

  • volatility: should be VOLATILE (default), STABLE or IMMUTABLE
  • strict mode: CALLED ON NULL INPUT (default) or RETURNS NULL ON NULL INPUT
  • security level: SECURITY INVOKER (default) or SECURITY DEFINER
  • parallel mode: PARALLEL UNSAFE (default), PARALLEL RESTRICTED or PARALLEL SAFE
  • search_path: SET search_path=''

Please read the CREATE FUNCTION documentation for more details.

In most cases, a masking function should have the following attributes:

```sql
CREATE OR REPLACE FUNCTION anon.foo(TEXT)
RETURNS TEXT AS
$$
  SELECT ...
$$
  LANGUAGE SQL
  VOLATILE
  RETURNS NULL ON NULL INPUT
  PARALLEL UNSAFE
  SECURITY INVOKER
  SET search_path=''
;
```

Testing with docker

You can easily set up a proper testing environment from scratch with docker and docker-compose!

First, launch a container with:

```bash
make docker_init
```

Then you can enter the container:

```bash
make docker_bash
```

Once inside the container, you can run the usual operations:

```bash
make
make install
make installcheck
psql
```

The entire test suite takes a few minutes to run. When developing a feature, you usually only want to check one test in particular. You can limit the scope of the test run with the REGRESS variable.

For instance, if you want to run only the noise.sql test:

```bash
make installcheck REGRESS=noise
```
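As the example suggests, the value given to REGRESS is the test file name without its .sql extension. As a small sketch, the helper below (`regress_name` is a hypothetical name, not something provided by the project) derives that value from a path under tests/sql and prints the matching command instead of running it, since the command itself only makes sense inside the container.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn a test file path into the name expected by REGRESS.
regress_name() {
  basename "$1" .sql
}

test_name="$(regress_name tests/sql/noise.sql)"

# Print the command rather than running it.
regress_cmd="make installcheck REGRESS=$test_name"
echo "$regress_cmd"
```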

Security

About SQL Injection

By design, this extension is prone to SQL injection risks. When adding new features, pay special attention to security, especially by sanitizing function parameters and by using regclass and oid instead of literal names to designate objects.

See links below for more details:

  • https://stackoverflow.com/questions/10705616/table-name-as-a-postgresql-function-parameter
  • https://www.postgresql.org/docs/current/datatype-oid.html
  • https://xkcd.com/327/

Security level for functions

Most functions should be defined as SECURITY INVOKER. In very exceptional cases, it may be necessary to use SECURITY DEFINER but this should be used with care.

Read the CREATE FUNCTION documentation for more details:

https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY

Search_path

This extension will create views based on masking functions. These functions will run with the privileges of the owners of the views. This is prone to search_path attacks: an untrusted user may be able to override some functions and gain superuser privileges.

Therefore all functions should be defined with SET search_path='' even if they are not SECURITY DEFINER.