How To Contribute
This is an open project: any comment or idea is more than welcome.
Here are a few tips to get started if you want to get involved.
Where to start?
If you want to help, here are a few ideas:
1- Testing: You can install the master branch of the project and run
extensive tests based on your use case. This is very useful to improve the
stability of the code. If you can publish your test cases, please
add them to the /tests/sql directory or to demo. I have recently
implemented “anonymous dumps” and I need feedback!
2- Documentation: You can write documentation and examples to help new
users. I have created a docs folder where you can put documentation on
how to install and use the extension…
3- Benchmark: You can run tests on various setups and measure the impact of the extension on performance.
4- Junior Jobs: I have flagged a few issues as “Junior Jobs” on the project issue board. If you want to give it a try, simply fork the git repository and start coding!
5- Spread the Word: If you like this extension, just let other people know! You can publish a blog post about it, a YouTube video, or whatever format you feel comfortable with!
In any case, let us know how we can help you move forward.
Forking, Mirroring and Rebasing
To contribute code to this project, you can simply create your own fork.
Over time, the main repository (let’s call it upstream) will evolve and your
own repository (let’s call it origin) will miss the latest commits. Here are
a few hints on how to handle this.
Connect your repo to the upstream
Add a new remote to your local repo:
git remote add upstream https://gitlab.com/dalibo/postgresql_anonymizer.git
Keep your master branch up to date
At any time, you can mirror your personal repo like this:
# switch to the master branch
git checkout master
# download the latest commits from the main repo
git fetch upstream
# apply the latest commits
git rebase upstream/master
# push the changes to your personal repo
git push origin
Rebase a branch
When working on a Merge Request (MR) that takes a long time, it can happen
that your local branch (let’s call it foo) is out of sync with upstream. Here’s how you
can apply the latest commits:
# switch to your working branch
git checkout foo
# download the latest commits from the main repo
git fetch upstream
# apply the latest commits
git rebase upstream/master
# push the changes to your personal repo
git push origin --force-with-lease
Adding new functions
The set of functions is based on pragmatic experience and feedback. We try to cover the most common personal data types. If you need an additional function, let us know!
If you want to add new functions, please define the following attributes:
- volatility: should be VOLATILE (default), STABLE or IMMUTABLE
- strict mode: CALLED ON NULL INPUT (default) or RETURNS NULL ON NULL INPUT
- security level: SECURITY INVOKER (default) or SECURITY DEFINER
- parallel mode: PARALLEL UNSAFE (default) or PARALLEL SAFE
- search_path: SET search_path=''
Please read the CREATE FUNCTION documentation for more details.
In most cases, a masking function should have the following attributes:
CREATE OR REPLACE FUNCTION anon.foo(TEXT)
RETURNS TEXT AS
$$
SELECT ...
$$
LANGUAGE SQL
VOLATILE
RETURNS NULL ON NULL INPUT
PARALLEL UNSAFE
SECURITY INVOKER
SET search_path=''
;
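As a worked illustration of the template above, here is a minimal sketch of a deterministic masking function. The name hide_local_part is hypothetical, and the function is created in pg_temp rather than in the anon schema so that it disappears with the session. Because its output depends only on its input, it is declared IMMUTABLE and PARALLEL SAFE instead of the usual VOLATILE and PARALLEL UNSAFE. The catalog query at the end is one possible way to check which attributes were actually recorded.

```sql
-- Hypothetical masking function, for illustration only: it is not part of
-- the extension. It lives in pg_temp so that it vanishes with the session.
CREATE OR REPLACE FUNCTION pg_temp.hide_local_part(email TEXT)
RETURNS TEXT AS
$$
  -- keep the domain, replace everything before the first @ with a constant
  SELECT pg_catalog.regexp_replace(email, '^[^@]*', 'xxxxxx')
$$
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT
PARALLEL SAFE
SECURITY INVOKER
SET search_path=''
;

-- returns xxxxxx@example.com
SELECT pg_temp.hide_local_part('alice@example.com');

-- Inspect the attributes stored in the system catalog for this function
SELECT p.provolatile, p.proisstrict, p.prosecdef, p.proparallel, p.proconfig
FROM pg_catalog.pg_proc AS p
WHERE p.oid = 'pg_temp.hide_local_part(text)'::pg_catalog.regprocedure;
```

In pg_proc, provolatile is i, s or v (immutable, stable, volatile) and proparallel is s, r or u (safe, restricted, unsafe); proisstrict is true for RETURNS NULL ON NULL INPUT and prosecdef is true for SECURITY DEFINER.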
Testing with docker
You can easily set up a proper testing environment from scratch with docker and docker-compose!
First, launch a container with:
make docker_init
Then you can enter the container:
make docker_bash
Once inside the container, you can run the usual operations:
make
make install
make installcheck
psql
The entire test suite takes a few minutes to run. When developing a feature,
you usually only want to check one test in particular. You can limit the scope
of the test run with the REGRESS variable.
For instance, if you want to run only the noise.sql test:
make installcheck REGRESS=noise
Linting
Use make lint to run the various linters on the project.
Git pre-commit hook
We maintain a pre-commit configuration to run some checks at commit time. If you want to use that configuration, you should:
- Install pre-commit (on Debian-based systems you can probably simply run sudo apt install pre-commit)
- Then apply the configuration with pre-commit install
- And finally, verify that the configuration is properly applied by running it “by hand”: .git/hooks/pre-commit
Fake Data
By default, the extension is shipped with an English fake dataset.
Update the fake dataset
make fake_data
git commit data
Add a new language
To add a new fake dataset in another language, just change the
FAKE_DATA_LOCALES variable:
mkdir -p data/fr_FR/fake
FAKE_DATA_LOCALES=fr_FR make fake_data
Security
About SQL Injection
By design, this extension is prone to SQL injection risks. When adding new
features, special attention should be paid to security, especially by sanitizing
function parameters and using regclass and oid instead of literal
names to designate objects…
See the links below for more details:
- https://stackoverflow.com/questions/10705616/table-name-as-a-postgresql-function-parameter
- https://www.postgresql.org/docs/current/datatype-oid.html
- https://xkcd.com/327/
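To make the regclass advice concrete, here is a minimal sketch of a function that builds dynamic SQL from a table reference. The helper count_rows_sketch is hypothetical and is created in pg_temp for illustration; it is not part of the extension. The point is that the caller's input is converted to regclass before the function body runs, so a string that is not the name of an existing relation is rejected before it can reach EXECUTE.

```sql
-- Hypothetical helper, for illustration only (not part of the extension).
CREATE OR REPLACE FUNCTION pg_temp.count_rows_sketch(tbl REGCLASS)
RETURNS BIGINT AS
$$
DECLARE
  n BIGINT;
BEGIN
  -- A regclass value is rendered as a properly quoted and, where needed,
  -- schema-qualified identifier, so %s is safe here. With a TEXT parameter,
  -- %I (or quote_ident) would be required instead.
  EXECUTE pg_catalog.format('SELECT pg_catalog.count(*) FROM %s', tbl) INTO n;
  RETURN n;
END
$$
LANGUAGE plpgsql
VOLATILE
RETURNS NULL ON NULL INPUT
PARALLEL UNSAFE
SECURITY INVOKER
SET search_path=''
;

-- A valid relation name is resolved to its oid when the argument is cast.
SELECT pg_temp.count_rows_sketch('pg_catalog.pg_class');

-- An injection attempt raises an error during the cast to regclass,
-- so the function body is never executed:
-- SELECT pg_temp.count_rows_sketch('pg_class; DROP TABLE users');
```

The same function written with a TEXT parameter and plain string concatenation would pass the second argument straight into the query, which is the situation the links above warn about.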
Security level for functions
Most functions should be defined as SECURITY INVOKER. In very exceptional cases,
it may be necessary to use SECURITY DEFINER, but this should be used with care.
Read the CREATE FUNCTION documentation for more details:
https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY
Search_path
This extension will create views based on masking functions. These functions will be run with the privileges of the owners of the views. This is prone to search_path attacks: an untrusted user may be able to override some functions and gain superuser privileges.
Therefore all functions should be defined with SET search_path='' even if
they are not SECURITY DEFINER.
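One possible way to check this rule before submitting a change is to query the system catalog for functions that carry no search_path setting. The query below is a sketch, not an official project check; it assumes the extension is installed in the anon schema and simply returns no rows if that schema does not exist.

```sql
-- List the functions of the anon schema that do not pin search_path.
SELECT p.oid::pg_catalog.regprocedure AS function_without_search_path
FROM pg_catalog.pg_proc AS p
JOIN pg_catalog.pg_namespace AS n ON n.oid = p.pronamespace
WHERE n.nspname = 'anon'
  AND NOT EXISTS (
    -- proconfig holds the function's SET clauses as 'name=value' strings;
    -- it is NULL when there are none, in which case unnest() yields no rows
    SELECT 1
    FROM pg_catalog.unnest(p.proconfig) AS setting
    WHERE pg_catalog.split_part(setting, '=', 1) = 'search_path'
  )
ORDER BY p.proname;
```

Note that this only detects whether search_path is set at all, not whether it is set to the empty string, so the value in proconfig is still worth reviewing for any function that does appear to be pinned.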