permuteseq 1.2.0

This Release
permuteseq 1.2.0
Date
Status
Stable
Other Releases
Abstract
Extension to manage scalable pseudo-random permutations of sequences
Released By
dverite
License
The PostgreSQL License
Resources
Special Files
Tags

Extensions

permuteseq 1.2.0
Extension to manage scalable pseudo-random permutations of sequences

Documentation

LICENSE
Copyright and License

README

permuteseq

It's a C PostgreSQL extension to manage scalable pseudo-random permutations of sequences.

Usage

Example in psql:

=# CREATE EXTENSION permuteseq;

=> CREATE SEQUENCE s MINVALUE -10000 MAXVALUE 15000;

=> \set secret_key 123456789012345

=> SELECT permute_nextval('s'::regclass, :secret_key) FROM generate_series(-10000,-7000);

 permute_nextval
-----------------
        -545
       -8279
        4160
       10528
         581
[... skip 2991 unique values within the range [-10000,15000], in a random-looking order ... ]
        8901
        9558
       11359
        9728
         365

=> SELECT reverse_permute('s'::regclass, -545, :secret_key);

 reverse_permute
-----------------
      -10000

=> SELECT range_encrypt_element(91919191919, 1e10::bigint, 1e11::bigint, :secret_key);

 range_encrypt_element
-----------------------
       83028080992

=> SELECT range_decrypt_element(83028080992, 1e10::bigint, 1e11::bigint, :secret_key);

 range_decrypt_element
-----------------------
       91919191919

Functions

permute_nextval(seq_oid oid, crypt_key bigint) RETURNS bigint

Advance sequence and return the new value encrypted within the bounds of the sequence.

reverse_permute(seq_oid oid, value bigint, crypt_key bigint) RETURNS bigint

Compute and return the original clear value from its permuted element in the sequence.

range_encrypt_element(clear_val bigint, min_val bigint, max_val bigint, crypt_key bigint) RETURNS bigint

Encrypt a bigint element in the [min_val,max_val] range with a bigint encryption key.

range_decrypt_element(crypt_val bigint, min_val bigint, max_val bigint, crypt_key bigint) RETURNS bigint

Decrypt a value previously encrypted with range_encrypt_element().

Installation

The Makefile uses the PGXS infrastructure to find include and library files, and determine the install location.
Build and install with:

$ make
$ (sudo) make install

Some explanations in Q & A form

Q: How is that better than ORDER BY random() applied to a generate_series output, or a shuffle algorithm like Fisher-Yates?
A: The permuter in this extension does not need to materialize the output sequence, even temporarily.
It computes any element independantly when needed, in near-constant time, so it's as efficient with 2^64 elements as with a thousand or a million.

Q: How about generating random()-based integers on the fly and look up for collisions? Isn't that simpler?
A: It's simpler but doesn't scale. It can do well only when a small number of elements are actually picked up from a wide sequence. Otherwise the Birthday Problem kicks in and the exponentially-growing collision probability makes this technique unworkable.

Q: What are the use cases for permuted sequences?
A: Obfuscating synthetic keys, making generators for short codes that should not be guessable or enumerable by outsiders: coupon numbers, serial numbers, short URLs.

Q: Why not just use standard 128-bit Globally unique identifiers with the UID type?
A: Sometimes shorter values are necessary. For instance when coupon numbers may be spelled over the phone, AX8GH4T is reasonable but 21EC2020-3AEA-4069-A2DD-08002B30309D is not.

Q: How to generate several different output sequences from the same input range?
A: By using a different 64-bit secret key for each sequence you want to keep secret.

Q: How does the permuter work, exactly?
A: It's essentially a format-preserving encryption scheme. In an inner loop, there's a balanced 9-round Feistel Cipher, with a block size determined by the range of the sequence. The round function hashes the current block with bits from the key. A cycle-walking outer loop iterates over that encryption step until the result fits into the desired range, so the outputs are guaranteed to be in the same exact range as the inputs. The reverse permutation follows the same process iterating in the reverse order.

Q: Is the shuffle effect comparable to crypto-grade randomizing?
A: No. Altough it's based on well-known and proven techniques, the limits at play (64-bit key, reduced output space) are too small for that. Also, it's generally safe to assume that code not reviewed by professional cryptographers is not cryptographically strong. If you simply want a strong 64-bit to 64-bit Feistel cipher, you may consider XTEA, available for PostgreSQL through the cryptint extension.

Q: How is the unicity of the output guaranteed?
A: By the mathematical property that is at the heart of the Feistel Network, which produces a permutation in the mathematical sense (f(x)=f(y) <=> x=y).

Q: Are the output sequences deterministic or truly random?
A: The permutations are fully deterministic. The random-looking effect is due to encryption, not to a PRNG. The same range with the same encryption key will always produce the same output sequence.

Q: What happens when the same input is permuted within a different range and the same key?
A: It might produce a totally different result, due to the cipher block size being dynamic. The block size in bits is computed as the smallest even number of bits that can represent the input range. For example, the range [-1000,1000] contains 2001 values, just below 2^11=2048, but since 11 is uneven, the block is rounded up to 12 bits wide. The fact that the algorithm chooses a block comparable in size to the input range is essential to reduce the number of cycle-walking iterations.