floatfile
A Postgres extension to store float arrays in individual files,
providing very fast querying (especially when combined with
aggs_for_arrays
,
aggs_for_vecs
,
or floatvec
)
without paying a huge cost to keep updating them.
The idea is that if you have 1 million FLOAT
s,
it is much faster if you can load them as a single 8 MB chunk
rather than pulling from a million different places on disk.
Sometimes you can just store your data as an array in a regular Postgres table,
but updating those arrays will be slow and cause painful MVCC churn.
Since this extension keeps the data in a separate file, appends are very fast and outside the MVCC system.
This extension offers several functions:
save_floatfile(filename TEXT, vals FLOAT[])
- Saves an array to a new file.
This creates a new file inside your Postgres default tablespace with the values of the array you provide. (Technically it is two files: one for the floats and one for the nulls.) If either filename
or vals
is NULL
then this does nothing. If vals
has some NULL
elements, they will be remembered.
If filename
already exists, this function will fail.
load_floatfile(filename TEXT)
- Returns a float array with the contents of the file.
extend_floatfile(filename TEXT, newvals FLOAT[])
- Adds newvals
to the end of filename
. If filename
doesn’t exist yet, it will be created.
drop_floatfile(filename TEXT)
- Deletes filename
.
In addition there are tablespace versions of these functions so you can put the files somewhere else:
save_floatfile(tablespace TEXT, filename TEXT, vals FLOAT[])
- Saves an array to a new file in tablespace
.
load_floatfile(tablespace TEXT, filename TEXT)
- Loads an array from filename
in tablespace
.
extend_floatfile(tablespace TEXT, filename TEXT, vals FLOAT[])
- Extends an array to filename
in tablespace
.
drop_floatfile(tablespace TEXT, filename TEXT)
- Deletes filename
.
Note in all cases tablespace
should be the name of the tablespace, not its location on disk.
If it is NULL
then the default tablespace is used (normally the data directory).
Finally there are some functions to compute results directly from the floatfile, since a Postgres array can only be 1GB max:
floatfile_to_hist(filename TEXT, buckets_start FLOAT, bucket_with FLOAT, bucket_count INT)
- Returns an array of integers with the counts of the histogram.
floatfile_to_hist(tablespace TEXT, filename TEXT, buckets_start FLOAT, bucket_with FLOAT, bucket_count INT)
- Returns an array of integers with the counts of the histogram.
floatfile_to_hist2d(xs_filename TEXT, ys_filename TEXT, x_buckets_start FLOAT, y_buckets_start FLOAT, x_bucket_with FLOAT, y_bucket_width, x_bucket_count INT, y_bucket_count)
- Returns a 2-d array of integers with the counts of the histogram.
floatfile_to_hist2d(xs_tablespace TEXT, xs_filename TEXT, ys_tablespace TEXT, ys_filename TEXT, x_buckets_start FLOAT, y_buckets_start FLOAT, x_bucket_with FLOAT, y_bucket_width, x_bucket_count INT, y_bucket_count)
- Returns a 2-d array of integers with the counts of the histogram.
All these functions use Postgres advisory locks. load_floatfile
takes a shared lock, and save
, extend
, and drop
take an exclusive one. They use the two-arg versions of the functions, using 0xF107F11E
for the first arg and the djb2 hash of the user-provided filename for the second one. (See the source code comments for my thoughts on birthday collisions.) You can change the value of the first arg by compiling with a different FLOATFILE_LOCK_PREFIX
.
If you really can’t stand that this uses advisory locks at all,
then I could probably add a compile-time option to use POSIX file locking instead,
but then you won’t see those locks in pg_locks
and they won’t be covered by pg’s deadlock detection.
Pros
These functions are extremely fast compared to processing millions of rows,
even using cstore_fdw
.
They are also a mite faster than storing the whole array in a single row of a regular Postgres table,
but unlike that approach they also have very fast UPDATE
s.
This makes them great for timeseries data,
so you get low-latency queries combined with low-cost appends.
Cons
This extension is designed for very fast queries without paying a high price to keep extending the array, but there are some drawbacks:
-
Updates: You can append to the end of an array, but you can’t change elements in the middle of it. If you really need to you can drop the floatfile and make a new one. That will be a little expensive, but random write access is not really the intended use of this extension. Perhaps someday I’ll add a feature to update just specific elements in the array though.
-
Security: Anyone who has
EXECUTE
permission on our functions can open anyfloatfile
in the current database (reading or writing depends on which function). So make sure that’s okay before using this extension! -
Replication: Your
floatfile
s are not replicated! If you rely on replication, I’d make sure that you only usefloatfiles
for derived data, so that you can reconstruct them if necessary. Unfortunately you can’t simply repeat the same commands on the standby as the master, since taking advisory locks requires write access. So you’ll have to sync the files some other way, e.g. rsync. Perhaps logical replication in Postgres 10 will solve this. -
Backups: These files won’t appear in your
pg_dump
output, so if you are using that for backups, you need to do something extra to include these files. -
Durability: Your
floatfiles
can get corrupted if there is a crash. So again, don’t use them except as derived data that you can rebuild from your core operational source. By the way if you have any suggestions to improve the story here, let me know. I’m thinking I could keep a third file that stores just the length of the array, and update it as the last step of each save/extend operation. Then if a future save/extend fails partway through, the bad data will just get ignored. I’d still need to write the length file atomically though, but I think I can do that with arename
. -
Selectivity: If you want to load something, you load all of it. To do further processing you should use some vector masking functions. (I will probably add these to
floatvec
by the way, R or Pandas style. . . .) But really this is no different than regular Postgres arrays. -
Portability: The on-disk format is the same as the in-memory format. That means you can’t move the files from a big-endian to a little-endian system, or between systems with different
sizeof(bool)
. (OTOHsizeof(float8)
won’t change.) This is probably not something you’d care about anyway, but there it is! Of all these cons this is the easiest to fix, but I’m not sure I care enough to do it, and it will cost a little performance.
FAQ
Why not a Foreign Data Wrapper?
Well, maybe I will port it to one. :-)
I haven’t written an FDW before,
and that way seemed longer to finish v1.
An FDW would improve the cons for Security,
but it still wouldn’t support Replication,
and (I’m not sure, but I believe) it still wouldn’t improve Durability.
Also I’d need to compare the performance of this vs an FDW.
If I do switch to an FDW, I’ll probably use Andrew Dunstan’s file_text_array_fdw
as a guide.
Can this extension read or overwrite my /etc/passwd
?
I hope not! I take pains to confine file access to a floatfile
subdirectory
in either your main data directory or the directory of a given tablespace.
You are not allowed to name files with ..
anywhere in the name—sort of a bigger hammer than required, but it makes the string validation very obvious.
You also can’t start a filename with /
.
If you find a security hole though, I’d appreciate your private email.
Does this work on Mac?
Yes.
Does this work on Linux?
Of course.
Does this work on Windows?
I doubt it. (I am available for hire if you need this though. :-)
TODO
-
Some way to ask for the current floatfiles and what tablespaces they live in would be nice, especially so you don’t get stuck unable to drop a tablespace and unsure why.
-
Better durability as described above: separate file with the length of the array. Also fsync.
-
See if there are any performance gains from using compression like a typical column-store database. Maybe even make it an option so people can compare results themselves.
-
Any hooks in
DROP DATABASE
so we can clean up files when it happens? Yes, make it an FDW or use theProcessUtility_hook
.
Author
Paul A. Jungwirth