Extensions
- panda_post 0.2.1
- Python NumPy ndarray data type for Postgres
README
PandaPost
These extensions allow you to represent Python NumPy/Pandas objects in Postgres.
Warning
|
This is very much a work in progress. Expect the APIs and what is in each extension to change! |
Currently you can think of PandaPost more as an experiment than anything else. It seems like it would be useful to be able to natively store ndarrays, DataFrames, etc in a Postgres column. Then again, maybe not. :)
If you can make use of any of this, please drop a line to our Google Group. We’d love any feedback!
Installation
I haven’t released this on PGXN yet, so for now you need to clone the git repo and make install
. After that you can CREATE EXTENSION "PandaPost";
in your database. Currently it installs everything in your default schema; I suggest creating a panda schema and installing there. Note that it also requires plpythonu.
To wrap all that together:
git clone https://github.com/PandaPost/PandaPost.git
# Or, download https://github.com/PandaPost/PandaPost/archive/master.zip and unzip
cd PandaPost
make install
psql <database name>
CREATE EXTENSION plpythonu;
CREATE SCHEMA panda;
CREATE EXTENSION panda_post WITH SCHEMA panda;
Current Status
There are now casts between ndarray and all the plpython supported types (boolean, int*, float, real, numeric and text). This allows things like SELECT panda.str(array[1,1,2,2]::panda.ndarray);
.
Function |
Description |
T |
Apply ndarray T() function to input |
eval |
Returns the results of running python’s eval() on the input, as an ndarray. IE: |
repr |
Returns the python repr() of the input. |
str |
Returns the python str() of the input. |
Function |
Description |
ediff1d |
|
in1d |
Returns boolean array of whether each element in |
intersect1d |
Return unique list of values that are present in both |
ndunique |
Return unique elements in |
ndunique1 |
Same as ndunique() but only return the array of unique elements. |
setdiff1d |
Return the difference between |
setxor1d |
Return the values that are in only one (but not both) |
union1d |
Return the unique values that are in either |
Function |
Description |
create_cast |
Creates casts to and from a Postgres data type and ndarray |
Next up: porting the more common ndarray functions:
np.ndarray.byteswap np.ndarray.cumsum np.ndarray.flat np.ndarray.min np.ndarray.ravel np.ndarray.shape np.ndarray.tobytes
np.ndarray.all np.ndarray.choose np.ndarray.data np.ndarray.flatten np.ndarray.nbytes np.ndarray.real np.ndarray.size np.ndarray.tofile
np.ndarray.any np.ndarray.clip np.ndarray.diagonal np.ndarray.getfield np.ndarray.ndim np.ndarray.repeat np.ndarray.sort np.ndarray.tolist
np.ndarray.argmax np.ndarray.compress np.ndarray.dot np.ndarray.imag np.ndarray.newbyteorder np.ndarray.reshape np.ndarray.squeeze np.ndarray.tostring
np.ndarray.argmin np.ndarray.conj np.ndarray.dtype np.ndarray.item np.ndarray.nonzero np.ndarray.resize np.ndarray.std np.ndarray.trace
np.ndarray.argpartition np.ndarray.conjugate np.ndarray.dump np.ndarray.itemset np.ndarray.partition np.ndarray.round np.ndarray.strides np.ndarray.transpose
np.ndarray.argsort np.ndarray.copy np.ndarray.dumps np.ndarray.itemsize np.ndarray.prod np.ndarray.searchsorted np.ndarray.sum np.ndarray.var
np.ndarray.astype np.ndarray.ctypes np.ndarray.fill np.ndarray.max np.ndarray.ptp np.ndarray.setfield np.ndarray.swapaxes np.ndarray.view
np.ndarray.base np.ndarray.cumprod np.ndarray.flags np.ndarray.mean np.ndarray.put np.ndarray.setflags np.ndarray.take
Examples
CREATE EXTENSION IF NOT EXISTS plpythonu;
CREATE EXTENSION panda_post;
CREATE TEMP TABLE s AS SELECT array['a','b','c']::ndarray AS s1, array['c','d']::ndarray AS s2;
-- python repr() of an ndarray of strings
SELECT repr(s1) FROM s;
repr
-------------------------
array(['a', 'b', 'c'], +
dtype='|S1')
(1 row)
-- python str() of same array
SELECT str(s1) FROM s;
str
---------------
['a' 'b' 'c']
(1 row)
-- exclusive-or of two ndarrays
SELECT str(setxor1d(s1, s2)) FROM s;
str
---------------
['a' 'b' 'd']
(1 row)
-- Intersection
SELECT str(intersect1d(s1, s2)) FROM s;
str
-------
['c']
(1 row)
This more complicated example uses the lambad extension to create a dataframe, which can currently be returned as an ndarray.
Note
|
Eventually there will be an actual DataFrame Postgres data type |
CREATE EXTENSION IF NOT EXISTS lambda;
\set df pd.DataFrame.from_dict([{"a":1,"b":"a"},{"a":2,"b":"b"}])
SELECT repr(
lambda(
$l$(
ndarray
) RETURNS ndarray
LANGUAGE plpythonu
TRANSFORM FOR TYPE ndarray
AS $body$
import pandas as pd
return $l$ || :'df' || $l$
$body$
$l$
, NULL::ndarray
)
);
repr
---------
a b+
0 1 a+
1 2 b
(1 row)
Copyright and License
PandaPost is released under a BSD liscense.
Copyright (c) 2016 Jim Nasby <Jim.Nasby@BlueTreble.com>.