panda_post

This Release

panda_post 0.2.1

Date

2016-08-19

Status

Unstable

Abstract

Python NumPy ndarray data type for Postgres

Released By

decibel

License

FreeBSD

Resources

Special Files

Tags

Extensions

panda_post 0.2.1: Python NumPy ndarray data type for Postgres

README

PandaPost

These extensions allow you to represent Python NumPy/Pandas objects in Postgres.

Warning

This is very much a work in progress. Expect the APIs and what is in each extension to change!

Currently you can think of PandaPost more as an experiment than anything else. It seems like it would be useful to be able to natively store ndarrays, DataFrames, etc in a Postgres column. Then again, maybe not. :)

If you can make use of any of this, please drop a line to our Google Group. We’d love any feedback!

Installation

I haven’t released this on PGXN yet, so for now you need to clone the git repo and make install. After that you can CREATE EXTENSION "PandaPost"; in your database. Currently it installs everything in your default schema; I suggest creating a panda schema and installing there. Note that it also requires plpythonu.

To wrap all that together:

git clone https://github.com/PandaPost/PandaPost.git
# Or, download https://github.com/PandaPost/PandaPost/archive/master.zip and unzip
cd PandaPost
make install
psql <database name>
CREATE EXTENSION plpythonu;
CREATE SCHEMA panda;
CREATE EXTENSION panda_post WITH SCHEMA panda;

Current Status

There are now casts between ndarray and all the plpython supported types (boolean, int*, float, real, numeric and text). This allows things like SELECT panda.str(array[1,1,2,2]::panda.ndarray);.

Table 1. Table General Functions
Function	Description
T	Apply ndarray T() function to input
eval	Returns the results of running python’s eval() on the input, as an ndarray. IE: `panda.repr(panda.eval('range(4)'))`
repr	Returns the python repr() of the input.
str	Returns the python str() of the input.

Table 2. Table Set Functions
Function	Description
ediff1d	Returns difference between elements in array
in1d	Returns boolean array of whether each element in `i` is present in `ar2`. See numpy.in1d.
intersect1d	Return unique list of values that are present in both `i` and `ar2`, similar to `INTERSECT` in SQL. See numpy.intersect1d.
ndunique	Return unique elements in `ar`, plus other potential output. See numpy.unique.
ndunique1	Same as ndunique() but only return the array of unique elements.
setdiff1d	Return the difference between `i` and `ar2`, similar to `EXCEPT` in SQL. See numpy.setdiff1d.
setxor1d	Return the values that are in only one (but not both) `i` and `ar2`. See numpy.setxor1d.
union1d	Return the unique values that are in either `i` or `ar2`. See numpy.union1d.

Table 3. Table Special Functions
Function	Description
create_cast	Creates casts to and from a Postgres data type and ndarray

Next up: porting the more common ndarray functions:

np.ndarray.byteswap      np.ndarray.cumsum        np.ndarray.flat          np.ndarray.min           np.ndarray.ravel         np.ndarray.shape         np.ndarray.tobytes
np.ndarray.all           np.ndarray.choose        np.ndarray.data          np.ndarray.flatten       np.ndarray.nbytes        np.ndarray.real          np.ndarray.size          np.ndarray.tofile
np.ndarray.any           np.ndarray.clip          np.ndarray.diagonal      np.ndarray.getfield      np.ndarray.ndim          np.ndarray.repeat        np.ndarray.sort          np.ndarray.tolist
np.ndarray.argmax        np.ndarray.compress      np.ndarray.dot           np.ndarray.imag          np.ndarray.newbyteorder  np.ndarray.reshape       np.ndarray.squeeze       np.ndarray.tostring
np.ndarray.argmin        np.ndarray.conj          np.ndarray.dtype         np.ndarray.item          np.ndarray.nonzero       np.ndarray.resize        np.ndarray.std           np.ndarray.trace
np.ndarray.argpartition  np.ndarray.conjugate     np.ndarray.dump          np.ndarray.itemset       np.ndarray.partition     np.ndarray.round         np.ndarray.strides       np.ndarray.transpose
np.ndarray.argsort       np.ndarray.copy          np.ndarray.dumps         np.ndarray.itemsize      np.ndarray.prod          np.ndarray.searchsorted  np.ndarray.sum           np.ndarray.var
np.ndarray.astype        np.ndarray.ctypes        np.ndarray.fill          np.ndarray.max           np.ndarray.ptp           np.ndarray.setfield      np.ndarray.swapaxes      np.ndarray.view
np.ndarray.base          np.ndarray.cumprod       np.ndarray.flags         np.ndarray.mean          np.ndarray.put           np.ndarray.setflags      np.ndarray.take

Examples

Basic examples

CREATE EXTENSION IF NOT EXISTS plpythonu;
CREATE EXTENSION panda_post;

CREATE TEMP TABLE s AS SELECT array['a','b','c']::ndarray AS s1, array['c','d']::ndarray AS s2;

-- python repr() of an ndarray of strings
SELECT repr(s1) FROM s;
          repr
-------------------------
 array(['a', 'b', 'c'], +
       dtype='|S1')
(1 row)

-- python str() of same array
SELECT str(s1) FROM s;
      str
---------------
 ['a' 'b' 'c']
(1 row)

-- exclusive-or of two ndarrays
SELECT str(setxor1d(s1, s2)) FROM s;
      str
---------------
 ['a' 'b' 'd']
(1 row)

-- Intersection
SELECT str(intersect1d(s1, s2)) FROM s;
  str
-------
 ['c']
(1 row)

This more complicated example uses the lambad extension to create a dataframe, which can currently be returned as an ndarray.

Note	Eventually there will be an actual DataFrame Postgres data type

Basic ndarray type storing a subclass (in this case, a Pandas DataFrame)

CREATE EXTENSION IF NOT EXISTS lambda;
\set df pd.DataFrame.from_dict([{"a":1,"b":"a"},{"a":2,"b":"b"}])
SELECT repr(
    lambda(
      $l$(
        ndarray
      ) RETURNS ndarray
      LANGUAGE plpythonu
      TRANSFORM FOR TYPE ndarray
      AS $body$
        import pandas as pd

        return $l$ || :'df' || $l$
      $body$
      $l$
      , NULL::ndarray
    )
  );
  repr
---------
    a  b+
 0  1  a+
 1  2  b
(1 row)

Copyright and License

PandaPost is released under a BSD liscense.

PGXN

PostgreSQL Extension Network